
Matrix Algorithms
Volume II: Eigensystems

G. W. Stewart
University of Maryland, College Park, Maryland

Society for Industrial and Applied Mathematics Philadelphia

Copyright © 2001 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB is a registered trademark of The MathWorks, Inc., 3 Apple Hill Dr., Natick, MA 01760-2098, USA, tel. 508-647-7000, fax 508-647-7001, info@mathworks.com, http://www.mathworks.com.

The Library of Congress has catalogued Volume I as follows:

Library of Congress Cataloging-in-Publication Data

Stewart, G. W. (Gilbert W.)
   Matrix algorithms / G. W. Stewart.
      p. cm.
   Includes bibliographical references and index.
   Contents: v. 1. Basic decompositions
   ISBN 0-89871-414-1 (v. 1 : pbk.)
   1. Matrices. I. Title.
   QA188.S714 1998
   512.9'434-dc21                                    98-22445

0-89871-414-1 (Volume I)
0-89871-503-2 (Volume II)
0-89871-418-4 (set)

siam is a registered trademark.

CONTENTS

Algorithms
Preface

1  Eigensystems
   1  The Algebra of Eigensystems
      1.1  Eigenvalues and eigenvectors
           Definitions. Existence of eigenpairs: Special cases. The characteristic equation and the existence of eigenpairs. Matrices of order two. Block triangular matrices. Real matrices.
      1.2  Geometric multiplicity and defective matrices
           Geometric multiplicity. Defectiveness. Simple eigenvalues.
      1.3  Similarity transformations
      1.4  Schur decompositions
           Unitary similarities. Schur form. Nilpotent matrices. Hermitian matrices. Normal matrices.
      1.5  Block diagonalization and Sylvester's equation
           From block triangular to block diagonal matrices. Sylvester's equation. An algorithm. Block diagonalization redux.
      1.6  Jordan form
      1.7  Notes and references
           General references. History and nomenclature. Schur form. Sylvester's equation. Block diagonalization. Jordan canonical form.
   2  Norms, Spectral Radii, and Matrix Powers
      2.1  Matrix and vector norms
           Definition. Examples of norms. Operator norms. Norms and rounding error. Limitations of norms. Consistency. Norms and convergence.
      2.2  The spectral radius
           Definition. Norms and the spectral radius.
      2.3  Matrix powers
           Asymptotic bounds. Limitations of the bounds.
      2.4  Notes and references
           Norms. Norms and the spectral radius. Matrix powers.
   3  Perturbation Theory
      3.1  The perturbation of eigenvalues
           The continuity of eigenvalues. Gerschgorin theory. Hermitian matrices.
      3.2  Simple eigenpairs
           Left and right eigenvectors. First-order perturbation expansions. Rigorous bounds. Qualitative statements. The condition of an eigenvalue. The condition of an eigenvector. Properties of sep. Hermitian matrices.
      3.3  Notes and references
           General references. Continuity of eigenvalues. Gerschgorin theory. Eigenvalues of Hermitian matrices. Simple eigenpairs. Left and right eigenvectors. First-order perturbation expansions. Rigorous bounds. Condition numbers. Sep. Relative and structured perturbation theory.

2  The QR Algorithm
   1  The Power and Inverse Power Methods
      1.1  The power method
           Convergence. Computation and normalization. Choice of a starting vector. Convergence criteria and the residual. Optimal residuals and the Rayleigh quotient. Shifts and spectral enhancement. Implementation. The effects of rounding error. Assessment of the power method.
      1.2  The inverse power method
           Shift-and-invert enhancement. The inverse power method. An example. The effects of rounding error. The Rayleigh quotient method.
      1.3  Notes and references
           The power method. Residuals and backward error. The Rayleigh quotient. Shift of origin. The inverse power and Rayleigh quotient methods.
   2  The Explicitly Shifted QR Algorithm
      2.1  The QR algorithm and the inverse power method
           Schur from the bottom up. The QR choice. Error bounds. Rates of convergence. General comments on convergence.
      2.2  The unshifted QR algorithm
           The QR decomposition of a matrix polynomial. The QR algorithm and the power method. Convergence of the unshifted QR algorithm.
      2.3  Hessenberg form
           Householder transformations. Reduction to Hessenberg form. Plane rotations. Invariance of Hessenberg form.
      2.4  The explicitly shifted algorithm
           Detecting negligible subdiagonals. Deflation. Back searching. The Wilkinson shift. Reduction to Schur form. Eigenvectors.
      2.5  Bits and pieces
           Ad hoc shifts. Aggressive deflation. Cheap eigenvalues. Balancing. Graded matrices.
      2.6  Notes and references
           Historical. Variants of the QR algorithm. The QR algorithm, the inverse power method, and the power method. Local convergence. Householder transformations and plane rotations. Hessenberg form. The Hessenberg QR algorithm. Deflation criteria. The Wilkinson shift. Aggressive deflation. Preprocessing. Graded matrices.
   3  The Implicitly Shifted QR Algorithm
      3.1  Real Schur form
           Real Schur form. A preliminary algorithm.
      3.2  The uniqueness of Hessenberg reduction
      3.3  The implicit double shift
           A general strategy. Getting started. Reduction back to Hessenberg form. Deflation. Computing the real Schur form. Eigenvectors.
      3.4  Notes and references
           Implicit shifting. Processing 2x2 blocks. The multishifted QR algorithm.
   4  The Generalized Eigenvalue Problem
      4.1  The theoretical background
           Definitions. Regular pencils and infinite eigenvalues. Equivalence transformations. Generalized Schur form. Multiplicity of eigenvalues. Projective representation of eigenvalues. Generalized shifting. Left and right eigenvectors of a simple eigenvalue. Perturbation theory. First-order expansions: Eigenvalues. The chordal metric. The condition of an eigenvalue. Perturbation expansions: Eigenvectors. The condition of an eigenvector.
      4.2  Real Schur and Hessenberg-triangular forms
           Generalized real Schur form. Hessenberg-triangular form.
      4.3  The doubly shifted QZ algorithm
           Overview. Getting started. The QZ step.
      4.4  Important miscellanea
           Balancing. Back searching. Infinite eigenvalues. The 2x2 problem. Eigenvectors.
      4.5  Notes and references
           The algebraic background. Perturbation theory. The QZ algorithm.

3  The Symmetric Eigenvalue Problem
   1  The QR Algorithm
      1.1  Reduction to tridiagonal form
           The storage of symmetric matrices. Reduction to tridiagonal form. Making Hermitian tridiagonal matrices real.
      1.2  The symmetric tridiagonal QR algorithm
           The implicitly shifted QR step. Choice of shift. Local convergence. Deflation. Graded matrices. Variations.
      1.3  Notes and references
           General references. Reduction to tridiagonal form. The tridiagonal QR algorithm.
   2  A Clutch of Algorithms
      2.1  Rank-one updating
           Reduction to standard form. Deflation: Small components of z. Deflation: Nearly equal di. When to deflate. The secular equation. Solving the secular equation. Computing eigenvectors. Concluding comments.
      2.2  A Divide-and-conquer algorithm
           Generalities. Derivation of the algorithm. Complexity.
      2.3  Eigenvalues and eigenvectors of band matrices
           The storage of band matrices. Tridiagonalization of a symmetric band matrix. Inertia. Inertia and the LU decomposition. The inertia of a tridiagonal matrix. Calculation of a selected eigenvalue. Selected eigenvectors.
      2.4  Notes and references
           Updating the spectral decomposition. The divide-and-conquer algorithm. Reduction to tridiagonal form. Inertia and bisection. Eigenvectors by the inverse power method. Jacobi's method.
   3  The Singular Value Decomposition
      3.1  Background
           Existence and nomenclature. Uniqueness. Relation to the spectral decomposition. Characterization and perturbation of singular values. First-order perturbation expansions. The condition of singular vectors.
      3.2  The cross-product algorithm
           Inaccurate singular values. Inaccurate right singular vectors. Inaccurate left singular vectors. Assessment of the algorithm.
      3.3  Reduction to bidiagonal form
           The reduction. Real bidiagonal form.

      3.4  The QR algorithm
           The bidiagonal QR step. Computing the shift. Negligible superdiagonal elements. Back searching. Final comments.
      3.5  A hybrid QRD-SVD algorithm
      3.6  Notes and references
           The singular value decomposition. The cross-product algorithm. The QRD-SVD algorithm. Reduction to bidiagonal form and the QR algorithm. The differential qd algorithm. Divide-and-conquer algorithms. Downdating a singular value decomposition. Jacobi methods. Band matrices.
   4  The Symmetric Positive Definite (S/PD) Generalized Eigenvalue Problem
      4.1  Theory
           The fundamental theorem. Condition of eigenvalues. The generalized singular value and CS decompositions.
      4.2  Wilkinson's algorithm
           The algorithm. Computation of R^-T A R^-1. Ill-conditioned B.
      4.3  Notes and references
           Perturbation theory. Wilkinson's algorithm. The definite generalized eigenvalue problem.

4  Eigenspaces and Their Approximation
   1  Eigenspaces
      1.1  Definitions
           Eigenbases. Existence of eigenspaces. Deflation.
      1.2  Simple eigenspaces
           Definition. Block diagonalization and spectral representations. Spectral projections. The Rayleigh quotient. Angles between the subspaces. Uniqueness of simple eigenspaces. Block deflation.
      1.3  Notes and references
           Eigenspaces. Uniqueness of simple eigenspaces. The resolvent.
   2  Perturbation Theory
      2.1  Canonical angles
           Definitions. Computing canonical angles. Subspaces of unequal dimensions. Combinations of subspaces.
      2.2  Residual analysis
           Optimal residuals. The Rayleigh quotient. Backward error. Residual bounds for eigenvalues of Hermitian matrices.
      2.3  Perturbation theory
           The perturbation theorem. The quantity sep. The Rayleigh quotient and the condition of L. Bounds in terms of E. Block deflation and residual bounds.
      2.4  Residual bounds for Hermitian matrices
           Residual bounds for eigenspaces. Residual bounds for eigenvalues. Residual bounds for eigenspaces and eigenvalues of Hermitian matrices.
      2.5  Notes and references
           Canonical angle. Residual bounds. Perturbation theory. Backward error. Residual eigenvalue bounds for Hermitian matrices.
   3  Krylov Subspaces
      3.1  Krylov sequences and Krylov spaces
           Introduction and definition. Elementary properties. The polynomial connection. Termination.
      3.2  Convergence
           Setting the scene. Chebyshev polynomials. Convergence (Hermitian matrices). Convergence (non-Hermitian matrices). Multiple eigenvalues. Convergence redux. Assessment of the bounds.
      3.3  Block Krylov sequences
      3.4  Notes and references
           Sources. Krylov sequences. Block Krylov sequences.
   4  Rayleigh-Ritz Approximation
      4.1  Rayleigh-Ritz methods
           Two difficulties. Rayleigh-Ritz and generalized eigenvalue problems. The orthogonal Rayleigh-Ritz procedure.
      4.2  Convergence
           The general approach. Convergence of the Ritz value. Convergence of Ritz values revisited. Convergence of Ritz vectors. Convergence of Ritz blocks and bases. Hermitian matrices. Residuals and convergence.
      4.3  Refined Ritz vectors
           Definition. Convergence of refined Ritz vectors. The computation of refined Ritz vectors. Optimal residuals and the Rayleigh quotient.
      4.4  Harmonic Ritz vectors
           Computation of harmonic Ritz pairs.
      4.5  Notes and references
           Historical. General references. Refined Ritz vectors. Harmonic Ritz vectors.

5  Krylov Sequence Methods
   1  Krylov Decompositions
      1.1  Arnoldi decompositions
           The Arnoldi decomposition. Uniqueness. Reduced Arnoldi decompositions. Computing Arnoldi decompositions. Reorthogonalization.
      1.2  Lanczos decompositions
           Lanczos decompositions. The Lanczos recurrence.
      1.3  Krylov decompositions
           Definition. Similarity transformations. Translation. Equivalence to an Arnoldi decomposition. Reduction to Arnoldi form. Nonorthogonal bases.
      1.4  Rational Krylov transformations
           Krylov sequences, inverses, and shifts. The rational Krylov transformation.
      1.5  Notes and references
           Lanczos and Arnoldi. Krylov decompositions. The rational Krylov method. Shift-and-invert enhancement and orthogonalization.
   2  The Restarted Arnoldi Method
      2.1  The implicitly restarted Arnoldi method
           Filter polynomials. Implicitly restarted Arnoldi. Implementation. The restarted Arnoldi cycle. Exact shifts. Forward instability.
      2.2  Krylov-Schur restarting
           Exchanging eigenvalues and eigenblocks. The Krylov-Schur cycle. The equivalence of Arnoldi and Krylov-Schur. Krylov-spectral restarting. Comparison of the methods.
      2.3  Stability, convergence, and deflation
           Stability. Convergence criteria. Convergence. Deflation. Deflation in an Arnoldi decomposition. Deflation in the Krylov-Schur method. Stability of deflation. Choice of tolerance. Convergence and deflation.
      2.4  Computation of refined and harmonic Ritz vectors
           Refined Ritz vectors. Harmonic Ritz vectors.
      2.5  Notes and references
           Arnoldi's method. ARPACK. Exchanging eigenvalues. Forward instability. Reverse communication.
   3  The Lanczos Algorithm
      3.1  Reorthogonalization and the Lanczos algorithm
           Lanczos with reorthogonalization. Full reorthogonalization. Semiorthogonal bases.
      3.2  Full reorthogonalization and restarting
           Implicit restarting. Krylov-spectral restarting. Convergence and deflation. Filter polynomials.
      3.3  Semiorthogonal methods
           Semiorthogonal bases. The QR factorization of a semiorthogonal basis. The size of the reorthogonalization coefficients. The effects on the Rayleigh quotient. The effects on the Ritz vectors. Estimating the loss of orthogonality. Periodic reorthogonalization. Updating HK. Computing residual norms. An example. When to perform an SRR step.
      3.4  Notes and references
           The Lanczos algorithm. No reorthogonalization. Full reorthogonalization. Periodic reorthogonalization. Semiorthogonal methods. Block Lanczos algorithms. The bi-Lanczos algorithm. Software.
   4  The Generalized Eigenvalue Problem
      4.1  Transformation and convergence
           Transformation to an ordinary eigenproblem. Residuals and backward error. Bounding the residual.
      4.2  Positive definite B
           The B inner product. B-orthogonalization. The B-Arnoldi method. The B-Lanczos method with periodic reorthogonalization. The restarted B-Lanczos method. Semidefinite B.
      4.3  Notes and references
           The generalized shift-and-invert transformation and the Lanczos algorithm. Semidefinite B.

6  Alternatives
   1  Subspace Iteration
      1.1  Theory
           Convergence. Schur-Rayleigh-Ritz refinement. Local convergence.
      1.2  Implementation
           A general algorithm. Convergence. Deflation. When to perform an SRR step. When to orthogonalize. Real matrices.
      1.3  Symmetric matrices
           Economization of storage. Chebyshev acceleration. Restarting.
      1.4  Notes and references
           Subspace iteration. Chebyshev acceleration. Software.
   2  Newton-Based Methods
      2.1  Approximate Newton methods
           A general approximate Newton method. Orthogonal corrections. Analysis. Three approximations. Local convergence.
      2.2  The Jacobi-Davidson method
           The basic Jacobi-Davidson method. Extending Schur decompositions. Restarting. Harmonic Ritz vectors. Hermitian matrices. Solving projected systems. The correction equation: Exact solution. The correction equation: Iterative solution. Orthogonality. Residuals and backward error.
      2.3  Notes and references
           Newton's method and the eigenvalue problem. Inexact Newton methods. The Jacobi-Davidson method.

Appendix: Background
   1  Scalars, vectors, and matrices
   2  Linear algebra
   3  Positive definite matrices
   4  The LU decomposition
   5  The Cholesky decomposition
   6  The QR decomposition
   7  Projectors

References
Index


ALGORITHMS

Chapter 1. Eigensystems
   1.1  Solution of Sylvester's equation

Chapter 2. The QR Algorithm
   1.1  The shifted power method
   1.2  The inverse power method
   2.1  Generation of Householder transformations
   2.2  Reduction to upper Hessenberg form
   2.3  Generation of a plane rotation
   2.4  Application of a plane rotation
   2.5  Basic QR step for a Hessenberg matrix
   2.6  Finding deflation rows in an upper Hessenberg matrix
   2.7  Computation of the Wilkinson shift
   2.8  Schur form of an upper Hessenberg matrix
   2.9  Right eigenvectors of an upper triangular matrix
   3.1  Starting an implicit QR double shift
   3.2  Doubly shifted QR step
   3.3  Finding deflation rows in a real upper Hessenberg matrix
   3.4  Real Schur form of a real upper Hessenberg matrix
   4.1  Reduction to Hessenberg-triangular form
   4.2  The doubly shifted QZ step

Chapter 3. The Symmetric Eigenvalue Problem
   1.1  Reduction to tridiagonal form
   1.2  Hermitian tridiagonal to real tridiagonal form
   1.3  The symmetric tridiagonal QR step
   2.1  Rank-one update of the spectral decomposition
   2.2  Eigenvectors of a rank-one update
   2.3  Divide-and-conquer eigensystem of a symmetric tridiagonal matrix
   2.4  Counting left eigenvalues
   2.5  Finding a specified eigenvalue
   2.6  Computation of selected eigenvectors
   3.1  The cross-product algorithm
   3.2  Reduction to bidiagonal form
   3.3  Complex bidiagonal to real bidiagonal form
   3.4  Bidiagonal QR step
   3.5  Back searching a bidiagonal matrix
   3.6  Hybrid QR-SVD algorithm
   4.1  Solution of the S/PD generalized eigenvalue problem
   4.2  The computation of C = R^-T A R^-1

Chapter 5. Krylov Sequence Methods
   1.1  The Arnoldi step
   1.2  The Lanczos recurrence
   1.3  Krylov decomposition to Arnoldi decomposition
   2.1  Filtering an Arnoldi decomposition
   2.2  The restarted Arnoldi cycle
   2.3  The Krylov-Schur cycle
   2.4  Deflating in a Krylov-Schur decomposition
   2.5  The rational Krylov transformation
   3.1  Lanczos procedure with orthogonalization
   3.2  Estimation of u^T_{k+1} u_j
   3.3  Periodic reorthogonalization I: Outer loop
   3.4  Periodic reorthogonalization II: Orthogonalization step

Chapter 6. Alternatives
   1.1  Subspace iteration
   2.1  The basic Jacobi-Davidson algorithm
   2.2  Extending a partial Schur decomposition
   2.3  Harmonic extension of a partial Schur decomposition
   2.4  Solution of projected equations
   2.5  Davidson's method

PREFACE

This book, Eigensystems, is the second volume in a projected five-volume series entitled Matrix Algorithms. The first volume treated basic decompositions. The three following this volume will treat iterative methods for linear systems, sparse direct methods, and special topics, including fast algorithms for structured matrices.

My intended audience is the nonspecialist whose needs cannot be satisfied by black boxes. It seems to me that these people will be chiefly interested in the methods themselves—how they are derived and how they can be adapted to particular problems. Consequently, the focus of the series is on algorithms, with such topics as rounding-error analysis and perturbation theory introduced impromptu as needed. My aim is to bring the reader to the point where he or she can go to the research literature to augment what is in the series.

The series is self-contained. The reader is assumed to have a knowledge of elementary analysis and linear algebra and a reasonable amount of programming experience—about what you would expect from a beginning graduate engineer or an advanced undergraduate in an honors program. Although strictly speaking the individual volumes are not textbooks, they are intended to teach, and my guiding principle has been that if something is worth explaining it is worth explaining fully. This has necessarily restricted the scope of the series, but I hope the selection of topics will give the reader a sound basis for further study.

The subject of this volume is computations involving the eigenvalues and eigenvectors of a matrix. The first chapter is devoted to an exposition of the underlying mathematical theory. The second chapter treats the QR algorithm, which in its various manifestations has become the universal workhorse in this field. The third chapter deals with symmetric matrices and the singular value decomposition. The second and third chapters also treat the generalized eigenvalue problem.

Up to this point the present volume shares the concern of the first volume with dense matrices and their decompositions. In the fourth chapter the focus shifts to large matrices for which the computation of a complete eigensystem is not possible. The general approach to such problems is to calculate approximations to subspaces corresponding to groups of eigenvalues—eigenspaces or invariant subspaces they are called. Accordingly, in the fourth chapter we will present the algebraic and analytic theory of eigenspaces. We then return to algorithms in the fifth chapter, which treats Krylov sequence methods—in particular the Arnoldi and the Lanczos methods. In the last chapter we consider alternative methods such as subspace iteration and the Jacobi-Davidson method.

Many methods treated in this volume are summarized by displays of pseudocode (see the list of algorithms following the table of contents). These summaries are for purposes of illustration and should not be regarded as finished implementations. In the first place, they often leave out error checks that would clutter the presentation. Moreover, it is difficult to verify the correctness of algorithms written in pseudocode. Wherever possible, I have checked the algorithms against MATLAB implementations. However, such a procedure is not proof against transcription errors. Be on guard!

A word on organization. The book is divided into numbered chapters, sections, and subsections, followed by unnumbered subsubsections. Numbering is by section, so that (3.5) refers to the fifth equation in section three of the current chapter. References to items outside the current chapter are made explicitly—e.g., Theorem 2.7, Chapter 1. References to Volume I of this series [265] are preceded by the Roman numeral I—e.g., Algorithm I:3.2.

With this second volume, I have had to come to grips with how the volumes of this work hang together. Initially, I conceived the series as being a continuous exposition of the art of matrix computations, with heavy cross referencing between the volumes. On reflection, I have come to feel that the reader will be better served by semi-independent volumes. However, it is impossible to redevelop the necessary theoretical and computational material in each volume. Consequently, I have been forced to assume that the reader is familiar, in a general way, with the material in previous volumes. However, I have not hesitated to reference the first volume—especially its algorithms—where I thought it appropriate. To make life easier, much of the necessary material is slipped unobtrusively into the present volume and more background can be found in the appendix.

While I was writing this volume, I posted it on the web and benefited greatly from the many comments and suggestions I received. I should like to thank K. M. Briggs, Austin Dubrulle, David Dureisseix, Mei Kobayashi, Daniel Kressner, Andrzej Mackiewicz, Nicola Mastronardi, Arnold Neumaier, John Pryce, Thomas Strohmer, Henk van der Vorst, and David Watkins for their comments and suggestions. Special thanks go to the editors of Templates for the Solution of Algebraic Eigenvalue Problems, who allowed me access to their excellent compendium while it was in preparation.

I am indebted to the Mathematical and Computational Sciences Division of NIST for the use of their research facilities. As usual, SIAM publications and its people, past and present, have aided the writing and production of this book. Vickie Kearn first encouraged me to start the project and has helped it along in too many ways to list. Kelly Thomas and Sam Young have seen the present volume through production. Special mention should be made of my long-time copy editor and email friend, Jean Anderson. As usual, she has turned an onerous job into a pleasant collaboration.

In 1997, as I was preparing to write this volume, Richard Lehoucq invited me to a workshop on large-scale eigenvalue problems held at Argonne National Laboratory. I was stunned by the progress made in the area since the late sixties and early seventies, when I had worked on the problem. The details of these new and improved algorithms are the subject of the second half of this volume. But any unified exposition makes it easy to forget that the foundations for these advances had been laid by people who were willing to work in an unexplored field, where confusion and uncertainty were the norm. Naming names carries the risk of omitting someone, but any list would surely include Vel Kahan, Shmuel Kaniel, Chris Paige, Beresford Parlett, and Yousef Saad. This book is dedicated to these pioneers and their coworkers.

G. W. Stewart
College Park, MD


1  EIGENSYSTEMS

In 1965 James Hardy Wilkinson published his classic work on numerical linear algebra, The Algebraic Eigenvalue Problem. His purpose was to survey the computational state of the art, but a significant part of the book was devoted to the mathematical foundations, some of which had been developed by Wilkinson himself. This happy mixture of the theoretical and computational has continued to characterize the area—computational advances go hand in hand with deeper mathematical understanding.

The purpose of this chapter is to provide the mathematical background for the first half of this volume. The first section consists of an exposition of the classic algebraic theory of eigensystems. The predominant theme of this section is the systematic reduction of matrices to simpler forms by similarity transformations. One of these forms—the Schur decomposition—is the goal of the QR algorithm, the workhorse of dense eigenvalue computations.

The second and third sections are devoted to the analysis of eigensystems. The second section concerns norms and their relation to the spectrum of a matrix. Here we will pay special attention to the asymptotic behavior of powers of matrices, which are important in the analysis of many matrix algorithms. The third section is devoted to perturbation theory—how eigensystems behave when the original matrix is perturbed.

The majority of eigenvalue problems involve real matrices, and their reality often results in computational savings. However, the theory of eigensystems is best treated in terms of complex matrices. Consequently, throughout this chapter:

   Unless otherwise stated, A is a complex matrix of order n.

1. THE ALGEBRA OF EIGENSYSTEMS

In this section we will develop the classical theory of eigenvalues, eigenvectors, and reduction by similarity transformations. This is a large subject, and we will only be able to survey the basics. We begin with basic definitions and their consequences. We then present Schur's reduction of a matrix to triangular form by unitary similarity transformations—a decomposition that serves as a springboard for computing most others.

Further reduction to block diagonal form is treated in §1.5. The reduction depends on our ability to solve a matrix equation called Sylvester's equation, which we investigate in some detail. Finally, in §1.6 we present the Jordan canonical form.

1.1. EIGENVALUES AND EIGENVECTORS

Definitions

This volume has two heroes: the eigenvalue and its companion the eigenvector. Since the two are inseparable, we will formalize their pairing in the following definition.

Definition 1.1. Let A be of order n. The pair (λ, x) is called an EIGENPAIR or a RIGHT EIGENPAIR of A if

   x ≠ 0  and  Ax = λx.                                            (1.1)

The scalar λ is called an EIGENVALUE of A and the vector x is called an EIGENVECTOR. The pair (λ, y) is called a LEFT EIGENPAIR if y is nonzero and y^H A = λ y^H. The multiset of eigenvalues of A is called the SPECTRUM of A and is denoted by Λ(A).

Here are some comments on this definition.

• Perhaps the best way to think of an eigenvector x of a matrix A is that it represents a direction which remains invariant under multiplication by A. The corresponding eigenvalue λ is then the representation of A in the subspace spanned by the eigenvector x. The advantage of having this representation is that multiplication by A is reduced to a scalar operation along the eigenvector. For example, it is easy to show that if Ax = λx then

   A^k x = λ^k x.

Thus the effect of a power of A along x can be determined by taking the corresponding power of λ.

• Eigenvalues can be repeated. For example, it is natural to regard the identity matrix of order n as having n eigenvalues, all equal to one (see Theorem 1.2). To keep track of multiplicities we define the spectrum of a matrix to be a multiset, i.e., a set in which a member may be repeated.

• If (λ, x) is an eigenpair and τ ≠ 0, then (λ, τx) is also an eigenpair. To remove this trivial nonuniqueness, one often specifies a normalization of x. For example, one might require that x^H x = 1 or that some component of x be equal to one. In any event, when the eigenvector corresponding to an eigenvalue is unique up to a scalar multiple, we will refer to the one we are currently interested in as the eigenvector.

• The unqualified terms "eigenpair" and "eigenvector" refer to right eigenpairs and right eigenvectors. Left eigenpairs and eigenvectors will always be qualified.
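These relations are easy to check numerically. The following minimal sketch uses NumPy (the matrix A in it is an arbitrary example, not one from the text) to verify a right eigenpair, the power relation A^k x = λ^k x, and a left eigenpair obtained from the eigenvectors of A^H.

```python
# Numerical check of Definition 1.1 and the power relation A^k x = lambda^k x.
# Illustrative sketch only; the matrix A is an arbitrary example.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# numpy returns eigenvalues and a matrix whose columns are (right) eigenvectors
lam, X = np.linalg.eig(A)
l0, x0 = lam[0], X[:, 0]

# Right eigenpair: A x = lambda x
print(np.allclose(A @ x0, l0 * x0))                                  # True

# Power relation: A^k x = lambda^k x
k = 5
print(np.allclose(np.linalg.matrix_power(A, k) @ x0, l0**k * x0))    # True

# Left eigenpair: y^H A = lambda y^H, from the eigenvectors of A^H
mu, Y = np.linalg.eig(A.conj().T)
y0 = Y[:, np.argmin(np.abs(mu - l0))]      # left eigenvector for the same eigenvalue
print(np.allclose(y0.conj() @ A, l0 * y0.conj()))                    # True
```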

then A has a null vector.Since any diagonal element of A must have a first appearance on the diagonal. A/ . x) is an eigenpair of A. and it is easily verified that is an eigenvector of A corresponding the the eigenvalue akk. and (A.. then ctkkl—AH is nonsingular. then \x Ax = 0. THE ALGEBRA OF EIGENSYSTEMS Existence of eigenpairs: Special cases 3 In a moment we will show that any matrix has an eigenpair. It then follows from the equation that akk£k = ^£k or A = akk. Conversely. a vector x / 0 for which Ax = 0. denotes the z'th unit vector (i. 1. If A is diagonal and e.e. Since x is nonzero. e{) is an eigenpair of A. say &. then Thus (a. it follows that the diagonal elements of A are eigenvalues of A. • Singular matrices. Conversely. If (A. x) is an eigenvector of A it must have a last nonzero component. If no diagonal element of AH is equal to akk. It follows that (0. before we formally establish the existence of eigenpairs. if XI . then it has a null vector x. However.. The characteristic equation and the existence of eigenpairs We now turn to the existence of eigenpairs. • Diagonal matrices.. it will be convenient to examine three important cases in which a matrix must by its nature have eigenpairs. x) is an eigenpair of A. i. Let A be a triangular matrix partitioned in the form where AH is of order k. or equivalently (\I-A)x = Q.A must be singular. x) is an eigenpair of A. • Triangular matrices.. If A is singular. Thus the eigenvalues of A are Vol II: Eigensystems .A is singular.Thus the eigenvalues of a triangular matrix are precisely its diagonal elements.e.SEC. if (A. the vector whose «th component is one and whose other components are zero).

the scalars for which λI − A is singular. Since a matrix is singular if and only if its determinant is zero, the eigenvalues of A are the scalars for which

   det(λI − A) = 0.                                                 (1.3)

The function p_A(λ) = det(λI − A) is a monic polynomial of degree n called the characteristic polynomial of A. We have thus shown that the eigenvalues of A are the roots of the characteristic equation (1.3). Since a polynomial of degree n has exactly n roots, counting algebraic multiplicities every matrix of order n has n eigenvalues. More precisely, from the fundamental theorem of algebra we have the following theorem.

Theorem 1.2. Let A be of order n, and let p_A(λ) = det(λI − A) be its characteristic polynomial. Then p_A(λ) can be factored uniquely in the form

   p_A(λ) = (λ − λ_1)^{m_1} (λ − λ_2)^{m_2} ··· (λ − λ_k)^{m_k},

where the numbers λ_i are distinct and

   m_1 + m_2 + ··· + m_k = n.

The eigenvalues of A are the scalars λ_i, which are said to have ALGEBRAIC MULTIPLICITY m_i.

This theorem and the accompanying definition have two consequences.

• It follows from the theorem that we do not need to distinguish between "right" and "left" eigenvalues. For if λI − A is singular, then, in addition to a right null vector, it has a left null vector y^H, i.e., y^H(λI − A) = 0, or equivalently y^H A = λ y^H. Thus y^H is a left eigenvector corresponding to λ, and every eigenvalue has both a left and a right eigenvector.

• It is important to keep track of the algebraic multiplicity of the eigenvalues. Thus we have already defined Λ(A) to be a multiset. We will also use the phrase "counting multiplicities" as a reminder that multiplicities count.

Matrices of order two

In many applications it is useful to have the characteristic polynomial of a 2×2 matrix

   A = [ a11  a12 ]
       [ a21  a22 ].

For this matrix

   p_A(λ) = λ^2 − (a11 + a22)λ + (a11 a22 − a12 a21).

An easily memorized equivalent is

   p_A(λ) = λ^2 − trace(A) λ + det(A).
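For a concrete check, the quadratic formula applied to λ^2 − trace(A)λ + det(A) gives the two eigenvalues directly. The sketch below is only an illustration (the function name eig2 and the example matrix are ours, and the naive formula is not guarded against cancellation); it compares the result with a library eigenvalue routine.

```python
# Eigenvalues of a 2x2 matrix from lambda^2 - trace(A)*lambda + det(A) = 0.
# Illustrative sketch; not a numerically careful implementation.
import numpy as np

def eig2(A):
    """Roots of lambda^2 - trace(A)*lambda + det(A) via the quadratic formula."""
    t = A[0, 0] + A[1, 1]                        # trace
    d = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]    # determinant
    disc = np.lib.scimath.sqrt(t * t - 4.0 * d)  # complex square root if needed
    return (t + disc) / 2.0, (t - disc) / 2.0

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(eig2(A))                  # (5.372..., -0.372...)
print(np.linalg.eigvals(A))     # same values, possibly in a different order
```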


then so is (A. then on taking conjugates we find that Ax = Xx. If (A. Theorem 13. Now the complex zeros of a real polynomial occur in conjugate pairs. This suggests the following definition. It is natural to conjecture that the algebraic and geometric multiplicities of an eigenvalue must be the same.2. that need not be true. The following theorem sets the stage for defining it. A is also real—a contradiction. The theorem shows that the set of eigenvectors corresponding to an eigenvalue along with the zero vector forms a subspace. y is nonzero. x). if A is real. the real and imaginary parts ofx are linearly independent 1. Unfortunately. Its proof is left as an exercise. . Since A and x are real and x ^ 0. It follows that the vector is real and nonzero. and such pairs have the same algebraic multiplicity. but it can be less. The dimension of that subspace is the maximal number of linearly independent eigenvectors associated with the eigenvalue. Definition 1. Hence on multiplying the relation Ax = \x by 1 — i r. The geometric multiplicity is never greater than the algebraic multiplicity (see the notes following Theorem 1. x) is a complex eigenpairofA. GEOMETRIC MULTIPLICITY AND DEFECTIVE MATRICES Geometric multiplicity The fact that we refer to the multiplicity of an eigenvalue as a root of the characteristic equation as its algebraic multiplicity suggests that there is another kind of multiplicity. We may summarize these results in the following theorem. Actually. To see this write x = y + i z and assume that. Theorem 1. Let X be an eigenvalue of A. we have effectively computed the conjugate eigenpair. we can say more. Then for y and z to be linearly dependent. Moreover. Thus. we get Ax = Ax. It follows that or is an eigenvector of A corresponding to A.5. Moreover. The GEOMETRIC MULTIPLICITY of an eigenvalue A is the dimension of the subspace spanned by its eigenvectors orequivalentlym\l(XI — A). as the following example shows. Specifically. EIGENSYSTEMS of a real matrix have a special structure that we can exploit to eliminate the use of complex arithmetic. the real and imaginary parts of the eigenvector x corresponding to a complex eigenvalue A are linearly independent. Then the set of all eigenvectors corresponding to A together with the zero vector is the null space of XI — A. where A is complex. For if Ax — Xx.12).4. if we have computed a complex eigenpair. It follows that the complex eigenvalues of a real matrix occur in conjugate pairs. say. we must have z = ry for some real scalar r.6 CHAPTER 1. then its characteristic polynomial is real. The complex eigenvalues of a real matrix A occur in conjugate pairs of equal algebraic multiplicity.

n). (We also say that the eigenpairs (A. The matrix differs from J2(0) by a perturbation ofe in its (2. for example. Example 1.. from which it follows that 0 is an eigenvalue of algebraic multiplicity two.) Then any vector y can be expressed in the form An expansion like this is particularly useful in numerical computations.. it follows that Thus we can replace the process of computing Aky by multiplication of y by A with the less expensive process of taking the powers of the eigenvalues and summing their products with their eigenvectors. the perturbed eigenvalues are ±10~8—eight orders of magnitude greater than the perturbation in the matrix.6 shows that not all matrices have complete systems of eigenvectors. Defectiveness introduces two practical difficulties. x z ) form a complete system.-. l)-element. Specifically. The second difficulty with defective matrices is that defective eigenvalues are very sensitive to perturbations in the matrix elements. we say that an eigenvalue whose algebraic multiplicity is less than its geometric multiplicity is defective. a defective matrix has too few eigenvectors. Hence the geometric multiplicity of the eigenvalue 0 is one. as the following example shows. For example. It is also an example of a defective matrix. If. since AkXi — X^Xi (i — 1. To see why this is a problem. Let 7 The characteristic equation of J2(0) is A2 = 0.6. 1. But its eigenvalues are ±V/e..7. . let us agree to say that the matrix A has a complete system of eigenvectors if it has n linearly independent eigenvectors Xi corresponding to eigenvalues Az-.SEC. On the other hand. THE ALGEBRA OF EIGENSYSTEMS Example 1. A matrix with one or more defective eigenvalues is also said to be defective. This represents a dramatic perturbation of the eigenvalue. VolII: Eigensystems . First. Unfortunately. the null space of J2(0) is spanned by ei. Defectiveness The matrix J2(0) is an example of a Jordan block. 6 — 10~16. Example 1.
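The sensitivity described in Example 1.7 is easy to reproduce numerically. In the sketch below (NumPy, with an arbitrary choice of ε), perturbing the (2,1) element of J_2(0) by ε = 10^-16 moves the double eigenvalue 0 to approximately ±10^-8, eight orders of magnitude larger than the perturbation.

```python
# Sensitivity of a defective eigenvalue: perturbing the (2,1) element of the
# 2x2 Jordan block J_2(0) by eps moves the double eigenvalue 0 to +-sqrt(eps).
# Illustrative sketch only.
import numpy as np

eps = 1e-16
J = np.array([[0.0, 1.0],
              [0.0, 0.0]])
Jp = J.copy()
Jp[1, 0] = eps                        # perturbation of size eps

print(np.linalg.eigvals(J))           # [0. 0.]
print(np.linalg.eigvals(Jp))          # approximately [ 1e-8, -1e-8 ]
```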

For the eigenvalue problem. but those eigenvectors will be nearly linearly dependent. In fact.-.can be combined in the matrix equation .6) problematic.9. Xi) (i — 1. In other cases we must come to grips with it. . in transforming the least squares problem of minimizing 11 y—X b \ \ 2 one uses orthogonal matrices because they preserve the 2-norm. Thus defectiveness is a phenomenon we will have to live with. The class of transformations and the final form will depend on the problem. are just as sensitive as the original multiple eigenvalues. Definition 1. Example 1.8 CHAPTER 1. Let A be of order n and let U be nonsingular. In some cases. by using a Schur decomposition (Theorem 1. the transformations of choice are called similarity transformations. where defectiveness is only incidental to the problem.n). We also say that B is obtained from A by a SIMILARITY TRANSFORMATION. Let the matrix A have a complete system ofeigenpairs (A. To distinguish this latter class of eigenvalues.. a matrix that is near a defective matrix is not much better than a defective matrix. The following is an important example of a similarity transformation. We will also say the the corresponding eigenvector and eigenpair are simple. which makes the computation of expansions like (1. Let The the individual relations Ax{ — A t # t . SIMILARITY TRANSFORMATIONS A key strategy in matrix computations is to transform the matrix in question to a form that makes the problem at hand easy to solve.. For example. It may have a complete system of eigenvectors.3. 1. we can work around it—for example. it is sometimes argued that defective matrices do not exist in practice. EIGENSYSTEMS Because rounding a defective matrix will generally cause its eigenvalues to become simple.12).8. Simple eigenvalues We have just seen that multiple eigenvalues may be accompanied by theoretical and practical difficulties that do not affect their nonmultiple brethren. Moreover. we will say that an eigenvalue is simple if it has algebraic multiplicity one. However. the perturbed eigenvalues.. Then the matrices A and B = U~1AU are said to be SIMILAR. one of the most successful algorithms for solving least squares problems is based on the orthogonal reduction of X to triangular form. though distinct. We could wish our tools were sharper.

e. x) is an eigenpair of A. If (A. is the sum of the eigenvalues of A counting multiplicities and hence is unchanged by similarity transformations. Let A be of order n. whose columns are eigenvectors of A. i. Theorem 1. U lx) is an eigenpair of B. which is the product of its eigenvalues. the matrix X is nonsingular. VolII: Eigensystems . If (A. U~lx) is an eigenpair of B. similarity transformations preserve functions of the eigenvalues. Then the eigenvalues of A and B are the same and have the same multiplicities.SEC. then so that (A. 1. A nice feature of similarity transformations is that they affect the eigensystem of a matrix in a systematic way. Hence we may write Thus we have shown that a matrix with a complete system ofeigenpairs can be reduced to diagonal form by a similarity transformation. The following theorem shows that the trace is also invariant. • In addition to preserving the eigenvalues of a matrix (and transforming eigenvectors in a predictable manner). Let A be of order n and let B . we see that if A can be diagonalized by a similarity transformation X~l AX.. Theorem 1. Conversely. Thus A and B have the same characteristic polynomial and hence the same eigenval ues with the same multiplicities. then (A. is clearly unchanged by similarity transformations.10.U~1AU be similar to A. or) is an eigenpair of A. The trace of A. which form a complete system. then the columns ofX are eigenvectors of A. as the following theorem shows. The determinant of a matrix. by reversing the above argument. THE ALGEBRA OF EIGENSYSTEMS 9 Because the eigenvectors Xi are linearly independent.11.
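A quick numerical illustration of these facts, assuming nothing beyond NumPy (the random matrices are arbitrary examples): a similarity transformation leaves the spectrum unchanged, and the eigenvector matrix of a matrix with distinct eigenvalues diagonalizes it.

```python
# Similarity transformations preserve eigenvalues, and a matrix with a complete
# system of eigenvectors is diagonalized by its eigenvector matrix. Sketch only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# An arbitrary similarity transformation B = U^{-1} A U
U = rng.standard_normal((4, 4))
B = np.linalg.solve(U, A @ U)
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))   # same spectrum

# Diagonalization: if X holds the eigenvectors, then X^{-1} A X = diag(lambda_i)
lam, X = np.linalg.eig(A)
D = np.linalg.solve(X, A @ X)
print(np.allclose(D, np.diag(lam)))
```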

Chapter 2. If it changes significantly. 1. Unitary transformations play a leading role in matrix computations. Thus we can throw an error made in a computed transformation back on the original matrix without magnification. and even when one can.10 CHAPTER 1. .26). so that they do not magnify errors. We have seen (Example 1. if we set a small element of B to zero—say when an iteration is deemed to have converged—it is equivalent to making a perturbation of the same size in A.9) that a matrix with a complete system of eigenpairs can be reduced to diagonal form by a similarity transformation. we find that the coefficient of A n 1 is the negative of the sum of the diagonal elements of A. They are trivial to invert.4. For example. see (2. The following theorem shows that this hope is realized.7)]. . More important. not all matrices can be so reduced. Schur form In asking how far one can reduce a matrix by a class of transformations. A unitary matrix has n2 elements. there is a bug in the code. we can write Moreover. They have spectral norm one. it is useful to ask how many degrees of freedom the transforming matrices possesses. . its matrix of eigenvectors may be nearly singular. where U is unitary and we perturb B by an error to get B = B -f F. • This result is invaluable in debugging matrix algorithms that proceed by similarity transformations. if. \\E\\2 = \\F\\2 [see (2. A n . For an example. . But these elements are constrained by n(n+l)/2 conditions: n(n-l)/2 orthogonality conditions and n normality conditions. From the expression on the left.1. . we can hope to reduce a matrix A to triangular form by a unitary similarity. Since a triangular matrix has n(n-l)/2 zero elements. we will now investigate the existence of intermediate forms that stop before full diagonality. Just print out the trace after each transformation. say. SCHUR DECOMPOSITIONS Unitary similarities This subsection is devoted to the reduction of matrices by unitary and orthogonal similarities. Consequently. The characteristic polynomial of A can be written in two ways: By expanding the determinant on the right. Unfortunately. Let the eigenvalues of A be A i . This leaves n(nl)/2 degrees of freedom. so that a unitary transformation can easily be undone. we find that the same coefficient is the negative of the sum of the eigenvalues of A. EIGENSYSTEMS Proof.
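The trace check is trivial to wire into code. The sketch below is only schematic: the random matrices W stand in for whatever sequence of similarity transformations a real algorithm would generate, and in debugging one would print the same quantity after each step of the actual algorithm.

```python
# The trace is invariant under similarity transformations, which gives a cheap
# sanity check for algorithms that proceed by similarities. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
trace0 = np.trace(A)

B = A.copy()
for step in range(5):
    W = rng.standard_normal((6, 6))     # stand-in for one step of a real algorithm
    B = np.linalg.solve(W, B @ W)       # B <- W^{-1} B W
    # If the trace drifts by much more than rounding error, the step has a bug.
    print(step, abs(np.trace(B) - trace0))
```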

we may assume that there is a unitary matrix V such that V^MV — T^ where T^.5). Hence Now by (1. • Here are some comments on this theorem. . . By appropriate choice ofU. is an upper triangular matrix with the eigenvalues A 2 .. 1. In addition multiple eigenvalues can cause the decomposition not to be unique—e. may be made to appear in any order. Let x\ be an eigenvector corresponding to AI and assume without loss of generality that Z H X = 1.. Then there is a unitary matrix U such that where T is upper triangular. By an obvious induction. Proof. Let the matrix X — (x\ X^) be unitary. .12 (Schur). .SEC. If we set then it is easily verified that which has the required form. A n . the eigenvalues of A. It is not unique because it depends on the order of the eigenvalues in T. which are the diagonal elements of T. . Let AI . • The decomposition A = UTU11 is called a Schur decomposition of A. .g. the eigenvalues of M are A 2 . . there may be more Vol II: Eigensystems . THE ALGEBRA OF EIGENSYSTEMS 11 Theorem 1. Because X is unitary we h a v e x ^ x i — 1 a n d x ^ X i — 0.. . Let A be of order n. An appearing on its diagonal in the prescribed order. . Xn be an ordering of the eigenvalues of A.. Then the last equality following from the relation Ax\ — \\x\.
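In practice a Schur decomposition is obtained from a library routine rather than from the constructive proof. The following sketch uses SciPy's schur function (an assumption of this illustration; the underlying LAPACK routine is based on the QR algorithm treated in Chapter 2) to verify the properties stated in the theorem on a random complex matrix.

```python
# Computing a Schur decomposition A = U T U^H with SciPy. The eigenvalues of A
# appear on the diagonal of the upper triangular factor T. Illustrative sketch.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

T, U = schur(A, output='complex')       # T upper triangular, U unitary
print(np.allclose(U @ T @ U.conj().T, A))                     # A = U T U^H
print(np.allclose(U.conj().T @ U, np.eye(5)))                 # U is unitary
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))     # diag(T) = spectrum
```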

if one has an eigenpair at hand.12. Unique or not. In other words. more complicated forms in theoretical and computational applications. as can easily be verified. For example. • Given the eigenpair (Ai. to be treated later. so that where A is the diagonal matrix of eigenvalues. the Jordan canonical form. so that all the others are distinct from A. A nontrivial example is the matrix </2(0) of Example 1. xi). • Schur decompositions can often replace other. The zero matrix is a trivial example of a nilpotent matrix. Let X be the matrix of eigenvectors of A. EIGENSYSTEMS than one choice of the eigenvector x\ in the proof of Theorem 1.12 is not constructive. the Schur vectors bear an interesting relation to the eigenvectors. since it requires a ready supply of eigenpairs. it provides a constructive method for reducing the eigenvalue problem to a smaller one in which the eigenpair has been removed [see (1. This process is known as deflation and is an important technique in eigencalculations. the Schur vectors are the columns of the Q-factor of the matrix of eigenvectors arranged in the prescribed order. Nilpotent matrices We now turn to applications of the Schur decomposition. whose square is zero. the columns of U are called Schur vectors. beginning with an analysis of nilpotent matrices. It follows that the eigenvectors corresponding to A are confined to a subspace of dimension k and cannot span a subspace of dimension greater than k. Then it is easy to show that the last n—k components of any eigenvector corresponding to A must be zero [see (1. To prove the same fact via the Schur decomposition. shows that the geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity. A matrix A is nilpotent if Ak = 0 for some fc > 1. • The proof of Theorem 1. then the diagonal and first k-l superdiagonals of Tk are zero. Let the first k diagonals of T = UHAU be A. the unitary matrix X = (xj ^2) in the proof can be constructed as a Householder transformation [see (2. • When A has a complete system of eigenvectors. Let be the QR decomposition of X. . Chapter 2]. The following theorem generalizes the above observations. Hence UHAU is a Schur decomposition of A.19). let A have algebraic multiplicity k.8)]. Then (1. Hence Tn = 0.9) is equivalent to The right-hand side of this relation is upper triangular with the eigenvalues of A on its diagonal.12 CHAPTER 1.2)].7. However. if T is an upper triangular matrix of order n with zeros on its diagonal. In general.

THE ALGEBRA OF EIGENSYSTEMS 13 Theorem 1. In this case. Conversely.-. Thus T is both upper and lower triangular and hence is diagonal. the Az are real. Since the diagonal elements of a Hermitian matrix are real. Proof. its eigenvalues are real. It follows that Hermitian matrices The elementary facts about Hermitian matrices can be obtained almost effortlessly from the Schur form.denotes the ith column of U then (A. then so that T itself is Hermitian. Let Ak = 0 and let (A. To summarize: If A is Hermitian. We call the decomposition the SPECTRAL DECOMPOSITION OF A Vol II: Eigensystems . 1. if A is Hermitian and T = U^AU is a Schur form of A.13.SEC. Moreover. A matrix A of order n is nilpotentifand only if its eigenvalues are all zero. say.1). and it follows that A = 0. Specifically. ttt-) is an eigenpair of A. and it has a complete orthonormal system of eigenvectors. x) be an eigenpair of A. Then from (1. Then any Schur decomposition of A has the form where T has zeros on its diagonal. from the relation we see that if ttt. suppose that the eigenvalues of A are all zero.

which uses the Frobenius norm || • ||p to be introduced in §2. (1 -11) We add for completeness that the eigenvalues of a unitary matrix have absolute value one.l)-blocks of each side of the relation A H A = AAH we find that Since for any matrix B it follows that If A is a normal matrix and U is unitary. Let the normal matrix A be partitioned in the form Then Proof. then UHAU is also normal. By computing the (l. skew Hermitian matrices (matrices for which AH = —A). We claim that T is also diagonal. and unitary matrices.15.10) is that the triangular factor in any Schur decomposition of a Hermitian matrix is diagonal. Theorem 1. . the matrix T is normal. It follows that: A normal matrix has a complete orthonormal system ofeigen vectors. Definition 1. The eigenvalues of a skew Hermitian matrix are zero or purely imaginary.14.14 Normal matrices CHAPTER 1. The three naturally occurring examples of normal matrices are Hermitian matrices. partition By Theorem 1. EIGENSYSTEMS The key observation in establishing (1. so that T22 is normal. This property is true of a more general class of matrices. It then follows from the relation TET = TTH that ^22^22 = ^22^22.1. if T is the triangular factor in a Schur decomposition of A. t12 = 0. A key fact about normal matrices is contained in the following theorem. An obvious induction now establishes the claim. A matrix A is NORMAL if A H A = AA H . To see this.15. Consequently.
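A short numerical confirmation of the Hermitian case and of the role of normality (sketch only; the test matrices are random examples): eigh returns real eigenvalues and orthonormal eigenvectors, giving the spectral decomposition, and a unitary matrix, though not Hermitian, satisfies A^H A = A A^H.

```python
# A Hermitian matrix has real eigenvalues and a complete orthonormal system of
# eigenvectors: A = U diag(lambda) U^H. A unitary matrix is normal but not
# Hermitian in general. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2                 # a Hermitian test matrix

lam, U = np.linalg.eigh(A)               # eigh is specialized to Hermitian matrices
print(np.all(np.isreal(lam)))                            # real eigenvalues
print(np.allclose(U.conj().T @ U, np.eye(4)))            # orthonormal eigenvectors
print(np.allclose(U @ np.diag(lam) @ U.conj().T, A))     # spectral decomposition

# A unitary matrix Q satisfies Q^H Q = Q Q^H, i.e., it is normal.
Q, _ = np.linalg.qr(B)
print(np.allclose(Q.conj().T @ Q, Q @ Q.conj().T))
```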

We have seen (Example 1. Such a matrix could. then the diagonalizing transformation provides a complete set of eigenvectors. which has no complete system of eigenvectors.9) that if we can diagonalize A. of course. a defective matrix. Vol II: Eigensystems . preserve the block upper triangularity of A.SEC. we must first investigate conditions under which the equation has a solution. Thus to proceed beyond Schur form we must look to coarser decompositions. Thus we are led to a transformation of the form whose inverse is Applying this transformation to A we get Thus to achieve block diagonality we must have This equation is linear in the elements of Q. However. which can be accomplished by making the transformation itself block upper triangular. But only the first Schur vector—the first column of the orthogonal factor—is an eigenvector. If we can solve it. Suppose that we wish to reduce A to block diagonal form by a similarity transformation—preferably a simple one. cannot be diagonalized. be obtained by partitioning a Schur decomposition. BLOCK DIAGONALIZATION AND SYLVESTER'S EQUATION 15 A Schur decomposition of a matrix A represents a solution of the eigenvalue problem in the sense that the eigenvalues of the matrix in question can be obtained by inspecting the diagonals of the triangular factor. But we make no such assumption. We can make the transformation even simpler by requiring its diagonal blocks to be identity matrices. THE ALGEBRA OF EIGENSYSTEMS 1. We must. 1. for example. Unfortunately. we can block diagonalize A. such as a block diagonal form.5. From block triangular to block diagonal matrices Let A have the block triangular form where L and M are square. in which case L and M would be upper triangular.

our problem reduces to that ot determining when S is nonsingular. To prove the converse. The first step is to transform this system into a more convenient form. i. Let X = xyH. Here we will be concerned with sufficient conditions for the existence of a solution. we have the following theorem. Consider the sy stem SX — C. y] be a left eigenpair of B.e. if and only if Proof. Specifically. Let T = VHBV be a Schur decomposition of B. To prove the necessity of the condition. and S is singular. if A and B have a common eigenvalue— then X is a "null vector" of S. Hence.16. and it occurs in many important applications. Theorem 1. x) be a right eigenpair of A and let (//. It turns out that the nonsingularity depends on the spectra of A and B. If we introduce the Sylvester operator S: defined by then we may write Sylvester's equation in the form 1 he operator ^ is linear in X . say of orders n and m.16 Sylvester's equation CHAPTER 1. The Sylvester operator S is nonsingular if and only if the spectra of A and B are disjoint. Then Hence if we can choose A = // — that is. let (A. Then Sylvester's equation can be written in the form If we set and we may write the transformed equation in the form .. EIGENSYSTEMS Let us consider the general problem of solving an equation of the form where A and B are square. we use the fact that an operator is nonsingular if every linear system in the operator has a solution. This equation is called a Sylvester equation.

yk-\. then (1. 1.. Thus A — t\\I is nonsingular. In general.SEC. Vol II: Eigensystems ..From the fcth column of (1.XB = 0 so that the Sylvester operator X >-> (A — <?I)X — XB is singular. The eigenvalues of S are the differences of the eigenvalues of A and B. It follows from Theorem 1. Conversely. X) is an eigenpair of S. is well defined.13) we get The right-hand side of this equation is well defined and the matrix on the left is nonsineular. From the second column of the system (1. Hence vi. We have found a solution Y of the equation Hence is a solution of We may restate this theorem in a way that will prove useful in the sequel. (A — aI)X .16 that this can be true if and only if there is a A € A( A) and \i € A(£) such that A . as argued above. .a = //. if A € A( A) and \L € A(B). THE ALGEBRA OF EIGENSYSTEMS Let us partition this system in the form 17 The first column of this system has the form Now tn is an eigenvalue of B and hence is not an eigenvalue of A. so that (<r. Let SX = crX. Equivalently. A . or equivalently a = X — /z. Moreover. suppose that we have found 3/1.12) shows that (A — /*) € A(S).17. and the first column of Y is well defined as the solution of the equation (A-tuI)yi = di.till is nonsingular.. Thus we have the following corollary.13) we get the equation Because y\ is well defined the right-hand side of this system is well defined. Corollary 1. Hence 3/2 is well defined.
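One way to see the corollary concretely is through the Kronecker-product matrix of the Sylvester operator. The identity vec(AX − XB) = (I ⊗ A − B^T ⊗ I) vec(X), for the column-stacking vec, is standard linear algebra but is not derived in this section; the sketch below is only a NumPy illustration of the statement that the eigenvalues of S are the differences of the eigenvalues of A and B.

```python
# Corollary 1.17: the eigenvalues of S: X -> AX - XB are lambda_i(A) - mu_j(B).
# Checked through the Kronecker form M = kron(I, A) - kron(B^T, I). Sketch only.
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

M = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

# M vec(X) = vec(AX - XB) for the column-stacking vec
X = rng.standard_normal((n, m))
lhs = M @ X.flatten(order='F')
rhs = (A @ X - X @ B).flatten(order='F')
print(np.allclose(lhs, rhs))                               # True

# The eigenvalues of M are all differences lambda_i - mu_j
diffs = np.array([la - mu for la in np.linalg.eigvals(A)
                          for mu in np.linalg.eigvals(B)])
print(np.allclose(np.sort_complex(np.linalg.eigvals(M)),
                  np.sort_complex(diffs)))                 # True
```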

1. which can be solved in 0(n 2 ) operations. As it stands. by a further transformation of the problem we can reduce the count for solving the systems to O(mn2). the procedure of transforming the Sylvester equation into the form AY . as in the proof of Theorem 1. EIGENSYSTEMS Given matrices A £ C n x n and B € C m x m .1: Solution of Sylvester's Equation o An algorithm Theorem 1. Then we can write Sylvester's equation in the form or equivalently where Y = UHXV and D = UHCV. 3. It is summarized in Algorithm 1. let S = UHAU be a Schur decomposition of A. These observations form the basis for one of the best algorithms for solving dense Sylvester equations. 2.r^/).16 and its corollary are sufficient for our immediate purposes. 6. . For a general matrix A these systems would require O(mn3) operations to solve. 1. 4. However. If we now solve columnwise for y. it involves solving m systems whose matrices are (A . this algorithm solves Sylvester's equation AX . this operation count is prohibitive. 7. If we have a method for computing Schur decompositions. Compute a Schur decomposition A — USU^ Compute a Schur decomposition B = VTVH D = UHCV for k = 1 to ro Solve the triangular system (S-tkkI)yk k = dk + X^i" tikl/i end for k * = UYVH Algorithm 1. Unless m is quite small. Specifically.YT = D and solving columnwise for Y represents an algorithm for solving Sylvester's equation.XB = C. But it is instructive to reflect on the the second part of the proof. In the code below all partitionings are by columns.18 CHAPTER 1. the systems have upper triangular matrices of the form S — tkkl.16. 5. More details are given in the notes and references.

THE ALGEBRA OF EIGENSYSTEMS Block diagonalization redux Let us now return to our original goal of block diagonalizing the matrix 19 We have seen that a sufficient condition for a diagonalizing similarity to exist is that the equation LQ . .18 to M to cut out a block L^ corresponding to £2. for example.And so on. and by Theorem 1. that A( A) can be partitioned into the union C\ U.18 we can reduce T to the form In the same way we may apply Theorem 1. Suppose.QM = —H have a solution. £*:• Partition where A(£i) = LI . Let and suppose that Then there is a unique matrix Q satisfying such that The importance of this theorem is that it can be applied recursively. Theorem 1.SEC.18. Moreover.. Vol II: Eigensystems .. 1. Then the eigenvalues of LI are disjoint from those of M. .U£fc of disjoint multisets..12) we can find a unitary matrix X such that T = XHAX is upper triangular with the eigenvalues of A appearing in the order of their appearance in the multisets C\. Hence we have the following theorem. a sufficient condition for this Sylvester equation to have a solution is that L and M have disjoint spectra.. to give the following theorem. By the Schur decomposition theorem (Theorem 1.

Let A(A) be the union C\ U . Corollary 1. Finally. if Si is nonsingular.19 and its corollaries enjoy a certain uniqueness. it allows us to assume.. where Li is of order raz and has only the eigenvalue A. that Yi is orthonormal. . see p. The decompositions in Theorem 1.. Let the matrix A of order n have distinct eigenvalues AI. It is natural to ask how much further we can go by relaxing the requirement that the transformation be unitary. U Ck of disjoint multisets.. A is diagonalizable and has a complete system of eigenvectors. . . which is the topic of this subsection. then the blocks Li in this corollary are scalars. correspond to the distinct eigenvalues of A. TJ 1. The answer is the classical Jordan canonical form. We begin with a definition. Then there is a nonsingularmatrix X such that where If the sets £. and Li in Theorem 1. For example..20 CHAPTER 1. and S^lL{Si. Then the spaces spanned by the Xi and Yi are unique (this fact is nontrivial to prove...6. This lack of uniqueness can be useful. . say. Hence: If A has distinct eigenvalues. 246).20. Then there is a nonsingular matrix X such that X-lAX = di&g(Ll.19. \k with multiplicities mi. EIGENSYSTEMS Theorem 1. JORDAN FORM The reports of my death are greatly exaggerated. we obtain the following corollary. Mark Twain We have seen that any matrix can be reduced to triangular form by a unitary similarity transformation...19 by XiSi. YiS~^. However..Lk). Let X — (Xi • • • Xk) be partitioned conformally with the pardon of A( A) and let X~l = (Yi .• Yk) also be partitioned conformally. ra^.-. we may replace X^ Yi. . . if all the eigenvalues of A are distinct.

The hard part is reducing the individual blocks to the J. .14) where (1.SEC. Let AeCnXn have k distinct eigenvalues AI.22. A JORDAN BLOCK Jm(X) isa matrix of order m of the form 21 From an algebraic point of view a Jordan block is as defective as a matrix can be.. The eigenvalue A of the matrix J m (A) has algebraic multiplicity m.15) Here are some comments about this theorem. Any reduction of A into Jordan blocks will result in the same number of blocks of the same order having the same eigenvalues...-j along with their associated eigenvalues are characteristic of the matrix A. The spaces spanned by the Xi are unique. . . we still speak of the Jordan canonical form of a matrix. so A has geometric multiplicity one. Principal vectors may be chosen in different ways. The multiplicities rat and the numbers m.. However.Then there are unique integers m. l. .j (i = 1. But rankfAJ — J m (A)] is m-1... . Vol II: Eigensystems . . The Jordan canonical form—a decomposition of a matrix into Jordan blocks—is described in the following theorem. &.21. Theorem 1.. the columns of the Xi are not. each block corresponding to a distinct eigenvalue of A. • The Jordan canonical form has a limited uniqueness. THE ALGEBRA OF EIGENSYSTEMS Definition 1. For more see the notes and references./.15). • The proof of the theorem is quite involved. j = 1. It begins with a reduction to block diagonal form. mk.14) and (1. . A^ of algebraic multiplicities m i .. But in spite of this nonuniqueness.-) satisfying and a nonsingular matrix X such that (1. in (1. .

17) is called a sequence of principal vectors of A. the action of A on the matrix Xi is entirely specified by the matrix Jt. Nonetheless the simple relations (1. .17) make calculations involving principal vectors quite easy. If we now partition X{ in the form in conformity with (1. The Jordan canonical form is now less frequently used than previously. Note that the first and only the first principal vector associated with a Jordan block is an eigenvector.e.. to the Schur decomposition. Perturbations in the matrix send the eigenvalues flying. The spaces spanned by the columns of the Xi are examples of eigenspaces (to be treated in Chapter 4). Against this must be set the fact that the form is virtually uncomputable. First partition in conformity with (1. in principle.15). so that. and reports of its death are greatly exaggerated. It then follows from the relation AX = XJ that i. EIGENSYSTEMS • We can partition X at several levels corresponding to levels in the decomposition. Finallv. as a mathematical probe the Jordan canonical form is still useful. if we write then the vectors satisfy A set of vectors satisfying (1. If we know the nature of the defectiveness in the original matrix. The simplicity of its blocks make it a keen tool for conjecturing new theorems and proving them. But otherwise attempts to compute the Jordan canonical form of a matrix have not been very successful.22 CHAPTER l.7). Under the circumstances there is an increasing tendency to turn whenever possible to friendlier decompositions—for example. then it may be possible to recover the original form. it can be shown that Thus the actions of A on the matrices Xij are specified by the individual Jordan blocks in the decomposition. However.14). the matrix becomes diagonalizable (Example 1.

insightful treatment with a stress on geometry. However. They contain a wealth of useful information.SEC. and van der Vorst [10]. 33]. It consists of sketches of all the important algorithms for large eigenproblems with brief treatments of the theoretical and computational background and an invaluable bibliography. There are too many of these books to survey here. including eigenvalues. 1762] to solve systems of differential equations with constant coefficients.) The nontrivial solutions of the equations Ax = Xx were introduced by Lagrange [155. "characteristic" has an inconveniently large number of syllables and survives only in the terms "characteristic Vol II: Eigensystems . History and Nomenclature A nice treatment of the history of determinants and matrices. however. He makes the interesting point that many of the results we think of as belonging to matrix theory were derived as theorems on determinants and bilinear and quadratic forms. Datta's Numerical Linear Algebra and Applications [52] is an excellent text for the classroom. Trefethen and Bau's Numerical Linear Algebra [278] is a high-level. 1904] to denote for integral equations the reciprocal of our matrix eigenvalue. as is Watkins's Fundamentals of Matrix Computations [288]. see [213. Eigenvalues have been called many things in their day. (Matrices were introduced by Cayley in 1858 [32]. The classics of the field are Householder's Theory of Matrices in Numerical Analysis [121] and Wilkinson's Algebraic Eigenvalue Problem [300]. At some point Hilbert's Eigenwerte inverted themselves and became attached to matrices.7. Ch. More or less systematic treatments of eigensystems are found in books on matrix computations. which was introduced by Hilbert [116. THE ALGEBRA OF EIGENSYSTEMS 1. p. Golub and Van Loan's Matrix Computations [95] is the currently definitive survey of the field.119]. Demmel. Ruhe. 1. Chatelin's Eigenvalues of Matrices [39] and Saad's Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms [233] are advanced treatises that cover both theory and computation. It is a perfect complement to this volume and should be in the hands of anyone seriously interested in large eigenproblems. The nomenclature for integral equations remains ambiguous to this day. Dongarra. Our term "eigenvalue" derives from the German Eigenwert. for example. should be made of Bellman's classic Introduction to Matrix Analysis [18] and Horn and Johnson's two books on introductory and advanced matrix analysis [118. is given by Kline [147. NOTES AND REFERENCES General references 23 The algebra and analysis of eigensystems is treated in varying degrees by most books on linear and applied linear algebra." not for the polynomial equation but for the system Ax — Xx. In 1829 Cauchy [31] took up the problem and introduced the term "characteristic equation. The term "characteristic value" is a reasonable translation of Eigenwert. but they only gained wide acceptance in the twentieth century. 96]. Special mention should be made of Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide edited by Bai. Special mention.

Ch. The day when purists and pedants could legitimately object to "eigenvalue" as a hybrid of German and English has long since passed. The term "eigenpair" used to denote an eigenvalue and eigenvector is a recent innovation.24 CHAPTER 1. and it can often replace more sophisticated decompositions—the Jordan canonical form. 15] devotes a chapter to Sylvester's equation. and an extensive bibliography. Schur form The proof of the existence of Schur decompositions is essentially the one given by Schur in 1909 [234] — an early example of the use of partitioned matrices. The customary representation is vec(X) in which the columns of X are stacked atop one another in their natural order. The linearity of the Sylvester operator means that it has a matrix representation once a convention is chosen for representing X as a vector." "eigenspace." (For symmetric matrices the characteristic equation and its equivalents are also called the secular equation owing to its connection with the secular perturbations in the orbits of planets.16) and then stated that it easily generalizes. Golub. historical comments. for example—in matrix algorithms. 1884]. For A and B of order two he showed that the equation has a nontrivial solution if and only if A and B have a common eigenvalue (essentially Theorem 1. which contains error and perturbation analyses. Higham [114. J.XB = 0 [273. The earliest use I can find is in Parlett's Symmetric Eigenvalue Problem [206]. Sylvester's equation Sylvester's equation is so called because J. . Nash. Thus we can use "eigensystem. Chapter 2). The decomposition can be computed stably by the QR algorithm (see §§2-3. The German eigen has become a thoroughly naturalized English prefix meaning having to do with eigenvalues and eigenvectors. In this representation the Sylvester equation becomes where ® is the Kronecker product defined by Algorithm 1.) Other terms are "latent value" and "proper value" from the French valeur propre.1 is a stripped down version of one due to Bartels and Stewart [14]. and Van Loan [90] have proposed a Hessenberg-Schur variant that saves some operations. Sylvester considered the homogeneous case AX . The difference is that a real Schur form is used to keep the arithmetic real." or "eigenexpansion" without fear of being misunderstood. EIGENSYSTEMS equation" and "characteristic polynomial.

In the next subsection we study the relation of norms to the magnitude of the largest eigenvalue of a matrix—the spectral radius of the matrix. this sensible-sounding procedure is difficult to implement. See [96. 57. as the following definition shows. But once the existence of eigenpairs is established. MATRIX AND VECTOR NORMS Definition Analysis happens when a metric meets an algebraic structure. Block diagonalization would seem to be an attractive way out of this problem. All the above comments about computing block diagonal forms of a matrix apply to the Jordan form—and then some.1. NORMS. find bases for their eigenspaces. Thus to say that A has linear elementary divisors is to say that A is nondefective or that A is diagonalizable. NORMS. and reduce the matrix to block diagonal form. 2. Leaving aside the possibility that the smallest cluster may consist of all the eigenvalues.SEC.1. one frequently encounters matrices with very ill-conditioned systems of eigenvectors. one can afford to lavish computation on their subsequent analysis. which measure the size of matrices and vectors. The characteristic polynomials of the Jordan blocks of a matrix A are called the elementary divisors of A.60. SPECTRAL RADII. the problem is to find the clusters. For more see [17. For more on norms see §1:4.61. Unfortunately. Example 1. We begin with a brief review of matrix and vector norms. Vol II: Eigensystems .140. We now turn to the analytic properties of eigensystems. 2. AND MATRIX POWERS Eigenpairs are defined by a nonlinear equation and hence are creatures of analysis. Jordan canonical form The Jordan canonical form was established in 1874 by Camile Jordan [136]. an algebra of eigensy stems can be erected on the definition.99]. a topic which will often recur in the course of this volume. 218] for the issues. SPECTRAL RADII. The first section of this chapter was devoted to that algebra. Consequently. It is a generalization of the absolute value of a scalar. The results are then applied to determine the behavior of powers of a matrix. a function that in some sense measures the size of a matrix. If the blocks are small.7 shows that eigenvalues that should be clustered do not have to lie very close to one another. schemes for automatically block diagonalizing a matrix by a well-conditioned similarity transformation have many ad hoc features. AND MATRIX POWERS Block diagonalization 25 Although rounding error prevents matrices that are exactly defective from arising in practice. For matrices the metric is generated by a norm. 2.139. The idea is to generate clusters of eigenvalues associated with nearly dependent eigenvectors.

Examples of norms The norms most widely used in matrix computations are the 1-norm.1. in which case their definitions take the following forms: The Frobenius norm has the useful characterization which in the case of a vector reduces to The Frobenius vector norm is also called the Euclidean norm because in E2 and R3 the quantity \\x\\p is the ordinary Euclidean length x.26 CHAPTER 1. and we call a norm on either space a vector norm. the Frobenius norm. EIGENSYSTEMS Definition 2. . these norms are defined for vectors. In particular. The third condition—called the triangle inequality— specifies the interaction of the norm with the matrix sum. It is frequently used in the equivalent form In this work we will always identify C n x l with the space of n-vectors C n . The Frobenius norm of a vector x is usually written \\x\\2. and the oo-norm defined by Each of these norms is actually a family of matrix norms in the sense that each is defined for matrices of all orders. A MATRIX NORM ON C mxn is a function \\-\\: C m x n ->• ]R satisfying the following three conditions: The first two conditions are natural properties of any measure of size. for reasons that will become evident shortly.

it is difficult to calculate. where the elements ot L are o^ et-j. It is called the operator norm generated by fj. NORMS. we have n(A) = A + h. the oo-norm. we say that the 2-norm is unitarily invariant. where fl denotes floating point evaluation. The vector 2-norm generates a new norm called the matrix 2-norm or the spectral norm.Unlike the 1-norm. Because of property 7. 2. let us examine what happens when a matrix A is rounded. Here is a list of some of its properties. Nonetheless it is of great importance in the analysis of matrix algorithms. The vector 1-norm generates the matrix 1-norm and the vector co-norm generates the matrix oo-norm.SEC. and the Frobenius norm. It is written || • |J2. Then it is easily verified that the function || • ||M)I/ defined on C m x n by is a matrix norm. suppose that the machine in question has rounding unit eM— roughly the largest number for which fl(l + e) = 1. AND MATRIX POWERS Operator norms 27 Let ^ be a vector norm on C m and v be a vector norm on C n . Specifically. It follows that Vol II: Eigensystems . Then elementwise where In terms or matrices. Denote by fl( A) the result of rounding the elements of A. The Frobenius norm is also unitarily invariant. and v or the matrix norm subordinate to IJL and v. SPECTRAL RADII. Norms and rounding error As an example of the use of norms. We have already seen examples of operator norms.

EIGENSYSTEMS Thus in any norm that satisfies 11 \A\ \\ = \\A\\ (such a norm is called an absolute norm). a small normwise relative error guarantees a similar relative error only in the largest elements of the matrix. Frobenius. What we have shown is that: In any absolute norm.2) is still a good approximation. with balanced norms like the oo-norm. The property of consistency does the same thing for the product of two matrices. Incidentally. the relative errors in the components vary. the smaller elements can be quite inaccurate. the 2-norm is not. and in such applications one must either scale the problem to fit the norm or use a norm that gives greater weight to small elements. Limitations of norms We have just shown that if the elements of a matrix have small relative error. and 1 for the third. the commonly used one. then the matrix as a whole has a small normwise relative error.2.2) is called the normwise relative error in fl( A). which has no accuracy at all. On the other hand. . But in others it is. rounding a matrix produces a norm wise relative error that is bounded by the rounding unit. wehave||£|| < \\A\\cM or The left-hand side of (2. Unfortunately the converse is not true. Now in many applications the accuracy of small elements is unimportant. although (2. Unfortunately. In general. and infinity norms are absolute norms. Let Then it is easily verified that so that x has low normwise relative error. 10~2 for the second. as the following example shows. It is 10~5 for the first component.28 CHAPTER l. Example 2. Consistency The triangle inequality provides a useful way of bounding the norm of the sum of two matrices.

3.\\itTn. For let y 6 Cn be nonzero. For example. and || • ||/tn be matrix norms defined on <C /Xm . Then it is easily verified that the function i> defined by is a vector norm. 2. NORMS. SPECTRAL RADII. These norms are CONSISTENT if for all A eC / x m and#e<C m X n we have Four comments on this definition. For example. • An important special case is a matrix norm || • || defined on 7£ nxn . if then • The 1-. a n d C / x n . But it is not a consistent norm.n. 2-. F. In fact there are many such norms. oo. whenever the product AB is defined. Let \\ . and oo-norms as well as the Frobenius norm are consistent in the sense that for p = 1. AND MATRIX POWERS 29 Definition 2.2.SEC. the function denned by is a norm. But Vol II: Eigensystems . || • ||m. • The Frobenius and 2-norms are consistent in the sense that Given a consistent matrix norm || • || on C n x n it is natural to ask if there is a vector norm */ on C n that is consistent with || • ||. In this case consistency reduces to • Matrix norms do not have to be consistent. C m X n .

we write At first glance. if {Ak} is a sequence of matrices. Theorem 2. Thus convergence in any norm is equivalent to the componentwise convergence of the elements of a matrix. 2.. then it converges in any norm. . If\\ • || is consistent.. Specifically.2. if a sequence converges in one norm.B\\p -»• 0 if and only if |aj^ . Theorem 2. This is an immediate consequence of the following theorem on the equivalence of norms. m. This distance function can in turn be used to define a notion of convergence for matrices. n). this definition of convergence appears to depend on the choice of the norm || • ||. Then there are positive constants a and T such that for allAe€mxn Thus it does not matter which norm we choose to define convergence.6. Let \\-\\bea norm onCHXn. Norms and convergence Given a matrix norm || • || on C m x n . then so is Vsim(A)... Although all norms are mathematically equivalent. EIGENSYSTEMS Any consistent matrix norm onCnxn has a consistent vector norm. In particular. see the discussion following Example 2. We begin with a simple but far-reaching definition.IfUis nonsingular.11. The proof of the following theorem is left as an exercise. Let \JL and v be norms on C m x n .5.4. . then the functions are norms. at least in the initial stages. it is easy to see that \\Ak .30 Thus we have shown that: CHAPTER 1. THE SPECTRAL RADIUS Definition In this subsection we will explore the relation of the largest eigenvalue of a matrix to its norms.. But in fact.6f/| -»• 0 (i = 1.. we may define the distance between two matrices A and B by \\A — B\\. For an example. For a proof see Theorem 1:4. . j = 1.4. as a practical matter they may say very different things about a converging sequence. (2-4) Another useful construction is to define a norm by multiplying by a matrix.

Then for any e > 0 there is a consistent norm II • IU. The eigenvalue A is called a DOMINANT EIGENVALUE and its eigenvector is called a DOMINANT EIGENVECTOR. AND MATRIX POWERS 31 Definition 2.7. Let A be of order n. If (A. It gets its name from the fact that it is the radius of the smallest circle centered at the origin that contains the eigenvalues of A. Let A be of order n. then Since A is an arbitrary eigenvalue of A. something worse is true. Let A be of order n. there is no norm that reproduces the spectral radius of J2(0). In other words. x) is an eigenpair of A with j/(z) = 1. any norm of J2(0) must be positive. If it is large. However. Since J2(0) is nonzero. In fact. p(Ak) = p(A)k. there are norms that approximate it. whereas the best we can say for a consistent norm is that || Ak\\ < \\A\f. and let \\ • || be a consistent norm on C n x n . For example.6. Theorem 2.8. Theorem 2. as the following theorem shows. x) for which |A| = p(A) a DOMINANT EIGENPAIR. a consistent norm will exceed the spectral radius of A. 2. the Jordan block has spectral radius zero. In general. the matrix must be large. Specifically. We begin our investigation of norms and spectral radii with a trivial but important theorem. SPECTRAL RADII. Norms and the spectral radius The spectral radius has a normlike flavor. Then Proof. Let i/ be a vector norm consistent with the matrix norm || • || [see (2. but ||«/o(0)||oo = 1.e (depending on A and e) such that Vol II: Eigensystems . Then the SPECTRAL RADIUS of A is the number We call any eigenpair (A. Thus the spectral radius is the magnitude of the largest eigenvalue of A.SEC.4)]. It also tracks the powers of a matrix with greater fidelity than a norm. NORMS.

6). Specifically. there is an 77 such that HD"1 JVjD||oo < c. Since B\ is not defective. B?). It follows that If we now define || • \\A.32 CHAPTER 1.j)-element of D 1ND is Vijrf *.4 the norm || • \\^t£ is consistent.19 we can reduce A to the form diag(i?i. we can reduce A to the form where AI is a diagonal matrix containing the dominant eigenvalues of A. EIGENSYSTEMS If the dominant eigenvalues of A are nondefective. For 77 > 0. Let be a Schur form of A in which A is diagonal and N is strictly upper triangular. we can choose a diagonal matrix DJJ — diag(/. where B\ contains the dominant eigenvalues of A and BI contains the others. A2 is a diagonal matrix containing the remain eigenvalues of A. then there is a consistent norm \\-\\A (depending on A) such that Proof.7) it satisfies (2. we can diagonalize it to get the matrix AI . Since for i < j we have TJJ~I ->• 0 as T? -> 0.e by then by Theorem 2. We can then reduce BI to Schur form to get the matrix A2 + N. and N is strictly upper triangular. D^) as above so that The norm \\-\\A defined by . by Theorem 1. and by (2. Since the diagonals of A2 are strictly less than p(A). let Then for i < j the (i. If the dominant eigenvalues of A are nondefective.

2.3. Then for any norm \ \ .c||£|U. Then Again by the equivalence of norms. Then It follows that || Ak\\2 > p(A)h.SEC. and the norm) such that If the dominant eigenvalues of A are nondefective. Nonetheless. For this reason. we need to know the rate of decrease.« for all B. MATRIX POWERS Many matrix processes and algorithms are driven by powers of matrices.(. (depending on A. x) be a dominant eigenpair of A with ||a:||2 = 1. there is a constant TA. we now turn to the asymptotic behavior of the powers of a matrix. For example. let || • \\A.11).Z > 0 such that ||J?|| < TA. Let (A. as we shall show in the next subsection. Asymptotic bounds The basic result is contained in the following theorem. the errors in certain iterative algorithms for solving linear systems increase decrease as the powers of a matrix called the iteration matrix.\ \ there are positive constants cr (depending on the norm) and TA. By the equivalence of norms.8 shows that the spectral radius of a matrix can be approximated to arbitrary accuracy by some consistent norm. there is a constant a > 0 such that \\B\\ > o-\\B\\2 for all B. it can be used to establish valuable theoretical results. NORMS. Theorem 2. e. We will see that this norm may be highly skewed with respect to the usual coordinate system—see the discussion following Example (2. Let A be of order n and let e > 0 be given. • 33 Theorem 2.e be a consistent norm such that || A||4ie < p( A) + e. It is therefore important to have criteria for when we observe a decrease. SPECTRAL RADII. AND MATRIX POWERS now does the job.9. we may take e = 0. Then Vol II: Eigensystems . Proof. 2. Hence For the upper bound. Equally important. We will first establish the lower bound.

but they fall below their initial value of 1. The co-norm of the powers of J?f 0.9) are called LOCAL CONVERGENCE RATIOS because they measure the amount by which the successor of a member of the sequence is reduced (or increased).34 CHAPTER 1. e. Example 2. The powers of the Jordan block J2(0. • The theorem shows that the powers of A grow or shrink essentially as p(A)k.1 (called the hump) plots these norms against the power k. It is instructive to look at a simple case of a defective dominant eigenvalue. Here is a more extreme example. It follows immediately from Theorem 2.g.99 = -0. the convergence ratio is |. the norms increase to a maximum of about 37 when k = 99. Instead of decreasing. From these graphs we see that the powers of J2(0. Fork = 1. which eventually approximate the asymptotic value of log 0.10.9 that: A matrix is convergent if and only if its spectral radius is less than one. but they quickly approach it.5) are Hence and The ratios in (2. whose norms decrease monotonically at a rate that quickly approaches 2~k. . we can repeat the proof of the upper bound with a consistent norm || • \\A (whose existence is guaranteed by Theorem 2. For k = 10.01. Thereafter they decrease.55. Example 2. Limitations of the bounds One should not be lulled by the behavior of J2(0. but they exhibit a hellish transient in the process.99) are The upper graph in Figure 2. The local convergence ratios are never quite as small as the asymptotic ratio one-half.5)fc.99) not only take a long time to assume their asymptotic behavior. the ratio is about 0. A matrix whose powers converge to zero is said to be convergent. the bound must be in terms of of p(A) + €.. The lower plot shows the natural logarithms of the local convergence ratios.11. We have to add "essentially" here because if a dominant eigenvalue is defective. where c can be arbitrarily small. EIGENSYSTEMS If the dominant eigenvalues of A are nondefective.8) for which \\A\\A = p(A).99 only when k = 563. so that the norms decrease from the outset.

There is a lesson to be learned from this. then the norm || • \\A.11—is typical? The answer is: It depends.1)element of «/2(0. Be on guard. because the (2. so that the norm remains small. However.2)-element of J2 (0.99)fc. which accounts for the hump. such examples do occur.99). AND MATRIX POWERS 35 Figure 2. and it is instructive to see why. But the (2. l)-element is multiplied by 1000. this VolII: Eigensystems . The bound (2. Mathematically.001.99)] + e is near p[J2(0. Of course the norm of a general matrix would be unnaturally large.SEC.1: The hump and convergence ratios Which example—2.C given by Theorem 2. 2. NORMS. and sometimes their effects are rather subtle.8 has the form The (2. Many people will go through life powering matrices and never see anything as dramatic as J2(0.99)]. SPECTRAL RADII. In matrix analysis one frequently introduces a norm designed to make a construction or a proof easy.10 or 2.99)fc. is damped by the factor 0. so that />[J2(0.8) gives no hint of the ill behavior of J2(0. If we choose e = 0.99)* is zero and hence is not magnified by this factor.001.

However. In the same book Ostrowski gives essentially Theorem 2. Since the only nonzero elements of N are ones on its superdiagonal. The reader should be aware that some authors include consistency in the definition of a matrix norm. When computed in finite precision. the approach has the disadvantage that one must choose a suitable value for the free parameter €—not an easy problem. which are described in [114. If A has a defective eigenvalue near one.1 of this series. But unless the norm bears a reasonable relation to the coordinate system one is working in (the usual unit vectors for Example 2. There is no one. we have Matrix powers For a detailed treatment of matrix powers—exact and computed—see [ 114. though he gives no reference.9. 2. Let A be the diagonal matrix of eigenvalues of A and let X~l(t~l A)X = e"1 A + N be the Jordan form of €~l A. the perturbations Ei may create eigenvalues of magnitude greater than one. in which case the asymptotic behavior will not be described by Theorem 2. An important approach to this problem is the pseudospectrum In many instances." Ostrowski's proof is an elegant and amusing application of the Jordan form. stating that it "is unpublished. EIGENSYSTEMS is sound practice. Ch. the results may be disappointing or misleading.1 has lead to various attempts to bound maxfc || Ak\\. The existence of the hump illustrated in Figure 2.1].11). In the example of the last paragraph. §17.8. for sufficiently large e. but the gist of what is happening this. simple explanation. the defective eigenvalue near one will. 17].1960] attributes Theorem 2.36 CHAPTER 1.7 or in §1:1. and a nominally convergent matrix can even diverge. A matrix norm and a consistent vector norm are sometimes called compatible norms. The computed jth column of Ak satisfies where ||£"j||/m|| is of order of the rounding unit.7 to Frobenius. NOTES AND REFERENCES Norms For treatments of matrix and vector norms see the general references in §1.4. . cause the pseudospectrum to extend outside the unit circle. the powers of a matrix can exhibit erratic behavior. the location of the pseudospectrum in the complex plane explains the behavior of the matrix powers. Norms and the spectral radius In his Solution of Equations and Systems of Equations Ostrowski [ 194.4.

The perturbation theory of eigenvalues and eigenvectors is a very large subject. For example. Trefethen [276.uk/projects/pseudospectra/ 3. First. For more see the Pseudospectra Gateway: http://web. The first is a bound that establishes the continuity of the eigenvalues of a matrix. In fact. where ||£||/||yl|| is of the order of the rounding unit for the computer in question. the eigenvalues of any matrix are continuous functions of the elements of the matrix. 279]. 3. We will consider three topics. Nonetheless. The second topic is Gerschgorin's theorem. N. Second. VolII: Eigensystems . A knowledge of the sensitivity of the eigenpair will enable us to predict the accuracy of the computed eigenvalue. many of the algorithms we will consider return approximate eigenpairs (A.ox. There are two reasons for studying this problem.1 we will consider the perturbation of individual eigenvalues. We conclude with results on the perturbation of the eigenvalues of Hermitian matrices and the singular values of general matrices. a useful tool for probing the sensitivity of individual eigenvalues. the eigenvalues of the matrix are 1 ± ^/e.1.ac. as the following theorem shows. 277. PERTURBATION THEORY In this section we will be concerned with the sensitivity of eigenvalues and eigenvectors of a matrix to perturbations in the elements of the matrix. 3.SEC. We will treat these generalizations in Chapter 4.x) that satisfy (A + E)x = Xx.2 generalize to eigenspace. and we will only be able to skim the most useful results. THE PERTURBATION OF EIGENVALUES This subsection is concerned with the problem of assessing the effects of a perturbation of a matrix on its eigenvalues. and in §3. In §3. Most of the results in §3. the results are often of use in the design and analysis of algorithms. The continuity of eigenvalues A fundamental difficulty with the perturbation of eigenvalues is that they need not be differentiable functions of the elements of their matrix.comlab. the eigenvalues are continuous at € = 0. the current high state of the art is largely due to L.2 we will consider the perturbation of eigenpairs. PERTURBATION THEORY 37 Although the pseudospectrum has been introduced independently several times. and this expression is not differentiable at e = 0.

A cannot be complex. . Moreover. this does not preclude the possibility of two eigenvalues of A attaching themselves to a simple eigenvalue of A and vice versa. Then there is a permutation j\. since if it were its complex conjugate would also be an eigenvalue—a second eigenvalue that satisfies the bound. pairs up the eigenvalues of A and A in such a way that each pair satisfies the bound (3. though perhaps magnified by a large order constant. is the primitive nth root of unity.. It is fairly easy to show that for every eigenvalue A of A there must be an eigenvalue A of A satisfying the bound (3. Theorem 3.. the eigenvalues of are where u. . n such that • There are three points to be made about this theorem. i_ The exponent £ in the factor \\E\\% is necessary. . Let A be such an eigenvalue and let E be small enough so that the right-hand side of (3. the bound (3.. Garden variety eigenvalues are more likely to suffer a perturbation that is linear in the error. . EIGENSYSTEMS Theorem 3.1) is used more in theory than in practice. In particular.. . Thus a perturbation of order e has the potential of introducing a perturbation of order e« in the eigenvalues of a matrix.38 CHAPTER 1.. For this reason.1). ..1) is less than half the distance between A and its neighbors. Let the eigenvalues of A — A + E be A j . . Let A the (possibly multiple) eigenvalues be A i. it is important to stress that this example is extreme—the eigenvalue in question is totally defective.. jn of the integers 1. For example. each satisfying the bound (3. However. to an eigenvalue A of multiplicity m of A there correspond m eigenvalues of A.1. and vice versa. Then there is exactly one eigenvalue A of A satisfying the bound (for otherwise there would be too few eigenvalues of A to pair up with the other eigenvalues of A).. • The theorem has a useful implication for the behavior of the simple real eigenvalues of a real matrix under real perturbations. A n .. on the other hand.. A n . Hence: The simple real eigenvalues of a real matrix remain real under sufficiently small real perturbations. • The theorem establishes the continuity of eigenvalues in a very strong sense.1 (Eisner).1). However.1).

A cure for this problem is to derive bounds for individual eigenvalues. The statement about the disjoint union can be established by a continuity argument. say. which makes it too large for everyday work. Let & be a component of x with maximum modulus.. (Jt=i Q^ is disjoint from the other disks.SEC. It must therefore be large enough to accommodate the worst possible case. U(i) = \Ji=i Gji(t) is also disjoint from the other disks Qi(i). x) be an eigenpair of A. then that union contains exactly k eigenvalues of A. where D — diag(an.-. 3. Let A = D + B. Proof.1 is that it bounds the perturbation of all the eigenvalues of any matrix. Let A be of order n and let the GERSCHGORIN DISKS of A be defined by Then Moreover. we find that which says that A 6 (/.-. and they lie in the the (degenerate) disks ft(0). The eigenvalues of A(Q) are the diagonal elements a.-. Initially the union VolII: Eigensystems . . As t proceeds from zero to one.. Let (A. PERTURBATION THEORY Gerschgorin theory 39 The problem with the bound in Theorem 3.. if the union ofk of the setsQi are disjoint from the others. From the ith row of the equation Ax = Xx we have or Taking absolute values and remembering that |£j| < 1. and assume without loss of generality that £t = 1. Now suppose that. Theorem 3. the radii of the disks Qi(i) increase monotonically to the radii of the Gerschorin disks of A. ann). Gerschgorin's theorem provides a method for doing this.2 (Gerschgorin). Let Qi(t) be the Gerschgorin disks for A(t) = D + tB. Then for 0 < t < 1.

Hence. and their deviation is approximately the size of the error. U(t) can neither lose nor acquire eigenvalues.4-10"3.4e-03.4. which violates continuity. If A is regarded as a perturbation of diag(l.2e-03. Example 3.1).1. A has three eigenvalues \2. then the above examples show that the eigenvalues of A lie near its diagonal elements. and A4 that satisfy At first glance this would appear to be a perfectly satisfactory result. 1. EIGENSYSTEMS £/(0) contains exactly k eigenvalues of A(0). Hence.4e-04.0e-03.2). The remaining three disks are centered at 2 and are not disjoint.2. Let A be the matrix of Example 3. The first disk is disjoint from the others. -7. Then . 1.40 CHAPTER 1.7e-03. • Gerschgorin's theorem is a general-purpose tool that has a variety of uses.3. as t increases. 1.5e-03. it is especially useful in obtaining bounds on clusters of eigenvalues. For that would require an eigenvalue to jump between U(i) and the union of the other disks. Thus the Gerschgorin disks locate the eigenvalues near two fairly well. -9. Moreover. U( 1) = U?=i GK contains exactly k eigenvalues of A. and let P = diag(l(T3.3e-06. We can remedy this defect by the method of diagonal similarities. from which we conclude that A has an eigenvalue AI that satisfies\Xi — 1| < 2. which is illustrated in the following example. Consider the matrix The radii of the Gerschgorin disks for this matrix are 2. AS. Example 3. However.1. but they badly overestimate the deviation of the eigenvalue near one. 4.6e-04.3. 2. In perturbation theory. the actual deviations are 1.

9. A trivial but useful consequence is that Another important consequence is the following interlacing theorem. PERTURBATION THEORY and the radii of the Gerschgorin disks are 41 2. Let the Hermitian matrix A have eigenvalues Then and The formulas (3. Nonetheless. On the other hand.3) we know that the remaining eigenvalues are within 4. • Although we have proceeded informally. whose eigenvalues are the same as DAD~l. choosing the matrix D by inspection. Hence we know that A.5) are called respectively the min-max and the max-min characterizations. the first disk remains disjoint from the others. The similarity transformation DAD~l has reduced the radius of the first disk while increasing the radii of the others. What we have shown—at least for this matrix—is that if a diagonal element is well separated from the other diagonal elements then an off-diagonal perturbation moves the eigenvalue by a small amount that is generally proportional to the square of the perturbation. 3.4-10"6 of one. • The matrix in Example 3. see the notes and references. 6. has an eigenvalue within 2.4e-06.1e-02. For more.3 is an off-diagonal perturbation of a diagonal matrix. They have far-reaching consequences. it is possible to formalize the method of diagonal similarities. VolII: Eigensystems .4e-01. from (3.(MO~3 of (wo.SEC. The first result is a far-reaching characterization of the eigenvalues of a Hermitian matrix. 8. There are two comments to be made about this example.7e-02.4) and (3. Hermitian matrices In this subsection we will state without proof some useful results on the eigenvalues of Hermitian matrices.5 (Fischer). Theorem 3.

Hence we have the following corollary. The final consequence of Fischer's theorem is the following theorem on the perturbation of eigenvalues. In Theorem 3.e.8. In Theorem 3. The inequality (3. Corollary 3. then U^AU is a principal submatrix of A of order m. In the terminology of perturbation theory.6. If U consists of m columns of the identity matrix. Let A beHermitian of order n with eigenvalues U Let U e Cnxm be orthonormal and let the eigenvalue ofUHAU be Then The theorem is called the interlacing theorem because when m = n-1. then the eigenvalues m of B satisfy (3. It is noteworthy that like Theorem 3.1 this theorem is about pairs of eigenvalues. they interlace. If B is a principal submatrix of A of order m.8) says that the eigenvalues of a Hermitian matrix are perfectly conditioned—that is. Theorem 3.. Let A and A = A + E be Hermitian. the pairing is explicitly determined by the order of the eigenvalues. A. satisfy i.1 the pairing is not explicitly given.8. there is a difference. a perturbation in the matrix makes a . EIGENSYSTEMS Theorem 3. However.6). Let the eigenvalues of A.42 CHAPTER 1. the eigenvalues \i and //. (3.7.7) gives an interval in which the ith eigenvalue must lie in terms of the smallest and largest eigenvalues of E. and Ebe Then Consequently.

andlO"3.0180e-03. 1. 1.8e-07. 1.9.6e-05.8. 3. 9. Example 3. The example not only illustrates the relative sensitivity of the smaller eigenvalues. 1. 2. 1.8e-07. The absolute errors in the eigenvalues. 1. In the notation of Theorem 3.9e-04.8 holds for the Frobenius norm. Vol II: Eigensystems . are in general smaller than the bound would lead one to expect.0002e-01.SEC. then the quantity is accurate to about k decimal digits.8e-05. a stronger result is also true. PERTURBATION THEORY 43 perturbation that is no larger in the eigenvalues.6e-03. The matrix has eigenvalues 1. imply that all the eigenvalues ar determined to high relative accuracy.10"1.9) implies that the smaller th eigenvalue the fewer the accurate digits. unfortunately.9e-05.9839e-03.8e-02. as the following example shows. the relative error in At is A rule of thumb says that if the relative error in a quantity is about 10 *. Thus (3.10~2. However. (We have reported only the first five figures of the element of A. 1. The perfect condition of the eigenvalues of a Hermitian matrix does not. Since the Frobenius norm of a matrix is greater than or equal to its 2-norm. The orem 3. In fact. but it shows the limitations of bounds based on norms.) If we perturb A by a matrix E of2-norm 10~4 we get a matrix whose eigenvalues are l. 1. The relative errors in these eigenvalues are 2. It is seen that the relative errors become progressively larger as the eigenvalues get smaller. Theorem 3.10 (Hoffman-Wielandt).OOOOe+00.

The angle between x and y will play an important role in the perturbation theory to follow. In particular.44 3. In this case A corollary of this assertion is that if x is a simple eigenvector then its left eigen vector y is not orthogonal to x. This insures that (up to a scalar factor) we have a unique eigenvector to study. It turns out that the perturbation theory for an eigenvector goes hand in hand with that of its eigenvalue. then there is a left eigenvector y of A such thatyHx = 1. Left and right eigenvectors Let (A. In R2 and R3 it is easy to see that the constant 7 is the cosine of the angle between the lines subtended by x and y. In general we define the angle between nonzero vectors x and y by Since in (3. z) be a simple eigenpair of A. By Theorem 1. It follows that: If (A. we have yHx = 1.10) above (x X) * = (y YE)H. EIGENSYSTEMS Up to now we have been concerned with eigenvalues. We will restrict ourselves to a simple eigenpair (A. x).18 there is a nonsingular matrix (x X) with inverse (y y)H such that It is easily seen by writing that yRA — Aj/H and Y^A = MYE. In this section we will examine the sensitivity of eigenvectors to perturbations in the matrix. and we will actually treat the perturbation of eigenpairs. From the Cauchy inequality we know that where 7 < 1 with equality if and only if x and y are linearly dependent. SIMPLE EIGENPAIRS CHAPTER 1. . x) is a simple eigenpair of A.2. y is the left eigenvector of A corresponding to A.

we are given a simple eigenpair (A. Since yHX = 0. we have yHAXp = XyHXp = 0 and yHXp = 0. we get the linearized equation We must now solve the linearized equation for (p and p. x) that approaches (A. Specifically. We must now set up an equation from which (p and p may be determined. The key to a first-order analysis is to note that since (p and p approach zero along with Et products of these terms will become negligible when E is small enough. It can be shown (Theorem 3. multiply by yE. we can write where (p is to be determined. then there is a simple eigenpair (A. Ignoring such products.13) by Y H . The natural equation is Ax — Xx. and use the facts that Vol II: Eigensystems .11 below) that if E is small enough. To solve for <p. The first order of business is to find suitable representations of A and x.10). Here we will require that yHx = 1. Since A is a scalar. x) as E approaches zero. Thus to get a unique representation we must impose a normalization on x. Since yHx = 1. We will suppose that A has been block-diagonalized as in (3. This is easily seen to be equivalent to representing x in the form where p is to be determined.SEC. and a small perturbation A = A -f E. The problem of representing x is complicated by the fact that x is determined only up to a scalar factor. we have Unfortunately this equation is nonlinear in the parameters (p and p. Before establishing that fact. we will firs address the problem of approximating the perturbed eigenpair. 3. we have To solve for p multiply (3. x) of A. or Since Ax = Ax. PERTURBATION THEORY First-order perturbation expansions 45 We now turn to the perturbation theory of simple eigenpairs.

) under the perturbation A + E of A is Rigorous bounds We have derived approximations to the perturbed eigenpair. see the notes and references. a. which we give without proof. Hence AJ — M is nonsingular and In summary: The first-order expansion of the eigenpair (A. Since A is a simple eigenvalue. the eigenvalues of M are not equal to A. However. Theorem 3. EIGENSYSTEMS to get Mp + YHEx = \p. nor have we shown how good the approximations actually are. For more. we have not established that a perturbed pair exists to be approximated. fills in these gaps. Let (x X) be a nonsingular matrix such that where and set Let || • || be a consistent norm and define If then there is a scalar (p and a vector p such that is an eigenpair of A with . The following theorem. x) be a simple eigenpair of A.11.46 CHAPTER l. Let (A.

we will first describe the qualitative behavior. Since A + yuEx = yEAx. it follows from (3. but they do not exhibit the 0(|| £ ||*) behavior that we have observed in defective matrices [see (3. 3. Generalizations of this scalar quantity will play an important role in algorithms for large eigenproblems. so that E represents a perturbation of size e in the (i. we have that |y?n|. ||/12||. Specifically.20) implies that A is a differentiable function of the elements of its matrix. we can write The quantity yHAx in (3. normalized so that yHx = 1. PERTURBATION THEORY Moreover.20) that In other words. Moreover from (3. But to orient ourselves. 47 and There are many things to be said about this theorem. the simple eigenvalues are not only continuous. In matrix terms: If (A. and ||F||22 are all 0(||£||). j)-element of A. enough to give a subsubsection to each.18) it follows that \\p\\ is also O(||E||). For if we take E = ee^e. since <p = A — A. then From this it follows that dX/daij is y{Xj.21) is called a Rayleigh quotient. then VolII: Eigensystems . Turning first to the eigenvalues..20) our first-order approximation yHEx to (p is accurate up to terms of 0(||. ||/£||. Qualitative statements The bounds in Theorem 3.E||2). 2) is a simple eigenpair of A and y is the corresponding left eigenvector.11 say very specific things about how a simple eigenpair behaves under perturbation. Equation (3.2)] Again by (3.SEC.

the bound on A was derived by taking norms in a sharp approximation to the perturbed eigenvalue A. we begin with the eigenvalue A. Since A = A + yHEx + (<p. If it is small. we would not have ended up with as elegant a condition number as the secant of the angle between the left and the right eigenvectors.M) The condition of an eigenvalue l /2i to be a very good approximation to x. and the condition number will be an accurate reflection of the sensitivity A. it provides a meaningful condition number for the perturbed eigenvalue.4.20) By (3. Such numbers are called condition numbers. as we have just seen. If a condition number for a quantity is large. Moreover. the quantity is said to be /// conditioned. that it is easy to contrive large perturbations that leave A unchanged.) Thus: The condition number of a simple eigenvalue is the secant of the angle between its left and right eigenvectors. they are useful only to the extent that the bound that generates them is sharp. the condition number changes with the norm. The reason is that the vector p can be large. In our case. however. we see that the secant of the angle between x and y is a constant that mediates the effect of the error E on the eigenvalue A. (For a general discussion of condition numbers see §1:2. Turning now to more informative bounds.19) it follows that Thus we can expect x -f X(\I .yHEx). if we had not chosen the 2-norm. /^ 24) Two caveats apply to condition numbers in general. since x — x = Xp. we have from (3. In particular. First. .48 CHAPTER 1. EIGENSYSTEMS Turning now to the eigenvector x. from (3. and. Second. But it does not directly provide a condition number for the perturbed eigenvector.11 is useful in establishing the existence of a perturbed eigenpair. Keep in mind.12).Hence Ignoring the 0(|| JE?||2) term in (3. The condition of an eigenvector Theorem 3. even when the perturbation in the directions between x and x is small. for most norms in common use the secant gives a reasonable idea of the sensitivity. we have so that the eigenvector is also continuous. On the other hand.23). Hence the bound will generally be in the ballpark. we know that ||x||2||y||2 = sec ^-(x-> y}. the quantity is said to be well conditioned.

Hence We are now in a position to bound the sine of the angle between x and x. Let x and x be nonzero. we have But \XHX\ = cos(z. let S be nonsingular.15) to block diagonal form. Then \\E\\- Proof. we will actually calculate the sine of the angle.12.15) by XS and Y by YS~H. From the unitary invariance of || • |J2. We begin by choosing a distinguished basis for the reduction (3. we will. Lemma 3.SEC. S~1MS). with S chosen so that the new Y is orthonormal.11 assume thatY is orthonormal. VolII: Eigensystems . and that the condition (3. Then we can replace X in the block diagonal reduction (3. we can make p as large as we like compared to The cure for this problem is to bound the angle between x and x. For the rest of this subsection we will assume that this change has been made.13. Theorem 3. It then follows that the approximation p to p in (3. PERTURBATION THEORY 49 To see this. and let Y be an orthonormal basis for 7£(x)-1. we have Since A 4 M. so that the block diagonal form becomes diag(A.14) becomes S~lp. Then the matrix (x Y) is unitary. In Theorem 3. Specifically. By choosing S appropriately. The following lemma shows how do do this. replace X by XS and Y by Y5~~H. Although we defined the angle between two vectors in terms of the cosine. as above. Then Proof.17) is satisfied in the 2-norm. We have (A + E)x = \x or Multiplying this relation by Yu and recalling that YHA = MYU. 3.£). Without loss of generality assume that ||or||2 = \\x\\z = 1.

Specifically. as the following example shows. By Theorem 2. For if an eigenvalue of M is near A. Actually. We will now consider some of the properties of the functions sep. Hence we may rewrite (3. Properties of sep The first and most natural question to ask of sep is how it got its name.12 and the definition (3. The answer is that it is a lower bound on the separation of A from the spectra of M. Otherwise put if an eigenvalue of M is near A we can expect the eigenvector x to be sensitive to perturbations in A. sep l (A. Theorem 3.14.15.11 it follows that A —»• A as E -*• 0. Let A = 0 and let M = Wm be themxm upper triangular matrix . In any consistent norm. EIGENSYSTEMS and by Lemma 3. sep( A. M) can be far smaller than the separation of A from the spectrum of M.50 CHAPTER l. Example 3. M) is a condition number for the eigenvector x. • Thus sep(A.16) From Theorem 3. Proof. then a small perturbation of A could make them equal and turn the one-dimensional space spanned by x into a two-dimensional space. this is just common sense. we have the following theorem. Unfortunately.25) in the form Consequently.7 The result now follows on inverting this relation. M) is a lower bound on the separation of A and the eigenvalues of M.

26). goes quickly to zero in spite of the excellent separation of A = 0 from the eigenvalues ofWm. It follows that the angle between x and y is zero.16. as a consequence of the following theorem. is perfectly conditioned. Hence which. and sec Z(z. then Thus for Hermitian matrices the physical separation of eigenvalues—as distinguished from the lower bound sep—indicates the condition of its eigenvectors. Now in the decomposition (3. Let A be real and M Hermitian. [Hint: See (3.11. whatever its multiplicity. This result. the distance between A and the spectrum of Wm is one. Hermitian matrices If A is Hermitian then the right eigenvector x corresponding to the eigenvalue A is also the left eigenvector y. On the other hand. VolII: Eigensystems . The simple eigenvectors of a Hermitian matrix also behave nicely. Thus the simple eigenvalues of a Hermitian matrix are optimally conditioned. y) = 1—the smallest possible value. is an eigenvector of A and x is the corresponding eigenvector of A = A + E.] Theorem 3. with increasing m. in the co-norm But sothatyw-^loo = 2 m . we may take X = Y. It is an instructive exercise to use Wm to construct an example of this phenomenon.8). is weaker than (3.15) of Theorem 3. 3. It follows that if a. PERTURBATION THEORY 51 Since all the eigenvalues ofWm are one. Then in the 2-norm. This example suggests the possibility that an eigenvector whose eigenvalue is well separated from its companions can still be sensitive to perturbations in its matrix. so that M is Hermitian. which shows that any eigenvalue of a Hermitian matrix.SEC. whose proof is left as an exercise.J . of course.

The technique is quite flexible. This fact has been repeatedly rediscovered. [119]. [18]. Books whose titles include the words "matrix analysis" or "matrix inequalities" usually contain material on perturbation theory—e. VIII]. Bhatia [22] treats the perturbation of eigenvalues with special emphasis on Hermitian and normal matrices.285]. Also see [23.52 3. The approach taken in this section is along the line of Stewart and Sun's Matrix Perturbation Theory [269]. Theorem 3. 269] for further references and historical notes. and Krause [24]. Gerschgorin theory Gerschgorin's theorem [82.4 of this series.4. Eigenvalues of Hermitian matrices Much of the material presented here can be found in §1:1.g. Kato's Perturbation Theory for Linear Operators [146] presents the theory as seen by a functional analyst (although it includes a valuable treatment of the finite-dimensional case).70] as improved by Bhatia. as Wilkinson has demonstrated in his Algebraic Eigenvalue Problem [300]. leading Olga Taussky to write a survey entitled "A Recurring Theorem on Determinants" [274].118. In his subsequent Matrix Analysis [23] he treats eigenspaces of Hermitian matrices. Continuity of eigenvalues For an extensive survey of general results on the perturbation of eigenvalues of nonnormal matrices. EIGENSYSTEMS Most texts on numerical linear algebra contain at least some material on the perturbation of eigenvalues and eigenvectors (especially [121] and [300]). for the most part. Gerschgorin also showed that an isolated union of k disks contains exactly k eigenvalues and introduced the use of diagonal similarities to reduce the radii of the disks. which will be treated in §2. Chapter 4. see Bhatia's Matrix Analysis [23.1 is due to Eisner [69. NOTES AND REFERENCES General references CHAPTER 1.119. Ch. Simple eigenpairs The results in this subsection were. [168]. Eisner. 1931] is a consequence of the fact that a strictly diagonally dominant matrix is nonsingular. developed for eigenspaces. . [39] [118]. [20]. Both books are a valuable source of references.. For a formalization of the procedure see [175. Left and right eigenvectors The fact that left and right eigenvectors corresponding to a simple eigenvalue cannot be orthogonal is well known and can be easily deduced from a Schur decomposition. There are several books devoted more or less exclusively to perturbation theory.3. [177].

The converse is not true—a multiple eigenvalue can fail to have nonorthogonal left and right eigenvectors, although for this to happen the eigenvalue must be defective. In fact, an analytic version of the result, also deducible from a Schur decomposition, is that if the left and right eigenvectors of a matrix are nearly orthogonal, then the matrix must be near one with a multiple eigenvalue [303].

First-order perturbation expansions

First-order perturbation expansions have long been used in the physical sciences to linearize nonlinear differential equations. A classroom example is the behavior of the pendulum under small oscillations. They have also been used to generate iterations for solving nonlinear equations. Newton's method is the prototypical example.

In matrix analysis first-order expansions are useful in three ways. First, a first-order expansion is often the best way to get a handle on the perturbation theory of a new object. Second, in perturbation theory a first-order expansion can yield very sharp asymptotic bounds. Third, formulas for derivatives can often be read off from a perturbation expansion [the assertion (3.22) is an example]. The approximations they furnish are also often used in deriving and analyzing matrix algorithms.

Rigorous bounds

Theorem 3.11 is a consequence of more general theorems due to the author [250, 252]. This result generalizes to representations of a matrix on an eigenspace [see (2.26), Chapter 4]. If Y is not orthonormal, the bound becomes more complicated. Theorem 3.13 is a generalization to non-Hermitian matrices of the sin θ theorem of Davis and Kahan [54]. We will return to these bounds when we treat eigenspaces in Chapter 4.

Condition numbers

Wilkinson [300, pp. 68-69] introduces the quantity s = y^H x, where x and y are normalized right and left eigenvectors corresponding to a simple eigenvalue λ, and on the basis of a first-order perturbation analysis notes that the sensitivity of λ is determined by 1/|s|. Of course, 1/|s| is the secant of the angle between x and y.

If x and y are normalized so that y^H x = 1, the matrix xy^H is the spectral projector corresponding to λ. It is so called because (xy^H)z is the projection of z onto R(x) along R(y)^⊥. Since ||xy^H||_2 = ||x||_2 ||y||_2 = sec ∠(x, y), the condition of a simple eigenvalue is the norm of its spectral projection.

Sep

The justification for the introduction of sep to replace the actual separation of eigenvalues in the complex plane is the simplicity of the bounds it produces. Its drawback is

that sep is by no means a simple quantity. On the other hand, it has the nice property that

    |sep(λ, M) − sep(λ̃, M̃)| ≤ |λ − λ̃| + ||M − M̃||,

where ||·|| is the norm used to define sep. Thus a small perturbation in the arguments of sep makes an equally small perturbation in its value.

Relative and structured perturbation theory

Example 3.9 suggests that the small eigenvalues of a symmetric matrix are not well conditioned in a relative sense under perturbations of the matrix. This is generally true of normwise perturbations. However, if the perturbations are structured, small eigenvalues may be well conditioned, as Example 3.15 shows. Consider, for example, a matrix with a very small eigenvalue. If we replace A by a small componentwise relative perturbation of itself, we get eigenvalues that agree with the originals to almost all their figures, even though the perturbation in the (1,1)-element is larger than the smallest eigenvalue itself. In other words, the smallest eigenvalue is insensitive to small relative perturbations in the elements of A. This phenomenon is unexplained by our perturbation theory. Generally, the matrix and its perturbations must have some special structure for it to occur.

Over the last two decades researchers have begun to investigate these and related problems. The subject is still in flux, and it is impossible to survey it here. It is also impossible to present a complete bibliography. But we can discern some trends.

1. Relative perturbation of eigenvalues: Determine the relative error in eigenvalues resulting from perturbations in the matrix.

2. Componentwise perturbation of eigenvectors: Determine the error (preferably relative) in the individual components of eigenvectors resulting from perturbations in the matrix.

3. Perturbation theory for graded matrices.

4. Perturbation theory for matrices with structured nonzero elements: e.g., bidiagonal matrices, tridiagonal matrices, etc.

5. Multiplicative perturbations: i.e., perturbations of the form (I + E)A(I + F), where E and F are small.

The above categories are not disjoint, but overlap—often considerably. The following references emphasize surveys or papers with extensive bibliographies: [56, 58, 112, 113, 125, 165, 166, 173, 266].
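The contrast between normwise and componentwise relative perturbations is easy to see numerically. The following sketch is not from the text; it constructs its own graded symmetric matrix (the grading and all names are arbitrary choices) and compares the relative change in the smallest eigenvalue under a componentwise relative perturbation and under a normwise perturbation of the same size, assuming NumPy is available.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    # A graded symmetric matrix: a well-conditioned core M scaled by D on both sides.
    D = np.diag(10.0 ** -np.arange(0.0, 2.0 * n, 2.0))   # diag(1, 1e-2, 1e-4, 1e-6)
    M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    M = (M + M.T) / 2
    A = D @ M @ D

    eps = 1e-6
    # Componentwise relative perturbation: each element changed in its last figures.
    S = rng.uniform(-1.0, 1.0, (n, n)); S = (S + S.T) / 2
    A_rel = A * (1.0 + eps * S)
    # Normwise perturbation of the same size eps*||A||.
    F = rng.standard_normal((n, n)); F = (F + F.T) / 2
    A_nrm = A + eps * np.linalg.norm(A, 2) * F / np.linalg.norm(F, 2)

    lam = np.sort(np.linalg.eigvalsh(A))
    for B, label in [(A_rel, "componentwise"), (A_nrm, "normwise")]:
        mu = np.sort(np.linalg.eigvalsh(B))
        print(label, abs(mu[0] - lam[0]) / abs(lam[0]))

The componentwise perturbation leaves the smallest eigenvalue with a relative error of order eps, while the normwise perturbation of the same norm swamps it entirely, which is exactly the distinction the relative perturbation literature addresses.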

2
THE QR ALGORITHM

All general-purpose eigenvalue algorithms are necessarily iterative. This statement is a consequence of Abel's famous proof that there is no algebraic formula for the roots of a general polynomial of degree greater than four. Specifically, to any polynomial p(x) = x^n − a_{n−1}x^{n−1} − ··· − a_1x − a_0 there corresponds a companion matrix whose characteristic polynomial is p. Thus if we had a noniterative general-purpose eigenvalue algorithm, we could apply it to the companion matrix of an arbitrary polynomial to find its zeros in a finite number of steps—in effect producing a formula for the roots.

Thus the history of eigenvalue computations is the story of iterative methods for approximating eigenvalues and eigenvectors. An important theme in this story concerns methods based on powers of a matrix. The climax of that theme is the QR algorithm—the great success story of modern matrix computations. This algorithm, with proper attention to detail, can compute a Schur form of a general matrix in O(n^3) operations. It can be adapted to reduce a real matrix in real arithmetic to a real variant of the Schur form. Other forms of the algorithm compute the eigenvalues and eigenvectors of a Hermitian matrix, the singular value decomposition of a general matrix, and the generalized Schur form of a matrix pencil. And the underlying theory of the method continues to suggest new algorithms.

The purpose of this chapter is to describe the QR algorithm and its variants. The algorithm is closely related to two algorithms that are important in their own right—the power method and the inverse power method. We will therefore begin with an exposition of those methods. In §2 we will present the general algorithm in the form generally used for complex matrices, and in the following section we will present the implicitly

shifted form for real matrices. Finally, in §4 we treat the generalized eigenvalue problem. The symmetric eigenvalue problem and the singular value decomposition will be treated separately in the next chapter.

A word on the algorithms. The QR algorithm is a complicated device that generates a lot of code. Nonetheless, its various manifestations all have a family resemblance. For this reason we will lavish attention on one variant—the explicitly shifted QR algorithm for complex matrices—and sketch the other variants in greater or lesser detail as is appropriate.

1. THE POWER AND INVERSE POWER METHODS

The power and inverse power methods are extremely simple methods for approximating an eigenvector of a matrix. Moreover, in spite of their simplicity, they have a lot to teach us about the art of finding eigenvectors. We begin with the power method.

1.1. THE POWER METHOD

The basic idea behind the power method is simple. Let A be a nondefective matrix and let (λ_i, x_i) (i = 1, ..., n) be a complete set of eigenpairs of A. Since the x_i are linearly independent, any nonzero vector u_0 can be written in the form

    u_0 = γ_1 x_1 + γ_2 x_2 + ··· + γ_n x_n.

Now A^k x_i = λ_i^k x_i, so that

    A^k u_0 = γ_1 λ_1^k x_1 + γ_2 λ_2^k x_2 + ··· + γ_n λ_n^k x_n.        (1.1)

If |λ_1| > |λ_i| (i ≥ 2) and γ_1 ≠ 0, then as k → ∞ the first term in (1.1) dominates, so that A^k u_0 becomes an increasingly accurate approximation to a multiple of the dominant eigenvector x_1.

Although the basic idea of the power method is simple, its implementation raises problems that will recur throughout this volume. We will treat these problems in the following order.

1. Convergence of the method
2. Computation and normalization
3. Choice of starting vector
4. Convergence criteria and the residual
5. The Rayleigh quotient
6. Shifts of origin
7. Implementation
8. The effects of rounding error

all that is needed is that A have a single dominant eigenpair ( AI .SEC.12. Let A have a unique dominant eigenpair(\i. Since this eigenvalue is necessarily simple. we can write any vector UQ in the form In this notation we have the following theorem. x\) and let A be decomposed in the form (1. Chapter 1.2). we can find a nonsingular matrix X = (x\ X^) such that Without loss of generality we may assume that Since x and the columns of X form a basis for C n . Chapter 1. Let UQ be a nonzero starting vector. THE POWER AND INVERSE POWER METHODS 57 Convergence In motivating the power method. x\) in the sense of Definition 2. Then by Lemma 3. But Vol II: Eigensystems . there is a constant a such that where p(M) is the spectral radius ofM. It is easy to verify that Let the columns of Y form an orthonormal basis for the subspace orthogonal to x\. In fact. Theorem 1. 1/71 ^ 0. then In particular for any e > 0.1. by Theorem 1. Chapter 1. where X = (x\ X^) satisfies (1. we made the simplifying assumption that A is nondefective. decomposed in the form (1.6. 1. Proof.4).3).19.

A tridiagonal matrix of order n has the form illustrated below for n = . although its asymptotic rate will be unchanged. if the powers of M are badly behaved. k = 0 2. We will return to this point later.58 and CHAPTER 2. the error conK. end while For a dense matrix. it will have many zero elements. while (not converged) 3. see Example 2. we would end up first computing Bk — Ak and then computing wjt = BkUQ.6). • From (1. If the matrix is represented by an appropriate data structure. Computation and normalization We have written the kth iterate u^ in the power method in the form ilk = Akv.9. Substituting these inequalities in (1. • Equation (1. if A has a complete system of eigenvectors with its eigenvalues ordered so that |Ai| > |A2| > • • • > |An|. Chapter 1. However. 1. uk+i = Auk (1.11. we see that if 71 is small in comparison to c2. In many applications the matrix A will be large and sparse—that is.2. then the convergence is as |A 2 /Ai| fc . For example. Chapter 1. • Three comments on this theorem. we may have to wait a long time to observe this asymptotic convergence. THE QR ALGORITHM IfriA?*! + X2Mkc2\\2 > |7l||A?| . then the convergence will be retarded.6). where a flam denotes a pair of floating-point additions and multiplications.||Mfc||2||C2||2. • By Theorem 2. The inequality (1. Since p(M) < |Ai|.6) follows directly from Theorem 2.6). This procedure requires about kn3 flam to compute Uk—a grotesquely large operation count. we get (1. k = k+l 5.7) and dividing by |7i||Ai|.8) 4. each iteration of this program requires about n2 flam.5) shows that the error in the eigenvector approximation produced by thepower method behaves like [IIM^I^/IAil^]. it may take far fewer than n2 flam to compute its product with a vector. An alternative is to observe that iik+i = Auk. Example 1. Let us consider the example of a tridiagonal matrix. a statement made precise by (1.Q. this operation count does not tell the entire story.This suggests the following algorithm. k verges to zero at an asymptotic rate of[p(M)/\\i\] . Chapter 1. If we were to naively implement this formula. In this case we say that UQ is deficient in\\. if M is nondefective then we may take e = 0 in (1.9.5). However.

SEC. i. Then asymptotically we have Uk — 10*70^0. the vectors Uk will quickly underflow. In most cases the savings will be small compared to the work in forming Auk. 2. k = k+l (T = 1/KH Ui — auk end while Since floating-point divides are expensive compared to multiplies. As it stands the program (1. 1. suppose that A! = 10. 4. The cure for this problem is to note that only the direction of the vector Uk matters: its size has no relevance to its quality as an approximate eigenvector. e. then the following program computes y = Tx.. 7. Instead we multiply it by l/||tifc||. Usually.8) will tend to overflow or underflow. 3. we do not divide Uk by \\uk\\. We can therefore scale iik to form a new vector Uk that is within the bounds of the machine.. Thus we have reduced the work to form the product Ax from 0(n2) to O(n). Uk = Uk/'\\Uk\\> Vol II: Eigensystems . For example. and c. b. 56. if AI = 0. 2.1. Uk is scaled so that \\Uk\\ = 1 in some relevant norm. k=Q while (not converged) Mfc+i = Auk 4. 1. Similarly. We shall not always get as dramatic improvement in operation counts as we did in the above example. This can be done in the course of the iteration as the following program shows.g. But it serves to show that algorithms based on matrix-vector multiplies can often be economized. 5. 3.e. the magnitude of Uk increases by a factor of ten for each iteration. 1. Such a process cannot continue long without overflowing the machine on which it is run. THE POWER AND INVERSE POWER METHODS 6: 59 If the nonzero diagonals of T are stored as indicated above in arrays a.In the sequel we will dispense with such minor economies and write. y[l] = a[l]*a:[l] + 6[l]*a:[2] for z = 2 to n—1 y[i] — c[i—l]*x[i—1] + a[i]*^[z] + 6[z]*x[z'4-l] end for t y[n] = c[n-I]*x[n-l] + a[n]*z[n] This program has an operation count of In fladd+3n flmlt.

if UQ is random.. then it can be shown that -I TT Since 7j = y^UQ and c^ . Convergence criteria and the residual The chief problem with determining the convergence of a sequence of vectors u^ is that the limit—in our case x\ —is unknown. The reasoning is as follows. A 2 . a natural criterion is to stop when \\Uk — Uk-i\\ is sufficiently small. On the other hand. It it not surprising that a small value of 71. -.g. If that eigenvector is not the dominant eigenvector. one should avoid starting vectors consisting of small integers in a pattern —e. if we are trying to find the dominant eigenvectors of a sequence of nearby matrices A\. In fact this ratio appears explicitly in the convergence bound (1.5). For example. For example. In some cases one will have at hand an approximation to the eigenvector x\.Y^UQ. It is therefore important that the starting vector UQ should have a substantial value of 71. As a rule. which makes the ratio large. If we write (x i X2) = (y\ YI) . where 0 < a < l. then the dominant eigenvector of Ak will furnish a good starting approximation to the dominant eigenvector of Ak+i. the structure of the problem may actually dictate such a choice. For example. the vector e consisting of all ones is often a good choice for a matrix whose elements are positive.. If the sequence converges swiftly enough. can retard convergence.then or . we may expect the ratio ||c2/7i ||2 to vary about the quantity Lacking special knowledge about x\.. this is about as good as we can expect. The problem is that a highly structured problem may have such a patterned eigenvector. Such approximations are natural starting values for the power method. Thus we cannot simply quit when the norm of error ek = Uk — x\ is sufficiently small. When no approximation to x i is available. then 71 will be zero. THE QR ALGORITHM The ratio ||c2||/|7i| from the expansion UQ = jiXi + X^c? is a measure of how far the starting vector UQ is from the dominant eigenvector x\. if ||eJ| < alle^-ill. a vector whose elements alternate between plus and minus one.60 Choice of a starting vector CHAPTER 2. it is customary to take UQ to be a random vector—usually a vector of random normal deviates.

SEC. Only if we have an estimate of a — which is tricky to obtain on the fly — can we relate the distance between successive approximations to the error. An alternative is to compute the residual where ^ is a suitable approximation to AI. Although the residual does not give us a direct estimate of the error. Moreover. Vol II: Eigensystems . 1. THE POWER AND INVERSE POWER METHODS 61 When a is.12). Theorem 1.5. Let Then there is a matrix such that and If A is Hermitian and then there is a Hermitian matrix satisfying (1. it gives us something almost as good. as the following theorem shows (for generality we drop the iteration subscripts). 0.3. and stop when r/t is sufficiently small. say. \\ek\\ can be no larger than twice ||Uk — Uk_i ||. But if a is near one the error can be large when \\Uk — Uk-i \\ is small.

in which case A will be contaminated with rounding error—errors of order of the rounding unit. For example. we have not specified how to choose //.u to determine convergence of the power method.Since the remaining eigenvalues are zero. from which (1.11) and (1.12). the equalities (1. . we naturally take the vector u to be the fcth iterate Uk. we may take so that E alters only the diagonal elements of A. Now if the residual norm is smaller than the error level. A may be computed.13) of IJL makes r orthogonal to u. the inaccuracy is no greater than that induced by the error already present. For general results on such structured backward perturbations see the notes and references. The proof in the non-Hermitian case is a matter of direct verification of the equalities (1. Optimal residuals and the Rayleigh quotient In computing the residual r = A u—fj.3 gives a backward perturbation E that is unsuitable for sparse matrices. A natural choice is the value of \i that minimizes the residual in some norm. As is often the case.62 CHAPTER 2.12) follows immediately. In most problems we will not be given a pristine matrix A. A may be measured. in which case A will have errors that are. This is not the whole story. Thus Theorem 1.E||p is the sum of the eigenvalues of ERE. Fortunately. if your results are inaccurate. THE QR ALGORITHM Proof. the problem is most easily solved when the norm in question is the 2-norm. u) is an exact eigenpair of a nearby matrix A + E. For the norm of E.However. in general. The justification for this criterion is the following. instead A will be contaminated with errors. In many applications the matrix A will be sparse. In other words. then any errors in the approximate eigenvector can be accounted for by a perturbation of A that is smaller than the errors already in A. For example. Now the zeros in a sparse matrix usually arise from the underlying problem and are not subject to perturbation. Again.14) follow from the fact that \\E\\\ is the largest eigenvalue of E^E and ||. u) an exact eigenpair of A. For the Hermitian case we use the fact that the choice (1.10) will not generally share the sparsity pattern of A—in fact it is likely to be dense. But the matrix E in (1. much larger than the rounding unit. then the pair (//. there is more than one E that will make (/^. note that From this it follows that r and u are orthogonal eigenvectors of EEE with eigenvalues IHH/IHII. • The theorem says that if the residual is small. It suggests that we stop iterating when the residual is sufficiently small.
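The backward perturbation behind this convergence test is easy to verify numerically. The sketch below is illustrative only; it uses the standard rank-one construction E = −ru^H/u^Hu, which is the natural reading of Theorem 1.3, and checks that (μ, u) is an exact eigenpair of A + E with ||E||_2 = ||r||_2/||u||_2.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    A = rng.standard_normal((n, n))
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    mu = u @ A @ u                     # Rayleigh quotient (u is real with norm one)
    r = A @ u - mu * u                 # residual

    E = -np.outer(r, u) / (u @ u)      # backward perturbation
    print(np.linalg.norm((A + E) @ u - mu * u))                          # ~ rounding error
    print(np.linalg.norm(E, 2) - np.linalg.norm(r) / np.linalg.norm(u))  # ~ 0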

This characterization of the optimal residual norm will prove useful in the sequel. • The proof characterizes the norm of the optimal residual as the norm of g in the partition (1. The quantity WH Au/u^u is an example of a Rayleigh quotient. Thus in the power method.2. Letu ^ 0 and for any yu setr M = Au — ^LU. Chapter 1. This quantity is clearly minimized when fj. we make the following definition. Then the quantity is called a RAYLEIGH QUOTIENT. Generalizing from these two Rayleigh quotients. the Rayleigh quotients u^Auk/u^Uk provide a sequence of increasingly accurate approximations to A.4. Definition 1. where we will treat the approximation of eigenspaces and their eigenblocks. then the Rayleigh quotient reproduces that eigenvalue. Let (u U) be unitary and set Then Since || • ||2 is unitarily invariant. This Rayleigh quotient provides a highly accurate approximation to the corresponding eigenvalue of A + E. — v — WH Au. Let u and v be vectors with v^u ^ 0. is established by direct verification. If either u or v is an eigenvector corresponding to an eigenvalue A of A. There it had the form yH(A + E)x where x and y (normalized so that yHx = 1) were left and right eigenvectors corresponding to a simple eigenvalue Xof A. then on to the main point. l. By continuity when u or v is near an eigenvector. Chapter 4. • A brief comment. Without loss of generality we may assume that 11 u \ \2 = 1. We will return to Rayleigh quotients in §2. THE POWER AND INVERSE POWER METHODS 63 Theorem 1.SEC.21).16). Proof. Vol II: Eigensystems .5. The orthogonality of r and fj. We have already seen another Rayleigh quotient in (3. Then ||rM||2 is minimized when In this case r^ J_ u. the Rayleigh quotient will approximate the corresponding eigenvalue.
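The optimality asserted here is easy to check: for a fixed u the Rayleigh quotient minimizes the 2-norm of Au − μu over all μ, and the minimizing residual is orthogonal to u. A small sketch (mine, for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 8
    A = rng.standard_normal((n, n))
    u = rng.standard_normal(n)

    mu_opt = (u @ A @ u) / (u @ u)            # Rayleigh quotient
    r_opt = A @ u - mu_opt * u
    print(abs(r_opt @ u))                      # orthogonality: ~ rounding error

    # Any other value of mu gives a residual that is no smaller.
    for mu in (mu_opt + 0.1, mu_opt - 0.5, 0.0):
        print(np.linalg.norm(A @ u - mu * u) >= np.linalg.norm(r_opt))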

Given a matrix A, a shift κ, a convergence criterion ε, and a nonzero starting vector x, this algorithm iterates the shifted power method to convergence, returning the approximate eigenpair (μ, x).

    1.  Anorm = ||A||_F
    2.  x = x/||x||_2
    3.  while (true)
    4.     y = Ax
    5.     μ = x^H y
    6.     r = y − μx
    7.     x = y − κx
    8.     x = x/||x||_2
    9.     if (||r||_2/Anorm < ε) leave while fi
    10. end while

Algorithm 1.1: The shifted power method

Shifts and spectral enhancement

If A is nondefective and the eigenvalues of A satisfy

    |λ_1| > |λ_2| ≥ ··· ≥ |λ_n|,

then the power method almost always converges to the dominant eigenvector x_1 with rate |λ_2/λ_1|. If this ratio is near one, the convergence will be slow. Sometimes we can improve the rate of convergence by working with the matrix A − κI, whose eigenvalues λ_1 − κ, ..., λ_n − κ are shifted by the constant κ. For example, if the eigenvalues of A are 1, 0, and −0.99, then the power method converges with ratio 0.99. If we shift A by κ = −0.5, the eigenvalues become 1.5, 0.5, and −0.49, and the convergence ratio becomes 1/3—a dramatic improvement.

This shifting of the matrix to improve the disposition of its eigenvalues is an example of spectral enhancement. Shifting has the advantage that it is cheap: we can calculate (A − κI)u in the form Au − κu at little additional cost. But it can only improve performance by so much. In the above example, 1/3 is nearly the best convergence ratio we can obtain by shifting. In the next subsubsection we will consider a very powerful spectral enhancement—the shift-and-invert operation. However, we pay for this power in the coin of increased computational costs.

Implementation

Algorithm 1.1 implements the shifted power method. There are three things to say about this straightforward program.

• Both the residual and the next iterate are computed from the intermediate vector y = Ax to avoid an extra multiplication by A.
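In a language like Python the loop of Algorithm 1.1 might be rendered as follows. This is a sketch rather than the book's code; the parameter names tol and maxiter are mine, and the routine simply mirrors statements 4-9 above.

    import numpy as np

    def shifted_power(A, kappa, x0, tol=1e-8, maxiter=1000):
        # Power method applied to A - kappa*I, with the residual-based test.
        Anorm = np.linalg.norm(A, 'fro')
        x = x0 / np.linalg.norm(x0)
        for _ in range(maxiter):
            y = A @ x
            mu = np.vdot(x, y)           # Rayleigh quotient for the current x
            r = y - mu * x               # its optimal residual
            x = y - kappa * x            # next iterate: (A - kappa*I) x
            x /= np.linalg.norm(x)
            if np.linalg.norm(r) / Anorm < tol:
                break
        return mu, x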

Chapter 1]. However. We will suppose that the computations are performed in standard floating-point arithmetic with rounding eM. Chapter 1. • On termination the algorithm returns an approximate eigenvector (//. However.SEC. we can say that rounding error not only changes the vector Ax to (A + E)x.A||p/sep(Ai. VolII: Eigensystems . we return the vector x because it is almost certainly more accurate. Nonetheless. but it is easy to forget when one is writing in a high-level language like MATLAB. In particular. Strictly speaking. THE POWER AND INVERSE POWER METHODS 65 y = Ax to avoid an extra multiplication by A. where x is the previous iterate. and the code would give an error return when that number was exceeded. From Theorem 1. Thus we can expect to see a steady decrease in the residual until the limiting accuracy is attained. we can expect no greater accuracy in the angle between the computed vector and xi than roughly neM||. x) is an exact eigenpair of A + E.13. if the residual corresponding to Ax is reduced far enough (and that need not be very far).2). but it also changes the residuals corresponding to these vectors. x).18) imply that at each stage the power method is trying to compute the dominant eigenvector of a slightly perturbed matrix. of the order of the rounding unit. 1. where the relative error ||£||p/|| A||p is less than or equal to the convergence criterion e. Thus. our convergence test is based on the pair (/x. The effects of rounding error We must now consider the effects of rounding error on the power method. by Theorem 3. For definiteness we will assume that ||ujb||2 = 1 and ||A||p = 1 in the following discussion. the change is very small. and we cannot expect the power method to compute xi to an accuracy greater than that error. M). in view of the simplicity of the convergence proof for the power method. x). we cannot say the same thing about (//. This parsimonious style of coding may seem natural. This is only n times as large as the error we could expect from simply rounding the matrix [see (2. • In a working implementation one would specify a maximum number of iterations.17) and the bound (1. the norm of A is computed outside the loop. then the computed residual corresponding to(A + E)x will also be reduced. Similarly. This perturbations induces an error in the eigenvector x\.3 we know that (//. x). The limiting accuracy of the power method is therefore quite satisfactory. Moreover. But can the limit be attained? The answer is not easy — surprisingly. The major source of error in the method is the computation of the product AukAn elementary rounding-error analysis shows that for a dense matrix where Equation (1.

and a variant—the inverse power method—is widely used. THE QR ALGORITHM The advantages of the power method are that it requires only matrix-vector multiplications and that it is essentially globally convergent. THE INVERSE POWER METHOD The difficulties with the power method can be overcome provided we can find a spectral enhancement that makes an eigenvalue of interest the dominant eigenvalue of the matrix. For these reasons. However. We begin with the enhancement.66 Assessment of the power method CHAPTER 2. for i > 1 the m approach (A. If we have an approximation K to AI. This spectral enhancement is called the shift-and-invert enhancement. if a real matrix has a complex dominant eigenvalue. 1. then the eigenvalues of the matrix (A — At/)"1 are Now as K.. given a vector x and a shift K we generate a new iterate in the form In principle. Moreover. Thus by choosing K sufficiently near A! we can transform AI into a dominant eigenvalue. —*• AI. we could use Algorithm 1. It has the disadvantages that it only computes the dominant eigenvector. The traditional enhancement for doing this is called the shift-and-invert enhancement. and suppose that Aj is simple. provided the matrix has a single. we can compute a residual almost for free. Specifically. simple dominant eigenvalue.K!)~I. /^i —> oo. . and the convergence can be very slow if the dominant eigenvalue is near in magnitude to the subdominant one. it plays an implicit role in the QR algorithm.1 to carry out the inverse power method. We now turn to that variant. . Shift-and-invert enhancement Let A have eigenvalues AI . and the power method cannot converge. The inverse power method The inverse power method is simply the power method applied to the enhanced matrix (. which are finite quantities. On the other hand. A n in no particular order. then its conjugate is also an eigenvalue with the same absolute value. However. and its combination with the power method is called the inverse power method.. — A) . the power method in its natural form is little used these days..2.4 . and that dominance can be made as large as we like. owing to the nature of the shift-and-invert enhancement.

1.I)y — x in statement 2. 10. and a nonzero starting vector x. a convergence criterion e. It is evident that the major work in the algorithm consists in solving the system (A . 8. returning an approximate eigenpair (//. let Since (A — Kl)y = x. 1.4) Note that as x converges to an eigenvector. a shift K. on the other hand. 9. 2.2 implements this scheme. With dense systems. We will say that this process requires a factorization. the decomposition is usually an LU decomposition computed by Gaussian elimination with partial pivoting.K. It is therefore important that the computation of the decomposition be placed outside the while Vol II: Eigensystems . requires only ra2 flam. 7. Most linear system solvers first compute a decomposition and then use the decomposition to solve the system. and some care must be taken not to inadvertently increase this work.2: The inverse power method o Specifically. we have If we set then the optimal residual for x is (see Theorem 1. the Rayleigh quotient fi = K + p = xHAx converges to the corresponding eigenvalue of A. or). THE POWER AND INVERSE POWER METHODS 67 Given a matrix A. while (true) Solve the system (A-Kl)y = x w = x/\\y\\2 p — x^w 3- % = y/\\y\\2 p. 5. The subsequent solution of the linear system. r = w — px x=x if (||r||2/||A||F < c) leave while fi end while Algorithm 1. 4.SEC. = K + p 6. this algorithm iterates the inverse power method to convergence. Algorithm 1. which costs about ^n3 flam.

4e-16 It is seen that each iteration adds about two significant figures of accuracy to the eigenvalue until the eigenvalue is calculated to working accuracy (£M = 10~16).814e+00 7.6e-09 7. where the concern is with large eigenproblems.2e-09 4.2. 9.2.7e-ll 7.3e-08 3. When A is sparse.3. if not dominant.2e-03 7.3e-16 3.307e+00 -1. they are usually an important.470e+00 2.906e-01 -5. We will return to this point repeatedly in the second half of this volume.399e+00 4.4e-10 1.8e-13 8.820e-02 4. we get the following results.4.7e-05 7.606e+00 (shown only to four figures) has eigenvalues 1.274e+00 8.4e-04 3. The effects of rounding error The nearer the shift K is to an eigenvalue.68 CHAPTER 2.765e+00 -2.298e-01 -3.877e-01 1.5. the matrix 3. the operations counts for computing a decomposition or factorization may be less than O(n3).4e-14 3.487e-01 9.2 were 9.510e+00 2.776e-01 1. then A — K! is nearly singular.4e-10 3. one should avoid software that solves a linear system by invisibly computing a decomposition. The errors in th Rayleigh quotients and the residual norms for several iterations of Algorithm 1.095e-01 4. part of any calculation involving shift-and-invert enhancement.209e+00 -1.079e-01 1. the iteration converges in one iteration. If K approximates AI to working accuracy.158e-01 -1. we took K = 1.5e-07 7. For example.4e-12 3.475e+00 -1. In particular.907e+00 3. and the system .01 and took x to be a normalized random vector. THE QR ALGORITHM loop in Algorithm 1.1e-18 The iteration converges to working accuracy in two iterations.2e-06 3. In the following experiment. This behavior is typical of the inverse power method applied to a well-conditioned eigenpair.4e-15 4.513e+00 -6.699e+00 -2. Nonetheless. An example The convergence of the inverse power method can be stunningly fast when the shift is near the eigenvalue in question.184e+00 2. But if K is near an eigenvalue.833e+00 -3. the faster the inverse power method converges. On the other hand if we take K = 1 + 10~8.022e+00 6.

it produces Rayleigh quotients \i that are generally more accurate than the shift AC. NOTES AND REFERENCES The power method As an explicit method for computing a dominant eigenpair. but most of the inaccuracy is in the direction of the eigenvector being approximated. 1. Yet the method still converges. which accounts for the fact that the Rayleigh quotient iteration in its natural form is not much used. the power method goes back at least to 1913 [182]. Hotelling in his famous paper on variance components [120. the solution will be completely inaccurate. The Rayleigh quotient method We have seen that as we proceed with the inverse power method. which is the subject of the rest of this chapter. If the eigenpair we are looking for is well conditioned.SEC. 1.AC/ changes it only insignificantly. this perturbation in A .1933] used it to compute his numerical results. Vol II: Eigensystems . Gaussian elimination with partial pivoting) is used to solve the system (A — AC/)?/ — x. it converges at least quadratically. dense matrices the Rayleigh quotient method takes O(n3) operations for each iteration. Why? The answer to this question is furnished by the rounding-error analysis of the solution of linear systems (see §1:3.I)y — x in statement 2 will be solved inaccurately. It can be shown that once it starts converging to a simple eigenvalue. in a disguised form it reappears in the QR algorithm. then the computed solution will satisfy where ||£||F/||A||F is of the order of the rounding unit. Since we are only concerned with the direction of the eigenvector. The main difficulty with the Rayleigh quotient method is that each time the shift changes.3.2 The result is called the Rayleigh quotient iteration. one must decompose the matrix A — AC/ to solve the system (A — AC/)?/ = x. For example.1937] gave an extensive analysis of the method. a simulation of a discrete-time discrete-state Markov chain amounts to applying the power method to the matrix of transition probabilities. Thus for full. In fact if AC approximates A to working accuracy..g. Many people who use the power method do so unknowingly. we can improve convergence by replacing AC wit ju. If a stable algorithm (e. this error in y does not introduce any significant errors in the approximate eigenvector.4). THE POWER AND INVERSE POWER METHODS 69 (A — K. To put the matter in another way. when AC is near an eigenvalue. Aitken [2. Since the rate of convergence depends on the accuracy of AC. However. This amounts to adding the statement AC = /z after statement 8 in Algorithm 1. Thus the inverse power method moves toward an approximate eigenpair that is for practical purposes indistinguishable from the exact eigenpair. the vector y is indeed computed inaccurately.
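As a sketch (mine, not the text's code), the change from the inverse power method to the Rayleigh quotient iteration is just the reassignment of the shift. Because the shift now changes at every step, the shifted matrix must be refactored inside the loop, which is what makes each iteration O(n^3) for a dense matrix.

    import numpy as np

    def rayleigh_quotient_iteration(A, kappa0, x0, tol=1e-12, maxiter=50):
        # Inverse iteration in which the shift is updated to the Rayleigh quotient.
        n = A.shape[0]
        Anorm = np.linalg.norm(A, 'fro')
        x = x0 / np.linalg.norm(x0)
        kappa = kappa0
        for _ in range(maxiter):
            # The shift is new each time, so a fresh O(n^3) factorization is
            # hidden inside this solve.
            y = np.linalg.solve(A - kappa * np.eye(n), x)
            x = y / np.linalg.norm(y)
            kappa = np.vdot(x, A @ x)        # new shift: the Rayleigh quotient
            r = A @ x - kappa * x
            if np.linalg.norm(r) / Anorm < tol:
                break
        return kappa, x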

working at the National Physical Laboratory in England. 278.g.262. pp. Shift of origin Wilkinson used shifting to good effect at the National Physical Laboratory. p. The first part is due to Wilkinson—e. 570-599] and [295]. 299.. 141] The result for Hermitian matrices in a more general form is due to Powell [214]. which allows one to specify the relative sizes of the elements in the backward error. The Rayleigh quotient The Rayleigh quotient is due to Lord Rayleigh (J.g. since seldom will we have the wherewithal to determine the accuracy of the current approximation to an eigenvalue or eigenvector. The inverse power and Rayleigh quotient methods The inverse power method is due to Wielandt [294. see [299. who defined it for symmetric matrices. the power method was one of the standard methods for finding eigenvalues and eigenvectors on a digital computer.3. 95. which says that a small residual implies a small backward error. The failure of the backward perturbation in Theorem 1. p. Even in this case one can apply a very general theorem of Oettli and Prager [185]. THE QR ALGORITHM For a time. before the advent of the QR algorithm. p. whose application was optimization. The . W.70 CHAPTER 2. Wilkinson.3 to respect the zero structure of a sparse matrix is a drawback only if the eigenpair in question is especially sensitive to perturbations in the zero elements. 172]. Theorem 1. For further discussion of this matter see any general textbook on numerical linear algebra—e. The justification of a convergence criterion by appeal to a backward error result is part of the paradigm of backward rounding-error analysis. 1944]. Some of its many generalizations will appear throughout this volume. provides a justification for tests based on the size of a suitably scaled residual.. The shift was actually entered by machine operators in binary by switches at the console before each iteration [300. A particularly important problem was to deflate eigenvalues as they are found so that they do not get in the way of finding smaller eigenvalues. Residuals and backward error The use of a residual to determine convergence is inevitable in most problems. honed this tool to as fine an edge as its limitations allow. Strutt). 300]. For more see The Algebraic Eigenvalue Problem [300. [62. The fact that the Rayleigh quotient gives an optimal residual is due to Wilkinson [300. Whether this result can be generalized to Hermitian matrices is an open question. Ipsen [124] gives a survey of the method and its properties. 577]. along with an extensive list of references.

method is chiefly used to compute eigenvectors of matrices whose eigenvalues can be approximated by other means (e.g., see Chapter 3). The fact that A − κI is ill conditioned when κ is near an eigenvalue of A makes it certain that the vector in the inverse power method will be inaccurately calculated. The analysis given here, which shows that the inaccuracies do no harm, is due to Wilkinson (e.g., see [300, pp. 619-622]).

The Rayleigh quotient iteration has been analyzed extensively in a series of papers by Ostrowski [188, 189, 190, 191, 192]. For additional results on the symmetric case, see [206, §§4.6-4.9].

It is important to use the Rayleigh quotient to compute the optimal residual in Algorithm 1.2. An obvious reason is that if we use the constant shift κ, the residual can never converge to zero. A more subtle reason is the behavior of the residual when the matrix is strongly nonnormal—that is, when the off-diagonal part of its Schur form dominates the diagonal part [111]. In that case, the first residual will generally be extremely small and then will increase in size before decreasing again. The use of the Rayleigh quotient to compute the optimal residual mitigates this problem. For more see [96] and [124].

2. THE EXPLICITLY SHIFTED QR ALGORITHM

The QR algorithm is an iterative method for reducing a matrix A to triangular form by unitary similarity transformations. Its shifted form can be written simply—and cryptically—as follows.

    Let A_0 = A.
    1. for k = 0, 1, 2, ...
    2.    Choose a shift κ_k
    3.    Factor A_k − κ_k I = Q_k R_k, where Q_k is orthogonal and R_k is upper triangular        (2.1)
    4.    A_{k+1} = R_k Q_k + κ_k I
    5. end for k

In other words, to get from one iterate to the next—i.e., to perform one QR step—subtract the shift from the diagonal, compute a QR factorization, multiply the factors in reverse order, and add back the shift to the diagonal. The method is called the explicitly shifted QR algorithm because the shift is explicitly subtracted from the diagonal of A and added back at the end of the process.

The matrices produced by the algorithm (2.1) satisfy

    A_{k+1} = Q_k^H A_k Q_k.

Thus A_{k+1} is unitarily similar to A_k. However, this fact alone is not sufficient reason for performing this sequence of operations. Why, then, is the algorithm worth considering?
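One QR step of (2.1) takes only a few lines in any system with a QR factorization. The sketch below is illustrative only: it uses NumPy, takes a symmetric matrix so that all the eigenvalues are real, and uses the Rayleigh quotient shift discussed in the next subsection; the printed last subdiagonal element is driven rapidly to zero.

    import numpy as np

    def qr_step(A, kappa):
        # One explicitly shifted QR step: factor A - kappa*I = QR, form RQ + kappa*I.
        n = A.shape[0]
        Q, R = np.linalg.qr(A - kappa * np.eye(n))
        return R @ Q + kappa * np.eye(n)

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 5))
    A = (A + A.T) / 2                     # symmetric, so the eigenvalues are real
    for k in range(6):
        A = qr_step(A, A[-1, -1])         # Rayleigh quotient shift
        print(k, abs(A[-1, -2]))          # last subdiagonal element -> 0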

This relation allows us to investigate the convergence of the algorithm.1. THE QR ALGORITHM The answer is that the algorithm is an elegant way of implementing the inverse power method with shift K^. For brevity we will drop the iteration subscripts and denote the current matrix by A and the next QR iterate by A. . however. These relationships go far toward explaining the remarkable effectiveness of the algorithm. then g in (2. The QR choice The above reduction is not an effective computational procedure because it requires that one be given eigenvectors of A. The remaining subsections are devoted to the practical implementation of the algorithm.ftq11 (see Theorem 1. we obtain an alternate proof of Schur's theorem—one in which the reduction is accomplished from the bottom up by left eigenvectors.4). 2. If. The QR algorithm implicitly chooses g to be a vector produced by the inverse power method with shift K. THE QR ALGORITHM AND THE INVERSE POWER METHOD In this subsection we will show how the QR algorithm is related to the inverse power method.72 CHAPTER 2. then It follows that Thus we have deflated the problem. At the same time it is a variant of the power method. A step of the QR algorithm can be regarded as an approximation to such a deflation. and the next two subsections are devoted to laying them out in detail. B. If we repeat the procedure inductively on B. q) is a left eigenpair of A. Schur from the bottom up In establishing the existence of Schur decompositions (Theorem 1. let Q — (Q* q) be unitary and write If (A. Chapter 1) we showed that if we have knowledge of an eigenvector we could deflate the corresponding eigenvalue of the matrix by a unitary similarity transformation. etc.2) will be small because ||#||2 is the norm of the optimal residual qu A . Specifically.12. with the difference that the deflation occurs at the bottom of the matrix. we choose q to be an approximate eigenvector of A.

e n ) is an approximate left eigenpair whose residual norm is \\g\\2. Partition A in the form Then (//. is the Rayleigh quotient eT A e n -^-"n • Error bounds To show that the QR algorithm converges to a deflated matrix. then we can expect q to be a good approximation to a left eigenvector of A and hence #H in (2. let us rewrite the QR factorization of the shifted matrix A — AC/ in the form where because of the triangularity of R we have The last row of (2. This shift is called the Rayleigh quotient shift because fj. We begin by giving a bound that relates the norms of g and g in successive iterates.2) to be small.5) shows that the last column of the matrix Q generated by the QR algorithm is the result of the inverse power method with shift K applied to the vector ej. The key to the derivation is to work with a partitioned form of the relation of A — AC/ = QR.SEC. we must show that the vector g in (2. Hence and Vol II: Eigensystems . Since Q is unitary. THE EXPLICITLY SHIFTED QR ALGORITHM 73 To see this. If AC is close to an eigenvalue of A.6) converges to zero. Hence Equation (2. the norms of its last row and last column must be one. The above analysis also suggests a choice of shift. Specifically. let us write and also write the relation A — AC/ = RQ in the form From these partitions we proceed in stages. \i should approximate an eigenvalue of A and is a natural choice for a shift.Hence if g is small. 2.AC/) = rnnej.4) is qH(A .

12) in the form This bound can be used to show that if go is sufficiently small and //o is sufficiently near a simple eigenvalue A of A then the iteration with Rayleigh quotient shift converges in the sense that (/jt -> 0 and fik -> A. . We will assume that we have constants T? and a such that Since the A* are orthogonally similar. Hence from (2.K).9) and (2.11) we get This is our final bound. We will therefore assume that the iteration converges and determine the rate of convergence.13) by .74 CHAPTER 2.8) we find that #H = peH. If we use the fact that Q is unitary to rewrite (2.7). we can simply take rj = ||A2||2.7) in the form then we find that p = ftth + x((j. However. the proof is tedious and not particularly informative. THE QR ALGORITHM Next we want to show that e is small when g is small. We may now replace (2. The fact that ak can be bounded is a nontrivial corollary of the formal proof of convergences. from (2. If we assume that S is nonsingular and set then We now need a bound on p. Computing the (2. Rates of convergence Let us restore our iteration subscripts and write the bound (2.1)-block of (2.10) and the fact that Finally. Hence from (2. we get #H — eH S.10) and (2.

so that our uniform upper bound a no longer exists. For example.2. For more on these points see the notes and references.13). once it sets in. General comments on convergence Our convergence analysis seems to depend on the hypothesis that //^ is converging to a simple eigenvalue A. In addition. the algorithm also Vol II: Eigensystems . it assumes that fj.Q is sufficiently near a simple eigenvalue of A. Subsequent eigenvalues are found with fewer and fewer iterations. However. that is. In practice. THE EXPLICITLY SHIFTED QR ALGORITHM 75 (the second term vanishes because the Rayleigh quotient shift takes Kjt = Hk). 2. If AQ is Hermitian. For example.In particular. is very fast. whatever its multiplicity. the matrices S^1 can be unbounded.We say that \\gk\\2 converges at least quadratically to zero. Quadratic convergence. and the convergence remains cubic.Thus when the QR algorithm converges to a simple eigenvalue. The typical behavior seems to be that the algorithm spends its first few iterations floundering and then homes in on an eigenvalue. the norm of gk+i is bounded by a quantity proportional to the square of the norm of gk.Thus our bound becomes This type of convergence is called cubic.Kjb|||flr/b||2 in (2. The complete convergence analysis is local. the convergence may be only linear. any shift for which \pk . which removes the term 0k\Hk .SEC. if a2?/ = 1 and ||0o||2 = 10"1. In particular. this is true of the Wilkinson shift defined by Algorithm 2. if A is nondefective. We have used the Rayleigh quotient shift in our analysis. and it is blindingly fast. then so are the iterates Ak. THE UNSHIFTED QR ALGORITHM The relation of the QR algorithm to the inverse power method explains its ability to converge swiftly to a particular eigenvalue.) Nonetheless. 2. then Thus four iterations are sufficient to reduce the error by a factor corresponding to the IEEE double-precision rounding unit. if A is multiple. if A is Hermitian.7 below. which we turn to now. If A is defective on the other hand. so that we may replace 77 by \\gk\\2. A is necessarily nondefective. Why this happens is partly explained by the relation of the QR algorithm to the power method. however. (It is a good exercise to see why this happens. we have hk = gk. the convergence of the QR algorithm remains quadratic.Kk\ = O(||0||fc) will give us quadratic convergence. There are no theorems for the global convergence of the QR algorithm applied to a general matrix.

. Rk be the orthogonal and triangular matrices generated by the QR algorithm with shifts KQ.. The reason for this is that the QR algorithm is also related to the power method. The explication of this relation is the subject of this subsection. THE QR ALGORITHM produces approximations to the other eigenvalues.When the algorithm has finished its work. The QR decomposition of a matrix polynomial Each step of the QR algorithm consists of a unitary similarity transformation Ajt+i = Q^AQk. we will need a single transformation that moves us from the initial matrix A = AQ to the final matrix Ak+i . . .This can be accomplished by multiplying the transformations together as we go along.1.Qk and RQj. The result is a matrix satisfying The matrix Qk has another useful characterization Theorem 2. . a process called accumulating the transformations. Kk starting with the matrix A. We begin with a far-reaching characterization of the ArthQR iterate. Let Then Proof. ...76 CHAPTER 2. Let Q0.. we get Hence An obvious induction on the product completes the proof. We have Postmultiplying this relation by Rk-i. albeit more slowly. ...

then we can simply apply the shifted QR algorithm with shifts KI . Then by (2...1 can be interpreted as follows. Let 77 be a polynomial of degree k whose zeros are «i.1 has more than one application. we have Thus the first column of Qk is the normalized result of applying k iterations of the power method to ei..) These formulas give us two ways of evaluating the QR factorization of p( A). the matrices Qk and Rk from the unshifted QR algorithm satisfy Multiplying this relation by ei and recalling that Rk is upper triangular. accumulating the transformations Q j and the product of the triangular matrices Rj. of p(r). For the present.SEC.We can then define p( A) by either of the formulas (It is a good exercise to verify that these forms are equivalent.. If we use the first formula. Kk.. Vol II: Eigensystems .. The result is the QR factorization QkRk ofp(A). Now if the conditions for convergence of the power method are met. we simply formp(A) and calculate the QR decomposition of the result. if we know the zeros AC. we are interested in what it says about the relation of the QR algorithm to the power method. say by Householder triangularization.. the vectors q[ ' approach the dominant eigenvector of A. Let us say that a sequence of QR steps is unshified if the shifts Kk are zero. The QR algorithm and the power method Theorem 2. It follows that if we partition then gk —»• 0 and /^ —> AI. Convergence of the unshifted QR algorithm The relation of the QR algorithm to the power method is a special case of a more general theorem—under appropriate conditions the unshifted QR algorithm actually triangularizes A. However. 2. THE EXPLICITLY SHIFTED QR ALGORITHM Theorem 2.. Kk.15). where AI is the dominant eigenvalue of A.
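Theorem 2.1 can be checked numerically in a few lines. The sketch below (illustrative only, in NumPy) runs shifted QR steps, accumulates the product of the orthogonal factors and the product of the triangular factors, and compares the result with p(A) = (A − κ_k I)···(A − κ_0 I) formed directly; the two agree to rounding error.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 5
    A = rng.standard_normal((n, n))
    shifts = [0.3, -1.1, 0.7]

    Ak = A.copy()
    Qhat = np.eye(n)
    Rhat = np.eye(n)
    for kappa in shifts:
        Q, R = np.linalg.qr(Ak - kappa * np.eye(n))
        Ak = R @ Q + kappa * np.eye(n)
        Qhat = Qhat @ Q                  # Q0 Q1 ... Qk
        Rhat = R @ Rhat                  # Rk ... R1 R0

    pA = np.eye(n)                       # the factors of p(A) commute
    for kappa in shifts:
        pA = (A - kappa * np.eye(n)) @ pA
    print(np.linalg.norm(Qhat @ Rhat - pA))   # ~ rounding error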

2. Proof. Let the diagonals of RRAkU be 61. Then Let QkRkbe the QR decomposition of / + REkR 1 .. Write A*IA-fc = I + Ek.. A..78 Theorem 2. and letX = QR be the QR decomposition o f X . 6nt and set Then the triangular factor in the decomposition has positive diagonal elements. Let CHAPTER 2. . then there are diagonal matrices Dk with \Dk\ = I such that QkDk -> Q.. THE QR ALGORITHM where Suppose that X l has anLU decomposition X l = LU. We have Now for i > j the (z..kLA~k approaches the identity matrix.)-element of A fc £A k is lij(\i/Xj)h.Then Since / -f REkR 1 —> /. If Ak has the QR decomposition Ah = Qk&k. Chapter 1 ] that the columns of the Q-factor of X are the Schur vectors of A corresponding to the eigenvalues in the order (2. • We know [see (1. where L is unit lower triangular.16). where Ek -> 0.16) approaches zero — i. which by (2. It follows by the uniqueness of the QR decomposition that Qk = QQkD~\and Four comments on this theorem. by the continuity of the QR factorization Qk —>• /.9). Consequently.e. with ..

THE EXPLICITLY SHIFTED QR ALGORITHM 79 its columns suitably scaled the matrix Q k converges to the orthogonal part of the Schur decomposition of A. Thus under the conditions of Theorem 2. which is what we set out to show. The former decomposition always exists. This phenomenon is called disorder in the eigenvalues. and ^ . write If 4fc)H> £J+i. It is easily verified that the unshifted QR algorithm converges — actually it is stationary in this example — to a matrix whose eigenvalues are not in descending order. but the latter need not.i.5.1. Rounding error will generally cause the eigenvalues to be found in descending order. 2.are zero.2 the unshifted QR algorithm produces a sequence of matrices tending to a triangular matrix.SEC. • We have assumed that the matrix X of right eigenvectors has a QR decomposition and that the matrix X~l of left eigenvectors has an LU decomposition. But the QR algorithm can be very slow in extricating itself from disorder. It follows that the QR iterates Ak = Q^AQk must converge to the triangular factor of the Schur decomposition of A. For example. then so that the matrix X that orders the eigenvalues in descending order. the product R(AkLA~k)(AkU) has the form This product is block upper triangular. and has no LU decomposition. if A — diag(0.2. It should be kept in mind that disorder is a creature of exact computations. and hence its Q-factor has the form Vol II: Eigensystems .0). Specifically. • We can get further information about the behavior of the unshifted algorithm by partitioning the matrices involved in the proof of Theorem 2.

80 CHAPTER 2. and even when the conditions of Theorem 2. This reduction may not be much. However. we see that the most slowly converging elements are l\ ^ and i\^ t-. where T is the upper triangular factor of the Schur decomposition of A. THE QR ALGORITHM NowA f c = Q^AQk = Q%QHAQQk = Q^TQk. suppose that the shifted algorithm is converging to an eigenvalue A.18) at the same rate as these matrices approach zero.17) are exactly zero. Specifically. which converge to zero as \\i/\i_i\k and \Xi+i/\i\k respectively. by considering a suitable partitioning of the matrices in question. while the other elements of the submatrix Ak[i:n. we are in effect applying the unshifted algorithm to the matrix A — K&/. HESSENBERG FORM We now turn to the practical implementation of the QR algorithm. the convergence can be slow.2. and by continuity Ak will tend toward a matrix of the form (2. The first point to note is that one step of the algorithm requires the computation of a QR decomposition . Now in the unshifted algorithm these matrices tend to zero. The foregoing does not make an impressive case for the unshifted QR algorithm._i |. the (i. L\^ l. whose diagonal blocks—which themselves may not converge — contain the equimodular eigenvalues. This. • Theorem 2.18) occurs only whenilL' . | \i+i/ \i \}k to zero. It follows that i./ A. As the shifts «& converge. Nonetheless. the QR iterates Ak tend to a block triangular form. in fact. l:i] converge to zero. is what happens in practice. Hence we may expect some reduction in the subdiagonal elements of A other than those in the last row. i)-element of Ak converges to A. on which we must perform many iterations. Its convergence is unreliable. its cumulative effect may be significant. and i\^ i in r (2. the proof of the theorem can be adapted to show that if there are groups of eigenvalues of equal magnitude.e. It follows that: Under the hypotheses of Theorem 2.3. the results on the unshifted algorithm give us insight into how the shifted algorithm behaves. Ak decouples at its ith diagonal element. but for a large matrix. so that the last row is going swiftly to zero.2 are satisfied.2 explicitly assumes that the eigenvalues of A are distinct and have distinct absolute values.. 2. By inspection. The block triangularization in (2. The rate of convergence is at least as fast as the approach of max{ | A.

The product of a Householder transformation and a matrix A can be cheaply computed. which overwrites A with HA. a QR step for a Hessenberg matrix requires only 0(n 2 ) operations. Since at least one QR step is needed to compute a single eigenvalue. Moreover. This equation suggests the following algorithm.1 in this series). a matrix of the form illustrated below forn = 6: (This representation. Specifically.SEC. Vol II: Eigensystems . the operation count to find all the eigenvalues is O(n4). in which an 0 stands for a zero element and an X stands for an element that may be nonzero. we work with ever smaller submatrices.) Such an operation count is far too high for the algorithm to be practical. emphasizing the complex case. This brings the total operation count down to an acceptable 0(n3). in §1:4. is upper Hessenberg form'. THE EXPLICITLY SHIFTED QR ALGORITHM 81 of Ak. Real Householder transformations have been widely treated in the literature and in textbooks (in particular. The cure to this problem is to first reduce the matrix A to a form in which the algorithm is cheaper to apply. A Householder transformation or elementary reflector is a matrix of the form where It is easy to see that a Householder transformation is Hermitian and unitary.} As we shall see later. that is. it turns out. (This is true even if we take advantage of the fact that as the matrix deflates.The computation of this decomposition requires 0(n3) floating-point operations. Here we will give a brief treatment of the basics. 2. Householder transformations The reduction to Hessenberg form is traditionally effected by a class of transformations called Householder transformations. a QR step preserves Hessenberg form. The natural form. is called a Wilkinson diagram.

This observation leads to Algorithm 2. The condition that the vector a have norm one and that its first component be nonnegative is rather restrictive. to reduce a general vector a to a multiple of ei we can replace a by where p is a scalar of absolute value one that makes the first component of a nonnegative. We begin with the special case of a vector. THE QR ALGORITHM If A is rax n. the algorithm returns a row vector u such that a(I .WWH such that . Let Then The proof of this theorem is purely computational and is left as an exercise.UHU) — i/ef.1)element changed to — 1.1. However.uu . and so on.82 CHAPTER 2. • If a = 0. • If a is of order n and we replace the index 1 in housegen with n. Since we generally represent row vectors as aH. which generates a Householder transformation that reduces a vector a to a multiple of ei. Five comments. we obtain a new function that produces a Householder transformation H = I . • There is no danger of cancellation in statement 11 because the preceding statement forces u[l] to be nonnegative. Let a be a vector such that \\a\\2 = 1 and a\ is real and nonnegative. • If a is a row vector. this algorithm requires about 2mn flam. Householder transformations are useful because they can introduce zeros into a matrix. which makes H the identity with its (1. W H . we will always invoke housegen in the form In this case our Householder transformation is J . the routine returns u = \/2. A similar algorithm will compute the product AH. as usual. Theorem 23.

and the generation of Householder transformations for real vectors is usually done somewhat differently (see Algorithm 1:4. Nonetheless. w. 3. fi 5. Reduction to Hessenberg form We have already mentioned that the principal use of Householder transformations is to introduce zeros into a matrix.1.3.u/v/ttpl 13. Let A be of order n and partition A in the form Let HI be a Householder transformation such that Vol II: Eigensystems . p= 1 9. 2. see Algorithm 1. return .SEC. • Whenaisreal. 2. in this volume we will regard housegen as a generic program. if(«[l]/0) 6.1). However. We will illustrate the technique with the reduction by unitary similarities of a matrix to Hessenberg form. end housemen Algorithm 2. u — (p/v)*u 11. applicable to both real and complex vectors.1: Generation of Householder transformations o For an example of housegen2 applied to a row vector. in this case such constructions as p = «[l]/|u[l]| are unnecessarily elaborate. else 8. «[!] = ! + «[!] 12. end if 10. p = «[l]/|ti[l]| 7. 1. Chapter 5. if (v = 0) tt[l] = \/2. « . i/ = — pv 14. t>) w=a ^=N|2 4. housegen(a. soisw. THE EXPLICITLY SHIFTED QR ALGORITHM 83 This algorithm takes a vector a and produces a vector u that generates a Householder transformation H = I — WWH such that Ha = I/GI.

then Thus we have annihilated all the elements in the first column below the first subdiagonal. Let Hk be a Householder transformation such that A If we set Hk = diag(/A.. . Figure 2. THE QR ALGORITHM Figure 2. JYjt). Hk-i such that where AH is a Hessenberg matrix of order k — 1. The elements with hats are the ones used to generate the Householder transformation that . .1 illustrates the course of the reduction on a matrix of order five. H I ) . . For the general step. suppose that we have generated Householder transformations HI .84 CHAPTER 2. .1: Reduction to upper Hessenberg form o If we set HI = diag(l. then which reduces the elements below the subdiagonal of the fcth column to zero.

13. k+l:n] = H[k+l:n. k+l:n] .SEC. H[k+l.n] = e n .U*VH ^[^+2:71. 1. 5. Q[:. end for fc 19.k] = u Premultiply the transformation VE = uH*H[k+l:n. u = Q[k+l:n.e n _i 4.Q) Reduce A H =A for k = I to ra-2 Generate the transformation housegen(H[k+l:n. 9.H.W*VH 17.l 14.k+l:n] .2 t o l b y . Q[k+l:n. 12.n-l]=. end hessreduce Algorithm 2.^ = 0 Postmultiply the transformation v = H[l:n. 1.k+l:n] H[k+l:n. k+l:n] 16. The reduced matrix is returned in the matrix H. k+l:n] = H[l:n. VH = «H*Q[fc+l:n.k] = ek 18.Q[k+l:n. and the transformations are accumulated in the matrix Q. 6. Ar]. 10. k]) Q[k+l:n. forfc = n . hessreduce( A. k+l:n] . 8.V*UH end k Accumulate the transformations Q[:.2: Reduction to upper Hessenberg form o VolII: Eigensystems . u.k] 15. 2.Q[:.k+l:n]*u H[l:n. 2. k+l:n] . THE EXPLICITLY SHIFTED QR ALGORITHM 85 This algorithm reduces a matrix A of order n to Hessenberg form by Householder transformations. 3. 11.

k+l:n]. This has the advantage that the reduction and the accumulation can be done in a single loop. the flam actually represents a floating-point addition and multiplication on the machine in question. the term flam is an abbreviation for "floating-point addition and multiplication. THE QR ALGORITHM produces the next matrix. There are several comments to make about it. and a complex multiplication requires two additions and four multiplications." When A is real.86 CHAPTER 2. Note that after three (i. however. In this implementation we have saved the generating vectors in Q itself and initialize Q to the identity column by column as the loop regresses. additions and multiplications are composite operations. we will have to apply the subsequent Hf. only affects Q[k+l:n. Thus in complex arithmetic the flam is . However. so that the application of Hj. fc+1. • The matrix Q is the product H\H2 — -Jfn-2. • An operation count for the algorithm can be obtained as follows. Thus the count for the premultiplication is about The postmultiplication at the k stage requires about 2n(n—k) flam. When A is complex. a complex addition requires two ordinary floating-point additions.. Hence: The* /^r\f»rotif^n fnimf fr\r A Irmnthm 1 O ic flam for the redcatiof of a flam for the accumulatiof oif Q • As mentioned above. Specifically. which gives a total of n3 flam.e. An alternative is to save the vectors generating the Householder transformations and accumulate them after the reduction in the order #l(---(#n-4(#n-3(#n-2g)))--')- It is easy to see that this order fills up Q from the southeast to the northwest. At the Arth stage the premultiplication requires about (n-fc) 2 floating-point additions and multiplications (flams) to compute v and a like number to update H. n—2) transformations. n]. to the array Q[l:n. Algorithm 2. the reduction is complete.The natural way to accumulate the transformations is to set Q = /. since HI is almost full. The operation count for the accumulation of Q is the same as for the premultiplication. The generation of the transformations is negligible.2 implements the reduction to upper Hessenberg form. and then accumulate the transformations in the order (•••(((g#i)#2)#3)--0#n-2.

. where ||E\\ is of the order of the rounding unit compared to \\A\\. In the sequel we will report operation counts in terms of the arithmetic operations—real or complex—in the algorithm and leave the conversion to machine operations to the reader. Then we say that the sequence (2.Set P = Rm-i'"Ri and Q = Si~-Sm-i. For later reference..SEC. THE EXPLICITLY SHIFTED QR ALGORITHM 87 equivalent to four ordinary flams. • The algorithm generates H in situ. We will use it later in deriving the implicit shift version of the QR algorithm. the first column of the orthogonal kmatrix ! generated by Algo rithm 2. This fact can be verified directly from the algorithm. A^.• • #n-2 • Then there is a matrix E satisfvine such that the computed Hessenberg matrix H satisfies Here 7n is a slowly growing function of n. If we replace H by A.. the computed Hessenberg form is the exact Hessenberg form of A + E. 2. • The algorithm is backward stable. Am be generated by the recursion where Rk and Sk are unitary matrices generated from Ak. For the implications of this result see the discussion following Theorem 1. • An extremely important observation is the following. In other words.22) is STABLE IN THE USUAL SENSE if there is a matrix E and a slowly growing function 7m such that Vol II: Eigensystems . Let the sequence AI. and the operation counts above must be adjusted by a factor of four.3. Here fl denotes floatingpoint computation with rounding unit eM and includes any errors made in generating and applying Rk and Sk. Specifically. Definition 2. then the algorithm overwrites A with its Hessenberg form. The stability result just cited is typical of many algorithms that proceed by a sequence of orthogonal transformations. Most packages choose this option to save storage. we make the following definition.2 is e1.4.. let Hk be the exact Householder transformation generated from the computed Ak and let Q = HI .

but it is more efficient to use another class of transformation—the plane rotations. If we set v = ^/\a\2 + |6|2 and then (note that c is real) We can apply plane rotations to a matrix to introduce zeros into the matrix. Plane rotations can be used to introduce zeros into a 2-vector. j)-plane is a matrix of the form . Plane rotations CHAPTER 2. A plane rotation (also called a Givens rotation) is a matrix of the form where The transformation is called a rotation because in the real case its action on a two vector x is to rotate x clockwise through the angle 9 = cos"1 c.88 and Am = P(Al+E)Q. a plane rotation in the (t. let a and 6 be given with a ^ 0. Specifically. Specifically. The proof is constructive and involves the use of 2x2 unitary transformations. THE QR ALGORITHM We have mentioned that a QR step leaves a Hessenberg matrix invariant. We could easily use Householder transformations for this purpose.

a rotation in the (i. The definition we have chosen makes c real. Four comments on this algorithm. They can also be simplified when a and 6 are real. • A simpler and in many ways more natural definition of c and s is which makes the final value of a equal to v. let X be a matrix and let If X and Y are partitioned by rows.j)-plane combines rows i and j and leaves the others undisturbed. • The constant r defined in statement 9 is a scaling factor designed to avoid overflows in statement 10 while rendering underflows harmless. then Thus premultiplication by a rotation in the (i. which saves work when we apply the rotation to a matrix.3 generates a plane rotation satisfying (2. If a is zero. Because our pseudocode can alter the arguments of routines this means that if a and 6 are elements of a matrix then those elements are reset to the values they would assume if the rotation were applied. 2. Since the application of a rotation to a matrix alters only two rows of the matrix. Postmultiplication by P^ affects the columns in a similar way. the routine returns the rotation that exchanges two components of a vector.SEC. . j)-plane is an identity matrix in which a plane rotation has been embedded in the submatrix corresponding to rows and columns i and jTo see the effect of a rotation in the (i. then the routine returns the identity rotation.24). • For the sake of exposition we have stayed close to the formulas (2. These formulas can be considerably economized (for example. we could replace r by | Re(a)| + | Im(a)| + | Re(a)| +1 Im(a)|. and we can introduce a zero anywhere in the it\\ or jth column. Algorithm 2.)-plane on a matrix.23). • The final values of the application of the rotation to a and 6 are returned in a and 6. thus avoiding some multiplications and a square root in statement 9). a simple routine to combine two vectors will serve to implement the general application VolII: Eigensystems .. By an appropriate choice of c and s we can introduce a zero anywhere we want in the zth or jth row. • If b is zero. THE EXPLICITLY SHIFTED QR ALGORITHM 89 In other words.

The cosine of the rotation is real. rotapp(c. rotgen(a. The program overwrites a with its final value and b with zero. s = 1. ^ = a/\a\ 9r = H+l&l 10.4: Application of a plane rotation o . c = 1. c=|a|/i/ 12. 3. 5 = /j. 6 = 0 return 7. 2. end if 8. v = rHVIa/rp+lfc/rl 2 11. end if 5. end rot gen Algorithm 2. 5. s = 0. 1. c. c = 0. If(a = 0) 6. return 4.90 CHAPTER 2.y] £ = c*x-f5*2/ y = c*y—s*x ar = * end rotapp Algorithm 2. THE QR ALGORITHM The following algorithm generates a plane rotation satisfying (2. 6.x. if (6 = 0) 3.*b/v 13. 4.s.24) from the quantities a and b. a = z/*// 14. a = 6.3: Generation of a plane rotation o This algorithm takes a plane rotation P specified by c and s and two vectors x and y and overwrites them with 1. s) 2. 6= 0 15.

3.g. PX :rotapp(c.X[:. Thus our transformed pair of rows is 1 2 3 4 5 X 0 0 X X X X 0 X X In this example we have actually lost a zero element. In some algorithms.X[:. 4. in which case the count for each pair of elements is 8 flmlt -f 4 fladd. t]. leaving context to decide the actual count. In designing algorithms like this.. In particular.X[j. A pair of O's remains a pair of O's (column 3).-s. Begin by selecting two rows of the matrix (or two columns): 1 2 3 4 5 X X 0 X X 0 X 0 0 X Then after the transformation the situation is as follows.s.']. • The routine rotapp has many applications. plane rotations are frequently used to move a nonzero element around in a matrix by successively annihilating it in one position and letting it pop up elsewhere—a process called bulge chasing. we must multiply real rotations by complex vectors.j]) • The routine rotapp uses a vector t to contain intermediate values. X [ : . 2. j ] ) XPR:rotapp(c.:}) XP : wtapp(c. In the real case the count becomes 4 flmlt + 2 fladd.SEC. We will often refer to these counts collectively as a flrot. it is useful to think of the transformations as a game played on a Wilkinson diagram. Here are some comments on the routine. A practical implementation would do something (e. j)plane defined by c and s.4 is such a routine.s. 2. the cost of combining two components of x and y is 12 flmlt + 8 fladd. Transformations that can introduce a zero into a matrix also have the power to destroy zeros that are already there. Vol II: Eigensystems . Then the following table gives the calling sequences for four distinct applications of P to a matrix X. The element of your choice becomes zero (X in column 2). X[:. A mixed pair becomes a pair of X's (columns 4 and 5).X(i.']} pKX:wtapp(c. Here are the rules for one step of the game. Suppose that P is a rotation in the (i. -s. A pair of X's remains a pair of X's (column 1). write out the operations in scalar form) to insure that a storage allocator is not invoked every time rotapp is invoked.X{i. THE EXPLICITLY SHIFTED QR ALGORITHM 91 of plane rotations. Algorithm 2.i].^X(j. • Because the cosine is real. 1.

TQ = TP^ • • • P*J_i. The QR step is divided into two parts.. then Q is the Q-factor of H — i. The second step is to form H . we consider only the unshifted step. Note that if <2H = -Pn-i. This step is illustrated by the Wilkinson diagrams in Figure 2. The point to observe is that the postmultiplication by P^-+1 introduces a nonzero in the (i+1. n » so that H is the result of the single.. we will use plane rotations to implement the basic QR step. • The application of plane rotations dominates the calculation.. the resul.e. t)-element but leaves the other zero elements undisturbed. Let H be the upper Hessenberg matrix in question. Since shifting the step consists only of altering the diagonal elements at the beginning and end of the step.92 CHAPTER 2..n • • -Pi2. the unitary Q of the QR algorithm. Specifically. unshifted QR step. after the secon . we determine rotations Pi2.n with Pk.and postmultiplications can be interleaved. In the first part.2. Algorithm 2. THE QR ALGORITHM Figure 2. Thus we have shown that a QR step preserves upper Hessenberg form and requires 0(n 2 ) operations.t is upper Hessenberg.5 implements these two steps.n ' • • Pi^H is upper triangular. this translates into O(n2) operations. It is easy to see that the algorithm requires about n2 flrot. Whether the matrix is real or complex.k+i acting in the (fc. Thus the final matrix H is in Hessenberg form. Figure 2. and we have shown: If a Qr step is performed on an upper GHessenberg matrix. Pn-i.. fc-|-l)-plane so that T = Pn-i. • The pre. Two comments are in order.3 illustrates this postmultiplication step.2: Reduction to triangular form in QR step Invariance of Hessenberg form The reason for reducing a matrix to Hessenberg form before applying the QR algorithm is that the form is invariant under a QR step and the operation count for performing a QR step is 0(n2). To establish these facts.

ck. H[k+l. H[k+l. 7. 2. 4. H[k. k].3: Back multiplication in QR step /s This algorithm applies one unshifted QR step to an upper Hessenberg matrix H. sk. 1. sk. H[l:k+l.SEC. 3. H[l:k+ltk+l]) end for k Algorithm 2. 2. 5. THE EXPLICITLY SHIFTED QR ALGORITHM 93 Figure 2. k+l:n]) end for k fork = lton-l rotapp(ck. for k = 1 ton—1 rotgen(H[k.5: Basic QR step for a Hessenberg matrix o Vol II: Eigensystems .k]. $k) rotapp(ck. k]. 6. k+l:n].

-+! .k+i being followed by the application of P^_1 ^. let us examine the consequences of setting /&. albeit slowly. In this subsection we will work our way to a practical algorithm for reducing an upper Hessenberg matrix H to Schur form.• to zero amounts to making a normwise relative perturbation in A of size e. we will assume that the current value A of A has been computed without error.n-i will tend rapidly to zero.2 suggest that other subdiagonal elements may also tend to zero. then In other words.1 we have reason to expect that when the shifted QR algorithm is applied to an upper Hessenberg matrix H the element /&n.. If we set then Moreover. To simplify the discussion. then the problem deflates.26). At the same time.. to claim this bonus.•+!.2) has been premultiplied we can postmultiply by PJ^. the perturbation E is of a size with the perturbation due to rounding the elements of A.4). 2.. A natural criterion is to choose a small number e and declare /it+i?l negligible provided where A is the matrix we started with.-+!. we must first decide what it means for a subdiagonal element to be negligible. and we can save computations. so that there is an orthogonal matrix Q such that QHAQ — A. The process continues with the generation and application of Pk. THE QR ALGORITHM rotation (P^ in Figure 2. Now setting'/».4. if hi+i^ satisfies (2. THE EXPLICITLY SHIFTED ALGORITHM Algorithm (2. setting /&*_!. the results in §2... • The method is stable in the usual sense (Definition 2.to zero amounts to replacing A by A = A .94 CHAPTER 2./&. Detecting negligible subdiagonals From the convergence theory in §2.5) exists primarily to show the invariance of upper Hessenberg form under a QR step. However. If any subdiagonal element can be regarded as negligible. When e equals the rounding unit £M. If the elements of A are roughly the same . To see why.-ei+ieJ1.• to zero.

we can deflate the problem. however. It is worth observing that we used the Frobenius norm in (2. one can revert to (2. In this case it is important that we set /i. then this is as small a perturbation as we can reasonably expect. If we decide to treat the latter first. 2.>t-| + |/i. so that the remaining eigenvalues are contained in the leading principal submatrix of order three and in the principal submatrix of order four whose elements are labeled 6._i>z to zero only if it is small compared to the nearby elements of the matrix. For more on this point see the discussion following Theorem 1. something the criterion (2.3. we must distinguish two cases. ninth. When the elements of A are not balanced—say when the matrix shows a systematic grading from larger to smaller elements as we pass from the northwest corner to the southeast—the backward error argument for the criterion (2. eighth. and tenth diagonal elements.26) becomes less compelling. n—1)-element and proceeding backward one element at a time.26) only because its unitary invariance made the bound (2.26) does not insure. As the algorithm progresses.27) easy to derive.26). In applying the rotations. subdiagonal elements in other positions may also deflate.SEC. Deflation Once a subdiagonal element has been deemed insignificant. A typical situation is illustrated by the following Wilkinson diagram: Here the seventh. In practice. The usual cure for this problem is to adopt the criterion If by chance |/i.+i)Z-+i| is zero. In the shifted QR algorithm subdiagonal elements will usually deflate beginning with the (n.or oo-norm. the matrix frequently has small eigenvalues that are insensitive to small relative perturbations in its elements. then we need only perform QR steps between rows and columns four and seven. The third subdiagonal element has also become negligible. In such cases. however. In that case we need only transform Vol II: Eigensystems . revealing three eigenvalues at the eighth. and ninth subdiagonal elements have become negligible. The first is when we only wish to find the eigenvalues of A. THE EXPLICITLY SHIFTED QR ALGORITHM 95 size. one can replace \\A\\p by any reasonably balanced norm—say the 1.

7. Back searching In (2. THE QR ALGORITHM This algorithm takes a Hessenberg matrix H of order n and an index i (1 < t < n) and determines indices il. if we wish to compute a Schur decomposition of A. else 12.8.96 CHAPTER 2. the transformations must be premultiplied into the elements labeled 6 and c and postmultiplied into the elements labeled 6 and d. i2 < t such that one of the following two conditions holds. In practice. or the matrix is completely deflated. *2) /7 = i2 = l 3.29) we can determine by inspection the rows—call them il and /2—that define where the QR step is to be applied.6 implements such a loop. end if 11. end backsearch Algorithm 2. and the QR algorithm is finished. 1. On the other hand. il = il-l 13. end while 15. 2. end if 14. 5. In particular. This is usually done by searching back along the subdiagonal from the current value of i2. Negligible subdiagonal elements are set to . 1. It either returns distinct indices il and i'2. backsearch(H'. For details see Algorithm 2.i7-l] = 0 if(il = i2) i2 = il = il-l 8. in which case all the subdiagonal elements of H in rows 1 through I are zero. 1 = il = i2. we must apply the transformations to the entire matrix. we must determine them on the fly by applying whatever criterion we choose to decide if a subdiagonal element is zero. leave while 10.29). in which case the QR step is to be taken between these indices. ^. Algorithm 2. In this case the transformations must also be accumulated into the matrix Q from the initial reductions to Hessenberg form. else 9. in which case the matrix deflates at rows il and 12.6: Finding deflation rows in an upper Hessenberg matrix o the elements labeled b in (2. 1 < il < i2 < I. 4. 6. 2. i7-l] is negligible) #[i7. while (*7>1) if (#[i7. z'7.

if H is real. as H [/2. the Rayleigh-quotient shift is also real and cannot approximate a complex eigenvalue. i2}. The problem simplifies if we work with the shifted matrix The Wilkinson shift then becomes K = A + d. since it gives local quadratic convergence to simple eigenvalues. The natural candidate is the Rayleigh-quotient shift K = H[i2. The procedure is to compute the eigenvalues from the characteristic polynomial using the usual formula for the roots of a quadratic equation. The cure is to use the Wilkinson shift. The characteristic eauation of B — dl is The roots of this equation are To avoid cancellation we should not compute the smallest root directly from this formula. the shift approaches the Rayleigh-quotient shift so that we retain the quadratic convergence of the latter. i2]. THE EXPLICITLY SHIFTED QR ALGORITHM zero as they are detected. Instead we should compute the largest root Amax first and then compute the smallest root A^n from the relation AminAmax = —q. i2— 1] approaches zero. To simplify our notation. we must choose a shift. where A is the smallest eigenvalue of B-dl. note that Vol II: Eigensystems . However. However. The computation of the shift requires the solution of a 2x2 eigenvalue problem. 12] within which to perform a QR step. as the method converges—that is. which is defined as the eigenvalue of the matrix that is nearest H[i2. The Wilkinson shift 97 Once we have found a range [il. Moreover. some care must be taken to insure the stability of the resulting algorithm. To determine the largest root. This shift can move the QR iteration into the complex plane.SEC. 2. we will consider the matrix Thus we want the eigenvalue of B nearest to d.

Algorithm 2. provided an appropriate deflation strategy is used. end wilkshift Algorithm 2. c.1 says that once hqr starts converging to a simple eigenvalue. Algorithm 2. THE QR ALGORITHM B a =( 5) \c dj wilkshifi returns the eigenvalue of B that is nearest d. • The convergence theory developed in §2.98 Given the elements of the matrix CHAPTER 2. 5 = |o| + |6| + |c| + |d| 4. K) 2.4). K=d 3.(d/s)) 8. it will converge quadratically. if (5 = 0) return fi 5.8 computes the Schur form of an upper Hessenberg matrix by the QR algorithm. Here are some comments on the algorithm. which is cheaper to compute.30) is positive. Unfortunately. • The routine hqr is stable in the usual sense (Definition 2. if(g^O) 7. there is no theory . end if 12. • The routine hqr combined with hessreduce (Algorithm 2.7 implements the computation described above.2) gives an algorithm for computing the Schur form of a general matrix. r = -y/p2 + g 9. To avoid overflow. Reduction to Schur form We can now assemble the algorithmic fragments developed previously into an algorithm for the unitary triangularization of an upper Hessenberg matrix. 1. 9 = (b/s)*(c/s) 6. d.s*(<jf/(p+r)) 11. AC = AC . the algorithm scales B so that its largest element is one in magnitude. if (Re(p)* Re(r) + Im(»* Im(r) < 0) r = -r fi 10. 6. p = 0. This scaling factor can be replaced by the sum of the absolute values of the real and imaginary parts of the elements of B. wilkshift(a.5*((a/s) .7: Computation of the Wilkinson shift o Consequently we want to choose the sign ± so that the middle term in (2.

i2] = ^[i2. il = 1.i7]~K 13. oldi2 = 12 8. 1. H[i+l. ^T[z. if (iter > maxiter) error return. 2. backsearch(H'.i] = 5r[i. st-. #[z7./2-l]. cj. 19.H" [z. if (i2 = 1) leave fc^rfi 10./2] + « end while end /i<7r Algorithm 2.K) 12. 20. rotapp(ci.i2-l]. /2. t]. if(i2^0W/2)iter=Ofi 11. s. for i = i/toi2-l rotapp(ci. st-. #[/2-l.-.K 16. maxiter) 2. ro?gen( . while (true) 5. wi7jbfc//r(#[*2-l.«+l]) rotapp(ci.i+l:n]) 17. The similarity transformation is accumulated in Q.0 4.z2].i7] = ^T[i7. fT[l:t+l. 24. i2 = n 3. 21.i2]. i]. i+1]) ^[i. iter . The routine gives an error return if more than maxiter iterations are required to deflate the matrix at any eigenvalue. 25.«]. i+1] = H[i+l. i+l:n]. end for i 18. fori = /7toi2-l 14. z7. Q.«]. fi 7. iter — iter+1 6. hqr overwrites it with a unitarily similar triangular matrix whose diagonals are the eigenvalues of H. i+1] . JI[i2. THE EXPLICITLY SHIFTED QR ALGORITHM 99 Given an upper Hessenberg matrix H. end for z #[i2.SEC.*] + « 22. hqr(H. 5t-) 15. 5"[i+l. Q[l:n. 5" [t+1. JT[i2. Q[l:n.8: Schur form of an upper Hessenberg matrix o VolII: Eigensystems . 23. Jf [!:«+!. z'2) 9.

what is wanted is not the Schur vectors but the eigenvectors of A. Given il < 12. If A = QTQU is the Schur decomposition of A and X is the matrix of right eigenvectors of T. the pre. the matrix deflates at the upper end. hqr takes several iterations to deflate the first eigenvalue.5. then the matrix of right eigenvectors of A is QX.31) becomes f fcn3 flrot. however. hqr gives an error return when in the course of computing any one eigenvalue a maximum iteration count is exceeded. we may restrict the range of our transformations to the elements labeled 6 in the Wilkinson diagram (2. Typically. i2] to 2(i2 . In most applications. Eigenvectors We have now seen how to compute the Schur decomposition of a general matrix A. if we partition Q^ . somewhat less to compute the second eigenvalue. we no longer need to accumulate the transformations.29).il)2 flrot. • The operation count is dominated by the application of plane rotations. • To guard against the possibility of nonconvergence. neither of which is known a priori. It should be stressed that the eigenvalue-eigenvector decomposition is not a stable decomposition—in fact. and the upper bound (2. THE QR ALGORITHM that says that the algorithm is globally convergent. This reduces the operation count for a QR step in the range [i7. In some applications a Schur decomposition is all we need. The bound is approximately • If we are only interested in computing the eigenvalues of H. The accumulation of the transformations in Q requires the same amount of work to bring the total to 2ra(i2-i2) flrot. and so on. Still later. Eventually the subdiagonals of H become small enough so that quadratic convergence is seen from the outset. Most real-life implementations do not return immediately but instead try an ad hoc shift to get the algorithm back on track. 3). Moreover. so that il > 0. An overall operation count depends on the number of iterations to convergence and the behavior of the ranges [j7.and postmultiplication of H requires about n(i2 — il) flrot.100 CHAPTER 2. Thus we can compute the eigenvectors of A by computing the eigenvectors of T and transforming them by We have already derived a formula for eigenvectors of a triangular matrix (see p. i'2]. the demand for eigenvectors is so great that all packages have an option for their computation. For more see §2. Nonetheless. We can derive an upper bound by assuming that il = 1 and that it takes k iterations to find an eigenvalue. if the matrix is large enough. Specifically. if A is defective it does not exist. which gives further savings.

The cure Vol II: Eigensystems . 2. x(j] = x x[j}/(T(jJ}-») 4. end for k This is essentially the algorithm we will use to compute eigenvectors. THE EXPLICITLY SHIFTED QR ALGORITHM and Tkk is a simple eigenvalue of T. It follows that if we write the fcth eigenvector in the form (k} then we can obtain x\ ' by solving the upper triangular system To derive an algorithm for solving (2. it has two drawbacks.32). However. x = b 2. All this gives the following algorithm. for j = nto 1 by -1 3. which we further reduce by a recursive application of the same procedure. If we partition this equation in the form then the second row of the partition gives the equation (r . then some diagonal of TU may be equal to Tkk.32). then 101 is an eigenvector of T. First. whence This is a system of order one less than the original.fil)x* + £t* = &*.x[j]*T[l:j-l.//)£ = (3. in which case our algorithm will terminate attempting to divide by zero.SEC. let us drop subscripts and superscripts and consider a general algorithm for solving the equation (T — ///)# = 6. in our final algorithm the shift /^ will be a diagonal of our original matrix T.33) we have (T* . If T has multiple eigenvalues. 1. as in (2. z[l:j-l] = a[l:j-l] . from which we get From the first row of (2.j] 5.

In Algorithm 2. However. it must be ill conditioned. where underflow is a number just above the underflow point. • If there is no rescaling.102 CHAPTER 2. we see that the replacement corresponds to a small perturbation in A.e. the diagonal element is adjusted to keep the divisor d from becoming too small. This implies that if the eigenvector is not ill conditioned then it will be accurately computed.9 below we restrict it to be greater than 6M/^ unless that number is itself too small. • The algorithm is column oriented in the sense that the matrix T is traversed along its columns. Here are some comments. Each eigenvector Xi satisfies where ||jE". the cost is ^n3 flmlt. • As we mentioned above. The second problem is that in some applications eigenvectors can have elements of widely varying sizes. by projecting the error made back to the original matrix A [see the discussion leading to (2. if scaling is done at every step in the computation of every eigenvector. For this reason the algorithm rescales only when there is a serious danger of of overflow — specifically. if a component of the solution is in danger of overflowing. It is worth remem- . The strategy is a variant of one found in LAPACK. Many people are somewhat nervous about replacing an essentially zero quantity by an arbitrary quantity of an appropriate size. THE QR ALGORITHM for this problem is to not allow T[k. when the component to be computed would be bigger than a number near the overflow point of the computer. For example. the attempt to compute the eigenvectors can result in overflow. In such cases. in which case we replace it with a number whose reciprocal is safely below the overflow point.. This orientation is to be preferred for programming languages like FORTRAN that store arrays by columns. in which smallnum is taken to be ^underflow. For a further discussion of these matters see §1:2. This may result in the underflow of very small components. • The algorithm is stable in the following sense. This algorithm is to be preferred in languages like C that store their arrays by rows. &]—// to be too small. the algorithm has an operation count of |n3flam. we are free to scale the solution. Since simple eigenvectors are determined only up to a scalar factor. which can be adapted to compute right eigenvectors. however. There is a row-oriented version of the basic algorithm for solving triangular systems. we may scale the vector so that the offending component has modulus one. which is larger than the cost of the unsealed algorithm.3. the results will generally be satisfactory.27)]. This means that for the procedure to introduce great inaccuracies in the computed eigenvector. if underflows are set to zero. the eigenvector itself must be sensitive to small perturbations in A—i. Algorithm 2. In particular. • The cost of rescaling can be comparatively large.9 computes the right eigenvectors of a triangular matrix.||/||TJ| < *y€M for some modest constant 7.

k] 15. 1.k] = X[l:k. X [ l : k . 2. s = \d\/\X{j. for k — n to 1 by -1 5.k]\\2 20. d = T\j. X) 2.k] 11. end for k 21. if (\d\ < dmin) d = dmin fi 12.X ( j . X\j. end if 16. smallnum} 9. end righteigvec Algorithm 2. X[l:k. k} .9: Right eigenvectors of an upper triangular matrix o Vol II: Eigensystems . *]*T[l:j-l. j] 18.j]-T[k.k] 6. forj = fc-ltolby-1 10. this routine returns an upper triangular matrix X of the right eigenvectors of X.k]/\\X[l:k.k] = Q 8. righteigvec(T. k ] = 3*X[l:k. k] = X[l:j-l.k]\ 14. X[kjk] = l 7. X[l:k-l. k] \.k] = X\j. endforj 19.k] = -T[l:k-l. bignum = a number near the overflow point 4.SEC. if(\X[j.k]\/bignum>\d\) 13. X[l:j-l. The columns of X are normalized to have 2-norm one. smallnum = a small number safely above the underflow point 3. dmin = max{ eM * | T [k. THE EXPLICITLY SHIFTED QR ALGORITHM 103 Given an upper triangular matrix T.k]/d 17. X[k+l:n.

• Although we have chosen to compute the entire upper triangular array of eigenvectors. A detailed treatment of these topics would lengthen this chapter beyond reasonable . BITS AND PIECES The algorithm sketched in the last subsection is an effective method for finding the eigenvalues (and eigenvectors.5. provides an easy diagnostic for routines that purport to compute eigenvectors: compare the scaled residual against the rounding unit. there are modifications and extensions that can improve the performance of the algorithm. Specifically if we partition T as usual in the form and assume that Tkk is a simple eigenvector of T.Thus we can also find left eigenvectors by solving triangular systems. If X is the matrix of right eigenvectors of a diagonal matrix. Unfortunately. This fact. • It follows from (2. But this will not work in cases where we want only a few left and right eigenvectors.104 CHAPTER 2. then FH = X~l is the matrix of left eigenvectors. the methods are not numerically equivalent. the resulting matrix may not be the inverse of the matrix of right eigenvectors — we say that the right and left eigenvectors have lost biorthogonality. 2. if desired) of a general complex matrix. Thus we can also obtain the left eigenvectors of a matrix by inverting the matrix of right eigenvectors. which continues to hold for the eigenvectors of the original matrix. there is an alternative. If we compute the left eigenvectors separately. essentially the same algorithm could be used to compute a few selected eigenvectors. If biorthogonality is required. A similar algorithm can be derived to compute left eigenvectors.34) that the residual T{ — Txi — taxi satisfies Thus the norm of the scaled residual for each computed eigenvector is near the rounding unit. then is a left eigenvector of T corresponding to Tkk. However. then directly inverting the matrix of right eigenvectors will give better results. However. so that as a whole the eigenvalueeigenvector decomposition is not necessarily stable. THE QR ALGORITHM bering that Ei is different for each eigenvector zt-.

even with the Wilkinson shift. If that fails one can either try again or give an error return. In such an event the natural action is to give an error return. and balancing. where and Vol II: Eigensystems . i—l)-element of H. The LAPACK routine CGEEV takes the shift to be | Re(/^. Specifically. An alternative is to restart the iteration with a new shift. deflation at consecutive small subdiagonal elements. Suppose we begin the QR step at row i. The problem usually manifests itself by the algorithm exceeding a preset limit on the number of iterations (see statement 6 in Algorithm 2. 2. where i is the row where the problem is supposed to deflate. For purposes of illustration we will assume that the elements in this diagram are real and that d is positive. After the reduction to triangular form. sorting for eigenvalues.SEC. we will give a brief survey of the following topics: the ad hoc shift. consider the Wilkinson diagram where the upper-left-hand x represents the (i—2._i)| + | Re(/ij_i >2 _2)|. It does not make much sense to try more than one or two ad hoc shifts. Ad hoc shifts Convergence of the Hessenberg QR algorithm is not guaranteed. Aggressive deflation If two consecutive subdiagonals of H are small but neither is small enough to permit deflation. we can sometimes start the QR step at the diagonal to the right of the first small element. The subsection concludes with a discussion of graded matrices. Instead. we have the following diagram. There is little general theory to guide the choice of ad hoc shifts (which is why they are called ad hoc). commonly referred to as an ad hoc shift. THE EXPLICITLY SHIFTED QR ALGORITHM 105 bounds.8).

the computed value of the element sci = ei^/1' is €i €2/d. THE QR ALGORITHM The postmultiplication by the rotations does not change elements in the second column. we may be able to start the QR step even when e\ or €2 are not small enough to deflate the matrix. and so on until it fails to isolate an eigenvalue. and the computed value of the element ccj will simply be €1 —unchange from the original. Now suppose that €2 is so small that ft(d2+€2) = d2. It takes at most n2 comparisons to isolate such a row. Thus for c\ and e2 sufficiently small.106 CHAPTER 2. Note that it is the square of ei and the product €162 that says whether we can start a QR step. . it is possible that some eigenvalues are available almost for free. this matrix can be permuted so that it has the form We can then repeat the process on the leading principal matrix of order ra-1. Consequently. if it exists. Moreover. The result is a permutation of A that has the form where T is upper triangular. we can ignore it. At an additional cost of O(n2} operations. Then the computed value of c will be one. For example. if the product £162 is so small that it is negligible by whatever deflation criterion is in force. Since the product of two small numbers is much smaller than either. if A has the form then the single nonzero element in the second row is an eigenvalue. Cheap eigenvalues If the original matrix A is highly structured. we can assume that the matrix is deflated at the element ej —at least for the purpose of starting a QR step.

if it has. 2. Properly implemented the process requires only 0(n2) operations per eigenvalue.SEC. the form 107 then it can be permuted to the form Proceeding as above. In this case balance can often be restored by the following procedure. the process is usually made an option in matrix packages. Because of its cheapness. they are accidental. Balancing Many applications generate matrices whose elements differ greatly in size. for example. In others. we can permute A into the form where S is upper triangular. THE EXPLICITLY SHIFTED QR ALGORITHM Turning now to A. say. Let p ^ 0. We proceed by balancing the elements of a matrix one row and column at a time. caused. Thus we have deflated the problem at both ends. by a poor choice of units of measurement. and let Then (for n = 4) If we now require that p be chosen so that the first row and column are balanced in the sense that Vol II: Eigensystems . In some cases these disparities are a a consequence of the structure of the underlying problem. the same as for a single QR step. however. Let us consider the case of the first row and column.

0515156210790460e-16 -1. However. the method can be shown to converge to a matrix in which all the rows and columns are balanced. For example. We therefore say that A is graded downward by diagonals. Graded matrices Loosely speaking. at times it can make things worse. cycling back to the first if necessary. a matrix is graded if it shows a systematic decrease or increase in its elements as one proceeds from one end of the matrix to the other.1034150572761617e-33 1. The balancing procedure is relatively inexpensive. For example.108 then we find that CHAPTER 2. It turns out that the QR algorithm usually works well with matrices that are graded downward by diagonals. and so on. Although the balancing of one row and column undoes the balancing of the others. Moreover. the third. If we set then is graded downward by rows.5940000000000001e+00 .while is graded downward by columns. For this reason. any package that incorporates balancing should include an option to turn it off. a very loose convergence criterion will be sufficient to effectively balance the matrix. Balancing often improves the accuracy of computed eigenvalues and eigenvectors. Each step of the iteration requires only 0(n) operations. the computed eigenvalues of A are 5. we can go on to balance the second. THE QR ALGORITHM Having balanced the first row and column. the matrix exhibits a symmetric grading from large to small as we proceed along the diagonals.

1e-01. from which it is seen that the smallest eigenvalue has no accuracy at all.5940000000000005e+00 The relative errors in these eigenvalues are 3.4e-01.OOOOOOOOOOOOOOOOe+00 This display shows that the first column of the Hessenberg form is the same as the one obtained by first setting the element 1. 6. 2.3e+00.Oe+00. The computed eigenvalues of D~1AD have relative errors of l. Vol n: Eigensystems . reverse the rows and columns of A to get the matrix which is graded upward by diagonals.1190000000000004e-17 7. For example. and the largest is correct to working precision. we get the following results. On the other hand. consider the first column of B and the corresponding first column of the Hessenberg form of B: 6.Oe+00.0515156534681920e-16 -1.3e+00. Row and column grading can also cause the algorithm to fail. inspection of special cases will often show that the algorithm causes a loss of important information by combining large and small elements.6859999999999994e-33 6. 3.2900000000000000e-24 -7.8e-16. 2. This is a small error compared to the norm of B. However. but the smallest eigenvalue of B is quite sensitive to it. There is no formal analysis of why the algorithm succeeds or fails on a graded matrix.3e-16. the second smallest is accurate to only eight decimal digits. 2. O.6e+00. In particular the relative errors in the eigenvalues of the altered matrix B are 1. For example.290e-24 to zero. O. 3.1e-08.SEC. while the computed eigenvalues of DAD~l have relative errors of 1.6859999999999994e-33 1.4e-16. 1. If we compute the eigenvalues of B. the algorithm does not do as well with other forms of grading.1189999999999992e-17 O. THE EXPLICITLY SHIFTED QR ALGORITHM 109 which are almost fully accurate.6859999999999994e-33 1. The two smaller eigenvalues of both matrices have no accuracy. 2.Oe+16.

) Rutishauser [224. When dealing with a graDED MATRIX ARRANGE THE GERsding downward BY TOWS COLUMNS.1954]. The result was what is known as the LR algorithm. Rutishauser consolidated and extended the recurrences in his qd-algorithm [222.1961] and Francis [76.1892] gave determinantal expressions in terms of the coefficients of (2. the LU algorithm is numerically unstable. However.110 CHAPTER 2.37) for poles beyond the radius of convergence. (For a systematic treatment of this development see [122].Later Hadamard [106. and therefore anyone employing it should take a hard look at the results to see if they make sense.1884]. NOTES AND REFERENCES Historical The lineage of the QR algorithm can be traced back to a theorem of Konig [ 151. balancing B has no effect.. 223. He generalized the process to a full matrix and showed that under certain conditions the iterates converged to an upper triangular matrix whose diagonals were the eigenvalues of the original matrix. THE QR ALGORITHM Balancing can help in this case. Aitken [1. .1958]. 2. OR DDIGONALS AND BALANCE THE MATRRIX. For example. Later Rutishauser introduced a shift to speed convergence [225. Thus the algorithm preceded the decomposition.1926] independently derived these expressions and gave recurrences for their computation. For more historical material see Parlett's survey [204] and the paper [73] by Fernando and Parlett. On the other hand. Except for special cases.1961] independently proposed substituting the stable QR decomposition for the LU decomposition. Francis went beyond a simple substitution and produced the algorithm that we use today.1955] then showed that if the numbers from one stage of the qdalgorithm were arranged in a tridiagonal matrix T then the numbers for the next stage could be found by computing the LU factorization of T and multiplying the factors in reverse order.6. which states that if the function defined by the power series has one simple pole r at its radius of convergence then r — lim^oo ak/ak+i. I have heard that the Q was originally O for "orthogonal" and was changed to Q to avoid confusion with zero. This suggests the following rule of thumb. The name of the QR decomposition comes from the fact that in Francis's notation the basic step was to decompose the matrix A into the product QR of an orthogonal matrix and an upper triangular matrix. since it cannot change the direction of the grading. like positive definite matrices. balancing DAD l and D 1AD •estores the lost accuracy. We do not claim that this strategy is foolproof. Kublanovskaya[154. complete with a preliminary reduction to Hessenberg form and the implicit double shift for real matrices.

although the first explicit statement of the relation is in a textbook by Stewart [253. the iteration converges quadratically to a nondefective multiple eigenvalue.5 for notes and references on Householder transformations and plane rotations The treatment here differs from that in many textbooks in its emphasis on com plex transformations. who worked out the numerical details. as was the convergence result in Theorem 2. Ch. 173]). The stability analysis is due to Wilkinson. is due to Francis [76].SEC. mutatis mutandis. Watkins and Eisner [292] have given a very general convergence theory that holds for both the QR and LR algorithms. which is so important to the implicitly double shifted algorithm. The case for nonorthogonal reduction. this result is implied by work of Ostrowski on the Rayleigh-quotient method [193]. QL. p. His general approach to backward error analysis of orthogonal transformations is given in his Algebraic Eigenvalue Problem [300] and is the basis for most of the rounding-error results cited in this and the next chapter.2.1. The QR algorithm. Wilkinson and W. Local convergence The local convergence results are adapted from [264]. and LQ. The proof given here is due to Wilkinson [301]. As mentioned in the text. Theorem 2. Householder transformations and plane rotations See §1:4. [300. H. 8]. which we may represent symbolically by QR. those of the RQ and the LQ. Chapter 3. For more on complex Householder transformations see [159]. the inverse power method.3. Kahan (see [206. The theory of the QR algorithm. For the special case of the Rayleigh-quotient shift. 2. THE EXPLICITLY SHIFTED QR ALGORITHM Variants of the QR algorithm 111 There are four factorizations of a matrix into a unitary and a triangular matrix. is not as compelling as the case for GausVol II: Eigensystems . p. see §1. The shifted variants of the QR and QL algorithms exhibit fast convergence in the last and first rows. It is possible to use nonorthogonal transformations to reduce a symmetric matrix to tridiagonal form—a process that bears the same relation to Givens' and Householder's method as Gaussian elimination bears to orthogonal triangularization. no general analysis seems to have appeared in the literature. in the first and last columns. For more details on the QL algorithm. 352]. Hessenberg form The reduction of general matrix to Hessenberg form is due to Householder [123] and Wilkinson [297]. §§29-30. where R and L stand for upper and lower triangular. and the power method The relation of the QR algorithm to the inverse power method was known to J. however. RQ. which was originally established using determinants.1. applies to the other three variants. however. The relation to the power method was known from the first.

His talk was punctuated by phrases like. The observation that the QR step preserves Hessenberg form is due to Francis [76]. "Here is what I do. 107) is due to Osborne [187] and was used. sparsely commented. Its use in the general QR algorithm is traditional. He ended by saying. . §6.. and difficult to read.28)..8—cannot be regarded as a polished implementation. Aggressive deflation The idea of deflating when there are two consecutive small subdiagonal elements is due to Francis [76].112 CHAPTER 2. The Hessenberg QR algorithm As this subsection makes clear. the implementation of the QR algorithm is a large task. For further details. However. THE QR ALGORITHM sian elimination." and described the relative criterion (2. "How small is small?" He began. I would rather be on your side. which works well with sparse matrices. in the Handbook [208] and in subsequent packages. Preprocessing The balancing algorithm (p. as do their EISPACK translations into FORTRAN [81].8-19]. where Jim Wilkinson presented the QR algorithm. In addition. The Handbook ALGOL codes [305] are well documented but show their age. "When a subdiagonal element becomes sufficiently small. Watkins [289] has shown that the shift is accurately transmitted through the small elements." After one of the lectures I asked him. many people felt that such small elements could retard the convergence of the implicitly shifted QR algorithm (to be discussed in the next section)." So it remains to this day. The computational advantages of this procedure are obvious. For another scaling strategy. see [50]. there is little to do but go to the better packages and look at the code.. "But if you want to argue about it. For details and an error analysis see [300. In 1965 I attended a series of lectures on numerical analysis at the University of Michigan. . where it insures global convergence of the QR algorithm. The LAPACK [3] codes are state of the art—at least for large matrices—but they are undocumented. and our final result—Algorithm 2. The Wilkinson shift The Wilkinson shift was proposed by Wilkinson [302] for symmetric tridiagonal matrices. along with the strategy for the initial deflation of eigenvalues. Deflation criteria A personal anecdote may illustrate the difficulties in finding satisfactory criteria for declaring a subdiagonal element zero. and the method is not used in the major packages.

they recommend that the matrix be graded upward. Unfortunately. Finally. 3.36). REAL SCHUR FORM In this subsection we derive the real Schur form and present a preliminary algorithm for its computation. Hence in §3. Since the concern is with solving the real eigenproblem: In this section A will be a real matrix of order n. Reinsch. it is generally impossible to avoid complex arithmetic if the actual Schur form is to be computed. each such block containing a conjugate pair of eigenvalues. 3. VolII: Eigensy stems .3 we pull the material together into an algorithm for computing the real Schur form. The adaptation involves a technique called implicit shifting. For a general theory and references see [266]. 3.1. complex arithmetic is quite expensive and to be avoided if at all possible. This matrix is then passed on to a variant of the QR algorithm—the QL algorithm—that also works from the bottom up. Since real matrices can have complex eigenvalues and eigenvectors. and Wilkinson [169]. The following theorem establishes the existence of such decompositions. Real Schur form A real Schur form is a Schur form with 2x2 blocks on its diagonal. in §3. It turns out that these inefficiencies can be removed if the matrix in question is Hessenberg.SEC. because their version of the reduction to Hessenberg form starts at the bottom of the matrix and works up. Contrary to (2. The fact that the direction of the grading must be taken into account was noted for symmetric matrices by Martin. In this section we will show that there is a real equivalent of the Schur form and that the QR algorithm can be adapted to compute it in real arithmetic. THE IMPLICITLY SHIFTED QR ALGORITHM In the previous section we described a variant of the QR algorithm that is suitable for computing the Schur decomposition of a general complex matrix. We begin the section with the real Schur form and an inefficient algorithm for computing it. THE IMPLICITLY SHIFTED QR ALGORITHM Graded matrices 113 There is little in the literature about general graded matrices.2 we establish an important result that enables us to take advantage of the Hessenberg form. which can also be used to solve generalized eigenproblems and compute the singular value decomposition.

its eigenvalues are A and A. Because RLR is similar to L. Let A be real of order n.114 CHAPTER 2. Hence X\ = (y z)R~l. Specifically. 1 . • Three comments.y . Proof. z) be a complex eigenpair with From the relations 2. or By computing the characteristic polynomial of Z.3.1 (Real Schur form). Now let be a QR decomposition of (y z). one can verify that the eigenvalues of L are A and A. The blocks can be made to appear in any order. Chapter 1. The proof is essentially the same as the proof of Theorem 1. The blocks of order one contain the real eigenvalues of A. R is nonsingular. The blocks of order two contain the pairs of complex conjugate eigenvalues of A. we find that Ay = fj. Then (y z] = X\R.12. Chapter 1). that which completes the deflation of the complex conjugate pair A and A. except that we use two real vectors to deflate a complex eigenvalue. Chapter 1.LIZ.y = x + x and Izi — x . THE QR ALGORITHM Theorem 3. Since y and z are independent (see Theorem 1.vz and Az = w 4. let (A. Then there is an orthogonal matrix U such that T = £7T AU is block upper triangular of the form The diagonal blocks ofT are of order one or two. and It follows as in the proof of Theorem 1.or.12.

then H = QHHQ. We begin by noting that the reduction of A to Hessenberg form (Algorithm 2. A preliminary algorithm The QR algorithm can be adapted to compute a real Schur form of a real matrix A in real arithmetic. 3. Thus the QR algorithm with two complex conjugate shifts preserves reality. Unfortunately. But is real. the proof of this theorem is almost constructive.K!). one can reduce a matrix to real Schur form using Householder transformations. it turns out that we can use a remarkable property of Hessenberg matrices to sidestep the formation of A2 .2 Re(/c) A +1«|2/. This strategy of working with complex conjugate Wilkinson shifts is called the Francis double shift strategy. VolII: Eigensy stems . then its complex conjugate K is also a candidate for a shift.Kl)(H . we need a new approach to avoid the introduction of complex elements in our matrix. the very first step of this algorithm—the formation of H2 -2 RQ(K)H + \K\2I—requires 0(n3) operations.2) can be done in real arithmetic. one with shift K and the other with shift K to yield a matrix H. THE IMPLICITLY SHIFTED QR ALGORITHM 115 • A matrix of the form (3. We can even avoid complex arithmetic in the intermediate quantities by forming the matrix H2 . resulting in a real matrix H. We no turn to this property.SEC. Here we will start with an expensive algorithm that points in the right direction. Such subspaces will be the main concern of the second half of this volume. which makes the algorithm as a whole unsatisfactory. if is the QR decomposition of (H . We can then apply the shifted QR algorithm (Algorithm 2.8) in real arithmetic until such time as a complex shift is generated.1. By Theorem 2. Such a space is called an eigenspace or invariant subspace. However.1) in which the blocks are of order one or two is said to be quasi-triangular. Given the necessary eigenvectors. • Like the proof of Schur's theorem. computing its Q-factor Q. Suppose that we apply two steps of the QR algorithm. Hence so are Q and H. and then computing H = QTHQ. At that point. • The relation implies that A maps the column space of X\ into itself.2Re(«)# + W 2 /. It is an instructive exercise to work out the details. The starting point is the observation that if the Wilkinson shift K is complex.

as the following argument suggests. Formally we make the following definition. we have n-1 degrees of freedom left over in Q. the QR algorithm produces a sequence of Hessenberg matrices each of which is similar to the original matrix A. the matrices Q and H are determined by the first column of Q. Definition 3. Chapter 1). given A. Let H be upper Hessenberg of order n. hij is replaced by tiihijtij. This reduction is clearly not unique. It is therefore reasonable to conjecture that.« We are now in a position to give conditions under which a reduction to upper Hessenberg form is essentially unique. and we will say that the reduction is determined up to column scaling ofQ. Second. It turns out that this is essentially true. THE QR ALGORITHM 3. The two reductions are not essentially different. . where \D\ = /. just enough to specify the first column of Q. since Q and Q differ only in the scaling of their columns by factors of modulus one. However. Proof.. Let A be of order n and let H = QHAQ be a unitary reduction of A to upper Hessenberg form. In reducing A to Hessenberg form H by a unitary similarity. First.3 (Implicit Q theorem). For example.. there is a limit to this nonuniqueness. The choice of scaling will also affect the elements of H\ specifically. In either case the rescaling makes no essential difference. If Q = QD. Now a unitary matrix has n(n-l)/2 degrees of freedom (see the discussion preceding Theorem 1.2.116 CHAPTER 2. we must introduce (n-l)(ra-2)/2 zeros into A. Then H is UNREDUCED if fct+i f » ^ 0 ( i = 1. Consider the relation If we partition Q = (q1 »• qn) by columns.5) and remembering that q^qi = 1 and gj1^ — 0. then the first column of this relation is Multiplying (3. THE UNIQUENESS OF HESSENBERG REDUCTION Let A be of order n and let H = Q^AQ be a unitary reduction of A to upper Hessenberg form. Since we must use (n— l)(n-2)/2 of the degrees of freedom to introduce zeros in A. Theorem 3. IfH is unreduced then up to column scaling ofQ the matrices Q and H are uniquely determined by the first column ofQ or the last column ofQ..2. but with two provisos. we must assume that the subdiagonals of H are nonzero. then H = Q^AQ is also a reduction to upper Hessenberg form. we have .12. let H = QEAQ be a unitary reduction of A to Hessenberg form.

We then have

    h21 q2 = Aq1 − h11 q1.

Since h21 ≠ 0, the vector Aq1 − h11 q1 is nonzero. Since ||q2||_2 = 1 we may take h21 = ||Aq1 − h11 q1||_2, which determines q2 uniquely up to a factor of modulus one.

Now suppose that q1, ..., qk have been determined. Then the kth column of (3.4) gives the relation

    Aqk = h1k q1 + ··· + hkk qk + h_{k+1,k} q_{k+1}.

Since q1, ..., q_{k+1} are orthonormal, we have hik = qi^H A qk (i = 1, ..., k). Moreover, since h_{k+1,k} is nonzero, Aqk − h1k q1 − ··· − hkk qk is nonzero, and we may take

    h_{k+1,k} = ||Aqk − h1k q1 − ··· − hkk qk||_2,

which determines q_{k+1} up to a factor of modulus one. Finally, from the last column of (3.4) we get the last column of H. To show that Q is uniquely determined by its last column, we start with the relation Q^H A = H Q^H and repeat the above argument on the rows of Q^H starting with the last. •

There are three comments to be made about this theorem.

• The theorem is sometimes called the implicit Q theorem, because it allows the determination of Q implicitly from its first or last column.

• Its proof is actually an orthogonalization algorithm for computing Q and H. It is called the Arnoldi method and will play an important role in the second half of this volume.

• The algorithm itself has numerical drawbacks. For example, cancellation in the computation of Aqk − h1k q1 − ··· − hkk qk can cause the computed q_{k+1} to deviate from orthogonality with q1, ..., qk.

3.3. THE IMPLICIT DOUBLE SHIFT

We now return to our preliminary algorithm and show how it may be modified to avoid the expensive computation of H^2 − 2Re(κ)H + |κ|^2 I. We will begin with a very general strategy and then fill in the details.
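The orthogonalization in the proof is easy to carry out numerically. The following NumPy sketch (my own illustration of the Arnoldi-style construction, not a library routine) rebuilds Q and H column by column from A and a prescribed first column, stopping if a subdiagonal of H would vanish.

    import numpy as np

    def implicit_q(A, q1, tol=1e-12):
        """Reconstruct Q and H with H = Q^T A Q upper Hessenberg, starting
        from the first column q1, following the proof of the implicit Q
        theorem (essentially the Arnoldi process)."""
        n = A.shape[0]
        Q = np.zeros((n, n))
        H = np.zeros((n, n))
        Q[:, 0] = q1 / np.linalg.norm(q1)
        for k in range(n):
            w = A @ Q[:, k]
            for i in range(k + 1):
                H[i, k] = Q[:, i] @ w           # h_ik = q_i^T A q_k
                w -= H[i, k] * Q[:, i]
            if k == n - 1:
                break
            H[k + 1, k] = np.linalg.norm(w)
            if H[k + 1, k] < tol:               # H would be reduced: the remaining
                break                           # columns are no longer determined
            Q[:, k + 1] = w / H[k + 1, k]
        return Q, H

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    Q, H = implicit_q(A, np.eye(5)[:, 0])
    print(np.linalg.norm(Q.T @ A @ Q - H))      # ~0: a valid Hessenberg reduction

As the third comment above warns, this classical orthogonalization can lose orthogonality through cancellation; it is shown here only to make the uniqueness argument tangible.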

A general strategy

Let κ be a complex Francis shift of H. We have seen above that if we compute the Q-factor Q of the matrix H^2 − 2Re(κ)H + |κ|^2 I, then Ĥ = Q^T H Q is the result of applying two steps of the QR algorithm with shifts κ and κ̄. By the implicit Q theorem, if Ĥ is unreduced, the matrices Q and Ĥ are determined by the first column of Q. The following is a general algorithm for simultaneously determining Q and Ĥ.

    1. Determine the first column c of C = H^2 − 2Re(κ)H + |κ|^2 I.
    2. Let Q0 be a Householder transformation such that Q0 c = σ e1.
    3. Set H1 = Q0 H Q0.
    4. Use Householder transformations to reduce H1 to upper Hessenberg
       form Ĥ.  Call the accumulated transformations Q1.
    5. Set Q = Q0 Q1.                                                (3.6)

To establish that this algorithm works, we must show that Ĥ and Q are the matrices that would result from two steps of the QR algorithm with shifts κ and κ̄.

1. By construction the matrix (Q0 Q1)^T H (Q0 Q1) is Hessenberg.

2. The first column of the reducing transformation is proportional to q, the first column of the Q-factor of C. For the first column of Q0 Q1 is the same as the first column of Q0, because Q1 e1 = e1 [see (2.21)]. Moreover, the first column of Q0 and the first column of the Q-factor of C are, up to scaling, the same. To see this, partition the QR factorization of C in the form

    C = (q  Q*) ( ρ  r^T
                  0  R   ).

Then c = ρ q. Now if we partition Q0 = (q(0)  Q*(0)), then from the relation c = σ Q0 e1 we have c = σ q(0). Hence q and q(0) are both proportional to c.

Hence by the implicit Q theorem (Theorem 3.3), if Ĥ is unreduced, then Q = Q0 Q1 and Ĥ = (Q0 Q1)^T H (Q0 Q1) are, up to scaling, the matrices produced by two steps of the QR algorithm with shifts κ and κ̄.

The key computations in (3.6) are the computation of the first column of C and the reduction of Q0 H Q0 to Hessenberg form. It turns out that because H is upper Hessenberg we can effect the first calculation in O(1) operations and the second in O(n^2) operations. We now turn to the details.

Getting started

The computation of the first column of C = H^2 − 2Re(κ)H + |κ|^2 I requires that we first compute the scalars 2Re(κ) and |κ|^2. To do this we do not need to compute κ itself. For it is easily verified that

    t ≡ 2Re(κ) = h_{n-1,n-1} + h_{nn}                                (3.7)

and

    d ≡ |κ|^2 = h_{n-1,n-1} h_{nn} − h_{n-1,n} h_{n,n-1},            (3.8)

the trace and determinant of the trailing 2×2 submatrix of H.

Because H is upper Hessenberg, only the first three components of the first column of H^2 are nonzero. They are given by

    (H^2)_{11} = h11^2 + h12 h21,
    (H^2)_{21} = h21 (h11 + h22),
    (H^2)_{31} = h32 h21.

Thus the first three components of the first column c of C are given by

    c1 = h11^2 + h12 h21 − t h11 + d,
    c2 = h21 (h11 + h22 − t),
    c3 = h32 h21.

Inserting the definition (3.7) of t and (3.8) of d, we obtain the following formula for c:

    c = h21 ( [ (h_{nn} − h11)(h_{n-1,n-1} − h11) − h_{n-1,n} h_{n,n-1} ] / h21 + h12
              h22 − h11 − (h_{nn} − h11) − (h_{n-1,n-1} − h11)
              h32 ).

Since two proportional vectors determine the same Householder transformation, we can ignore the multiplicative factor h21 in the definition of c. Note that we may assume that h21 is nonzero, for otherwise the problem would deflate. In particular, the scaling factor s in statement 2 of Algorithm 3.1 is well defined.

The formulas for the components of h21^{-1} c involve products of the elements of H and are subject to overflow or underflow. The cure is to scale before calculating c. This, in effect, multiplies c by the scaling factor, which does not affect the final Householder transformation.

Algorithm 3.1 calculates the vector u generating the Householder transformation to start a doubly shifted QR step. The particular form of the formulas has been chosen to avoid certain computational failures involving small off-diagonal elements. For more on this, see the notes and references.

An important consequence of our substitution of t for 2Re(κ) and d for |κ|^2 is that our algorithm works even if the Francis double shifts are real. Specifically, suppose that the trailing 2×2 submatrix of H has real eigenvalues λ and μ. Then t = λ + μ and d = λμ, so that

    H^2 − tH + dI = (H − λI)(H − μI).

Thus the formulas in Algorithm 3.1 generate a vector u that starts the iteration with shifts λ and μ.

In practice, the QR step will not be applied to the entire matrix, but between certain rows and columns of H, as the problem deflates. For this reason we do not invoke the algorithm with the matrix H, but with the specific elements of H used to generate c.
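The O(1) computation of the start vector can be checked directly against the explicit first column of C. A minimal NumPy sketch (my own helper, without the scaling used in Algorithm 3.1):

    import numpy as np

    def francis_start_vector(H):
        """First three components of the first column of
        C = H^2 - t*H + d*I, computed from a handful of entries of H."""
        n = H.shape[0]
        t = H[n-2, n-2] + H[n-1, n-1]                            # 2*Re(kappa)
        d = H[n-2, n-2]*H[n-1, n-1] - H[n-2, n-1]*H[n-1, n-2]    # |kappa|^2
        c1 = H[0, 0]**2 + H[0, 1]*H[1, 0] - t*H[0, 0] + d
        c2 = H[1, 0]*(H[0, 0] + H[1, 1] - t)
        c3 = H[2, 1]*H[1, 0]
        return np.array([c1, c2, c3])

    rng = np.random.default_rng(2)
    n = 7
    H = np.triu(rng.standard_normal((n, n)), -1)
    t = H[n-2, n-2] + H[n-1, n-1]
    d = H[n-2, n-2]*H[n-1, n-1] - H[n-2, n-1]*H[n-1, n-2]
    C = H @ H - t*H + d*np.eye(n)
    print(C[3:, 0])                                         # the rest of column 1 is zero
    print(np.allclose(francis_start_vector(H), C[:3, 0]))   # True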

This program generates a Householder transformation that reduces the first column of C = H^2 − 2Re(κ)H + |κ|^2 I to a multiple of e1. The arguments h11, h12, h21, h22, h32, hn1n1, hn1n, hnn1, hnn are the elements h11, h12, h21, h22, h32, h_{n-1,n-1}, h_{n-1,n}, h_{n,n-1}, h_{nn} of H.

    1. startqr2step(h11, h12, h21, h22, h32, hn1n1, hn1n, hnn1, hnn, u)
    2.    s = 1/max{|h11|, |h12|, |h21|, |h22|, |h32|, |hn1n1|, |hn1n|, |hnn1|, |hnn|}
    3.    h11 = s*h11; h12 = s*h12; h21 = s*h21; h22 = s*h22; h32 = s*h32
          hn1n1 = s*hn1n1; hn1n = s*hn1n; hnn1 = s*hnn1; hnn = s*hnn
    4.    p = hnn − h11
    5.    q = hn1n1 − h11
    6.    r = h22 − h11
    7.    c = ( (p*q − hn1n*hnn1)/h21 + h12
                r − p − q
                h32 )
    8.    housegen(c, u, ν)
    9. end startqr2step

    Algorithm 3.1: Starting an implicit QR double shift

Reduction back to Hessenberg form

Let R0 denote the Householder transformation corresponding to the vector u returned by Algorithm 3.1. Then premultiplication by R0 acts only on the first three rows of H, so that R0 H has the form

    X X X X X X
    X X X X X X
    X X X X X X
    0 0 X X X X
    0 0 0 X X X
    0 0 0 0 X X

Similarly, postmultiplication by R0 operates on only the first three columns of H, so that after the complete transformation has been applied we get the matrix

    X X X X X X
    X X X X X X
    X̂ X X X X X
    X̂ X X X X X
    0 0 0 X X X
    0 0 0 0 X X

We must now reduce this matrix to Hessenberg form. In the usual reduction to Hessenberg form we would annihilate all the elements below the first subdiagonal. But here only two of the elements in the first column are nonzero (the hatted elements above), and we can use a Householder transformation R1 that acts only on rows and columns two through four. The result of applying this transformation is

    X X X X X X
    X X X X X X
    0 X X X X X
    0 X̂ X X X X
    0 X̂ X X X X
    0 0 0 0 X X

We now continue with a transformation R2 that annihilates the hatted elements in this matrix to give

    X X X X X X
    X X X X X X
    0 X X X X X
    0 0 X X X X
    0 0 X̂ X X X
    0 0 X̂ X X X

Yet another transformation R3 annihilates the hatted elements in the third column:

    X X X X X X
    X X X X X X
    0 X X X X X
    0 0 X X X X
    0 0 0 X X X
    0 0 0 X̂ X X

The last transformation is a special case, since there is only one element to annihilate. This can be done with a 2×2 Householder transformation or a plane rotation R4 applied as R4^T(·)R4 to the last two rows and columns, which restores Hessenberg form:

    X X X X X X
    X X X X X X
    0 X X X X X
    0 0 X X X X
    0 0 0 X X X
    0 0 0 0 X X

Note that the transpose is necessary if R4 is a rotation; it makes no difference when R4 is a Householder transformation, which is symmetric.

Algorithm 3.2 implements a double implicit QR shift between rows and columns il and i2 of H.
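Before turning to the pseudocode, here is a compact NumPy sketch of the same bulge chase (my own illustration, not the book's routine: no deflation, no scaling, and the transformations are not accumulated).

    import numpy as np

    def house(x):
        """Householder vector u (unit norm) with (I - 2 u u^T) x = -/+ ||x|| e1."""
        u = x.astype(float).copy()
        u[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        nrm = np.linalg.norm(u)
        return u / nrm if nrm > 0 else u

    def double_shift_sweep(H):
        """One implicit Francis double step on an unreduced Hessenberg H."""
        H = H.copy()
        n = H.shape[0]
        t = H[n-2, n-2] + H[n-1, n-1]
        d = H[n-2, n-2]*H[n-1, n-1] - H[n-2, n-1]*H[n-1, n-2]
        c = np.array([H[0, 0]**2 + H[0, 1]*H[1, 0] - t*H[0, 0] + d,
                      H[1, 0]*(H[0, 0] + H[1, 1] - t),
                      H[2, 1]*H[1, 0]])
        for k in range(n - 2):
            u = house(c) if k == 0 else house(H[k:k+3, k-1])
            rows = slice(k, k + 3)
            H[rows, :] -= 2.0*np.outer(u, u @ H[rows, :])    # premultiply rows k..k+2
            H[:, rows] -= 2.0*np.outer(H[:, rows] @ u, u)    # postmultiply cols k..k+2
            if k > 0:
                H[k+1:k+3, k-1] = 0.0                        # exact zeros in the chased column
        u = house(H[n-2:, n-3])                              # the final 2x2 transformation
        H[n-2:, :] -= 2.0*np.outer(u, u @ H[n-2:, :])
        H[:, n-2:] -= 2.0*np.outer(H[:, n-2:] @ u, u)
        H[n-1, n-3] = 0.0
        return H

    rng = np.random.default_rng(3)
    H0 = np.triu(rng.standard_normal((8, 8)), -1)
    H1 = double_shift_sweep(H0)
    print(np.linalg.norm(np.tril(H1, -2)))                   # ~0: Hessenberg again
    print(np.max(np.abs(np.sort_complex(np.linalg.eigvals(H0))
                        - np.sort_complex(np.linalg.eigvals(H1)))))  # eigenvalues preserved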

Given an upper Hessenberg matrix H and the vector u from Algorithm 3.1 (startqr2step), this program applies the corresponding Householder transformation to H and then reduces H back to Hessenberg form. The transformations are applied between rows il and i2 and are accumulated in a matrix Q. (The listing below is a cleaned-up rendering of the original.)

    1. qr2step(H, u, il, i2, Q)
    2.    for i = il to i2−2
    3.       j = max{i−1, il};  iu = min{i+3, i2}
    4.       % premultiply by the current Householder transformation
    5.       v^T = u^T*H[i:i+2, j:n];  H[i:i+2, j:n] = H[i:i+2, j:n] − u*v^T
    6.       % postmultiply, and accumulate the transformation in Q
    7.       v = H[1:iu, i:i+2]*u;     H[1:iu, i:i+2] = H[1:iu, i:i+2] − v*u^T
    8.       v = Q[1:n, i:i+2]*u;      Q[1:n, i:i+2] = Q[1:n, i:i+2] − v*u^T
    9.       if (i ≠ il)
   10.          H[i+1, i−1] = 0;  H[i+2, i−1] = 0
   11.       end if
   12.       if (i ≠ i2−2)
   13.          housegen(H[i+1:i+3, i], u, ν)
   14.       end if
   15.    end for i
   16.    % the final transformation is a plane rotation
   17.    rotgen(H[i2−1, i2−2], H[i2, i2−2], c, s)
   18.    rotapp(c, s, H[i2−1, i2−1:n], H[i2, i2−1:n])
   19.    rotapp(c, s, H[1:i2, i2−1], H[1:i2, i2])
   20.    rotapp(c, s, Q[1:n, i2−1], Q[1:n, i2])
   21. end qr2step

    Algorithm 3.2: Doubly shifted QR step

il and i2 of H. There are several comments to be made about the algorithm.

• The coding is a bit intricate, since there are special cases at both ends of the loop on i. It helps to think of i as pointing to the row containing the subdiagonal at which the current Householder transformation starts.

• It is easy to do an operation count: with the accumulation of the transformations in Q, Algorithm 3.2 requires about 6n(i2−il) flam.

Deflation

The behavior of the doubly shifted QR algorithm once asymptotic convergence has set in depends on the eigenvalues of the matrix which determines the shift. If the eigenvalues are real and nondefective, both h_{n,n-1} and h_{n-1,n-2} converge quadratically to zero. If the eigenvalues are complex and nondefective, then h_{n-1,n-2} converges quadratically to zero. In either case, we must allow for the deflation of a 2×2 block at the bottom of the matrix. A 2×2 block may require additional processing: at the very least, if it contains real eigenvalues, it should be rotated into upper triangular form so that the eigenvalues are exhibited on the diagonal.

As with the singly shifted algorithm, the subdiagonal elements other than h_{n,n-1} and h_{n-1,n-2} may show a slow convergence to zero, so that we must allow for deflation in the middle of the matrix. Moreover, we can start a QR step at a point where two consecutive subdiagonals are small, even when they are not small enough to cause a deflation. Algorithm 3.3 implements a backsearch for deflation. It differs from Algorithm 2.6 in recognizing deflation of both 1×1 and 2×2 blocks. For more on this topic, see the notes and references.

Computing the real Schur form

Algorithm 3.4 reduces a real Hessenberg matrix to real Schur form. There are several comments to be made about the algorithm.

• We have already observed that the routine qr2step requires about 6n(i2−il) flam. If we assume that il = 1 throughout the reduction and that it requires k iterations to compute an eigenvalue, then the total operation count is O(kn^3) flam. On the other hand, the explicitly shifted QR algorithm (Algorithm 2.8) requires k'n^3 complex flrot, where k' is the number of iterations to find an eigenvalue. In terms of real arithmetic this is several times as large.

This algorithm takes a Hessenberg matrix H of order n and an index ℓ (1 ≤ ℓ ≤ n) and determines indices il, i2 ≤ ℓ such that one of the following two conditions holds:

    1. 1 ≤ il < i2 ≤ ℓ, in which case the matrix deflates at rows il and i2;
    2. il = i2 = 1, in which case no two consecutive subdiagonal elements of H in
       rows 1 through ℓ are nonzero, so that everything through row ℓ has deflated
       into 1×1 and 2×2 blocks.

In processing 2×2 blocks all transformations are accumulated in Q.

    Algorithm 3.3: Finding deflation rows in a real upper Hessenberg matrix
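Algorithm 3.3 itself is not reproduced in full here, but the kind of logic involved is easy to sketch. In the following NumPy helper (my own, not the book's code) the negligibility test and the handling of 2×2 blocks are simplifying assumptions; the return value (0, 0) signals that nothing is left to iterate on.

    import numpy as np

    def find_active_block(H, ell, eps=np.finfo(float).eps):
        """Scan upward from row ell of an upper Hessenberg H and return the
        0-based inclusive range (i1, i2) of the trailing block that still
        needs QR iterations.  Negligible subdiagonals are set to zero.
        Blocks of order 1 or 2 count as deflated (further processing of
        2x2 blocks, as in the book, is omitted)."""
        def negligible(i):
            return abs(H[i, i-1]) <= eps*(abs(H[i-1, i-1]) + abs(H[i, i]))

        i2 = ell
        while i2 >= 0:
            i1 = i2
            while i1 > 0 and not negligible(i1):
                i1 -= 1
            if i1 > 0:
                H[i1, i1-1] = 0.0          # record the deflation
            if i2 - i1 + 1 >= 3:
                return i1, i2              # run the double shift on rows i1..i2
            i2 = i1 - 1                    # a 1x1 or 2x2 block: skip past it
        return 0, 0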

Given a real upper Hessenberg matrix H, hqr overwrites it with an orthogonally similar quasi-triangular matrix. The similarity transformation is accumulated in Q. The routine gives an error return if more than maxiter iterations are required to deflate the matrix at any eigenvalue.

    Algorithm 3.4: Real Schur form of a real upper Hessenberg matrix
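The output of a routine like hqr is quasi-triangular, and its eigenvalues are read off the 1×1 and 2×2 diagonal blocks. A minimal sketch of that extraction (my own helper, not part of the book's codes):

    import numpy as np

    def eigenvalues_of_quasi_triangular(T, tol=0.0):
        """Eigenvalues of a real quasi-triangular matrix: 1x1 diagonal
        blocks give real eigenvalues; a 2x2 block with nonzero
        subdiagonal contributes the two roots of its characteristic
        polynomial (a complex conjugate pair, or two reals)."""
        n = T.shape[0]
        eig, i = [], 0
        while i < n:
            if i == n - 1 or abs(T[i+1, i]) <= tol:
                eig.append(T[i, i])
                i += 1
            else:
                a, b, c, d = T[i, i], T[i, i+1], T[i+1, i], T[i+1, i+1]
                eig.extend(np.roots([1.0, -(a + d), a*d - b*c]))
                i += 2
        return np.array(eig, dtype=complex)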

Even if k = k', the superiority of the double shift algorithm is clear. In fact, since each iteration of the double shift algorithm is equivalent to two steps of the single shift algorithm, one would expect that k < k'.

• Like the single shift algorithm, the double shift algorithm is numerically stable in the usual sense.

• It should not be thought that because Algorithms 2.8 and 3.4 are both backward stable they produce essentially the same matrices. The doubly shifted algorithm produces a matrix whose complex eigenvalues occur in conjugate pairs, since the real Schur form is real. The singly shifted algorithm, on the other hand, works in the complex field, and the imaginary parts of a pair of complex eigenvalues can drift away from conjugacy—very far away if the eigenvalue is ill conditioned. This is yet another argument for computing the real Schur form.

Eigenvectors

The eigenvectors of the original matrix A can be found by computing the eigenvectors of the Schur form T produced by Algorithm 3.4 and transforming them back using the orthogonal transformation Q. Thus the problem of finding the eigenvectors of the original matrix A is reduced to computing the eigenvectors of a quasi-triangular matrix T. The method is a generalization of the back-substitution process of Algorithm 2.9. Rather than present a complete program, which would necessarily be lengthy, we will sketch the important features of the algorithm. It will be sufficient to illustrate the procedure on a quasi-triangular matrix T consisting of three blocks. We will assume that the eigenvalues of a diagonal block are distinct from those of the other diagonal blocks.

We begin with the case of a real eigenvalue. Suppose T has the form

    T = ( T11  t12  t13
           0   τ22  τ23
           0    0   τ33 ),                                            (3.9)

in which T11 is of order two and τ22 and τ33 are scalars. If we seek the eigenvector corresponding to λ = τ33 in the form (x1^T  ξ2  1)^T, then we have the equation

    T (x1^T  ξ2  1)^T = (x1^T  ξ2  1)^T λ.

(The reason for putting λ to the right of the eigenvector will become clear a little later.) From the second row of (3.9) we have

    τ22 ξ2 + τ23 = ξ2 λ,

whence ξ2 = −τ23/(τ22 − λ). This is just the usual back-substitution algorithm.

From the first row of (3.9) we have

    T11 x1 + t12 ξ2 + t13 = x1 λ.

Hence we may obtain x1 as the solution of the linear system (T11 − λI)x1 = −t13 − ξ2 t12. Here we have deviated from the usual back-substitution algorithm: the two components of the eigenvector in x1 are obtained by solving a 2×2 linear system. The system is nonsingular because by hypothesis λ is not an eigenvalue of T11.

For a general quasi-triangular matrix T the back-substitution proceeds row by row as above. If the row in question has a scalar on the diagonal, then the process reduces to the usual back-substitution. On the other hand, if the diagonal block is 2×2, then two components of the eigenvector are obtained by solving a system of order 2.
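The real-eigenvalue case is short enough to show in code. The following NumPy sketch (my own helper; the block structure is supplied by the caller, and scaling against overflow is omitted) performs exactly the block back-substitution just described.

    import numpy as np

    def eigvec_real_quasi_triangular(T, block_starts, j):
        """Right eigenvector of a quasi-triangular T for the real eigenvalue
        in the 1x1 diagonal block with index j.  block_starts lists the
        starting row of each diagonal block.  Each block above block j
        contributes its components via a small (1x1 or 2x2) solve."""
        starts = list(block_starts) + [T.shape[0]]
        lam = T[starts[j], starts[j]]
        x = np.zeros(T.shape[0])
        x[starts[j]] = 1.0
        for i in range(j - 1, -1, -1):
            r0, r1 = starts[i], starts[i+1]
            rhs = -T[r0:r1, r1:] @ x[r1:]
            x[r0:r1] = np.linalg.solve(T[r0:r1, r0:r1] - lam*np.eye(r1 - r0), rhs)
        return x

    # a 2x2 block (complex pair) followed by two scalar blocks
    T = np.array([[ 2.0, 1.0, 0.5, 0.3],
                  [-3.0, 2.0, 0.2, 0.1],
                  [ 0.0, 0.0, 1.0, 0.4],
                  [ 0.0, 0.0, 0.0, 5.0]])
    x = eigvec_real_quasi_triangular(T, [0, 2, 3], j=1)   # the eigenvalue 1.0
    print(np.linalg.norm(T @ x - T[2, 2]*x))              # ~0

The solves require that the eigenvalue not belong to any block above its own, which is the distinctness assumption made in the text.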

The case of a complex conjugate pair of eigenvalues is sufficiently well illustrated by the matrix

    T = ( T11  T12  T13
           0   τ22  t23^T
           0    0   T33 ),                                           (3.10)

in which T11 and T33 are of order two and T33 contains a complex conjugate pair of eigenvalues. We will look for an "eigenvector" (later we will call it an eigenbasis) in the form

    X = ( X1
          x2^T
          X3 ),    with    TX = XL,

for some matrix L of order 2. We begin by letting X3 be nonsingular but otherwise arbitrary. From the third row of (3.10) we have

    T33 X3 = X3 L,

from which it follows that L = X3^{-1} T33 X3. Since L is similar to T33, it contains the complex conjugate pair of eigenvalues of T33. We already have two examples of such block eigenvectors in (3.2) and (3.3).

From the second row of (3.10) we have

    τ22 x2^T + t23^T X3 = x2^T L.

Hence we may compute x2^T as the solution of the 2×2 system x2^T(τ22 I − L) = −t23^T X3. This system is nonsingular because τ22 is not an eigenvalue of L.

From the first row of (3.10) we have

    T11 X1 + T12 x2^T + T13 X3 = X1 L.

This is a Sylvester equation, which we can solve for X1 because T11 and L have no eigenvalues in common (see the treatment of Sylvester's equation in Chapter 1). For larger matrices the process continues, each row of the matrix yielding either a 2×2 system or a Sylvester equation for the corresponding part of X.

We now turn to the choice of X3. A natural choice is X3 = I—the generalization of the choice ξ3 = 1 for an ordinary eigenvector. The principal advantage of this choice is that it does not require any computation to form L, which is simply T33. However, if actual eigenvectors are desired, it is better to take X3 = (y3 z3), where x3 = y3 + iz3 is the right eigenvector of T33. Arguing as in the proof of Theorem 3.1 [see (3.2)], we have

    T33 (y3  z3) = (y3  z3) (  μ  ν
                              −ν  μ ),

where μ ± iν are the eigenvalues of T33. If we then compute X as described above and partition X = (y z), this relation implies that y ± iz are the right eigenvectors of T corresponding to the eigenvalues μ ± iν. Thus our choice generates the real and imaginary parts of the desired eigenvector.

The major packages adopt a different approach that is equivalent to our second choice of X3. Specifically, for a complex eigenvalue λ they write out the relation Tx = λx and backsolve as in the real case, but compute the real and imaginary parts explicitly in real arithmetic.

This sketch leaves open many implementation details—in particular, how to solve the linear systems and how to scale to prevent overflow. Moreover, we have presented only row-oriented versions of the algorithms; it is easy to cast them into column-oriented form.

3.4. NOTES AND REFERENCES

Implicit shifting

The ideas in this section, from the real Schur form to the implicit double shift, are due to Francis [76], and the resulting algorithm is implemented in the LAPACK routine xHSEQR. There is also a single shift variant, which we will use to compute eigenvalues of a symmetric tridiagonal matrix, as well as a multishift variant (see below).

The idea of an implicit shift has consequences extending beyond the computation of a real Schur form. In the next section we will show how to use implicit shifting to solve the generalized eigenvalue problem, and in the next chapter we will apply it to the singular value decomposition.

Processing 2×2 blocks

When a 2×2 block has converged, it should be split if it has real eigenvalues, so that the real eigenvalues appear explicitly on the diagonal of the Schur form. The LAPACK routine xLANV2 does this. It also transforms a block with complex eigenvalues into one in which the diagonal elements are equal.

The multishifted QR algorithm

There is no need to confine ourselves to two shifts in the implicitly shifted QR algorithm. We can also use more than two shifts. Specifically, suppose that we have chosen m shifts κ1, ..., κm; the shifts can be computed as the eigenvalues of the trailing m×m submatrix. We compute the first column c of (A − κ1 I)···(A − κm I), which will have m+1 leading nonzero components. We then determine an (m+1)×(m+1) Householder transformation Q0 that reduces c to a multiple of e1 and apply it to A. The matrix Q0 A Q0 has an (m+1)×(m+1) bulge below its diagonal, which is chased down out of the matrix in a generalization of Algorithm 3.2. This algorithm, proposed by Bai and Demmel [9] and implemented in the LAPACK routine xHSEQR, can take advantage of level-3 BLAS—the larger m the greater the advantage.

Unfortunately, Watkins [290] has shown that if the number of shifts is great they are not transmitted accurately through the bulge chasing process. An alternative is to note that if, say, the bulge from a 2×2 shift is chased two steps down the diagonal, then another double shift can be started after it. If this process is continued, the result is a sequence of small, adjacent bulges being chased down the matrix. This algorithm and a new deflation procedure are treated by Braman, Byers, and Mathias [29]. The paper also contains an extensive bibliography on the subject.

4. THE GENERALIZED EIGENVALUE PROBLEM

The generalized eigenvalue problem is that of finding nontrivial solutions of the equation

    Ax = λBx,

where A and B are of order n. The problem reduces to the ordinary eigenvalue problem when B = I—which is why it is called a generalized eigenvalue problem. Although the generalization results from the trivial replacement of an identity matrix by an arbitrary matrix B, the problem has many features not shared with the ordinary eigenvalue problem. For example, a generalized eigenvalue problem can have infinite eigenvalues. In spite of the differences between the two problems, the generalized eigenvalue problem has the equivalents of a Hessenberg form and a Schur form, and the QR algorithm can be adapted to compute the latter. The resulting algorithm is called the QZ algorithm, and the purpose of this section is to describe it.
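As a concrete numerical illustration of what follows (not part of the book's development), the snippet below sets up a small pencil with a singular B and examines its eigenvalues with SciPy, which accepts a second matrix argument for exactly this problem. The matrices are invented for the example.

    import numpy as np
    from scipy.linalg import eig

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 0.0, 4.0]])
    B = np.diag([1.0, 1.0, 0.0])     # singular B: expect an infinite eigenvalue

    w, V = eig(A, B)
    print(w)                          # two finite eigenvalues and one inf (or a huge value)

    # residual check A x = lambda B x for the finite eigenpairs
    for lam, x in zip(w, V.T):
        if np.isfinite(lam):
            print(abs(lam), np.linalg.norm(A @ x - lam*(B @ x)))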

\B) if 1. We then turn to the details of the QZ algorithm. B) ify ^ 0 and yHA = R Xy B. For example. the same. if (A. B) (also written A . The scalar A is called an EIGENVALUE of the pencil and the vector x is its EIGENVECTOR. we only sketch the outlines. 4. the direction of x is not necessarily preserved under multiplication by A or B. The reader is referred to the notes and references for more extensive treatments. As with the ordinary eigenvalue problem. called Hessenberg-triangular form. x ^ 0. Regular pencils and infinite eigenvalues An important difference between the generalized eigenvalue problem and the ordinary eigenvalue problem is that the matrix B in the former can be singular whereas the matrix / in the latter cannot. Definitions We begin with the definition of generalized eigenvalues and eigenvectors. then A maps x into a multiple of itself—i. 2. Instead. or) is an eigenpair of the pencil (A.2 and the QZ iteration in §4. the QZ algorithm begins by reducing A and B to a simplified form.\B.130 CHAPTER 2. It is important to keep in mind that the geometry of generalized eigenvectors differs from that of ordinary eigenvectors.. This difference has important consequences.3. B). and then iterates to make both pairs triangular. y) is a LEFT EIGENPAIR of the pencil (A. We will treat the reduction to Hessenberg-triangular form in §4. Let A and B be of order n. Ax = \Bx. x) is an EIGENPAIR or RIGHT EIGENPAIR of the PENCIL (A. THE QR ALGORITHM We begin in the next subsection with the theoretical background. If (A. or) is an eigenpair of A. the direction of x remains invariant under multiplication by A. including Schur forms and perturbation theory. Like the QR algorithm. if .e. Definition 4. The pair (A. it is possible for any number A to be an eigenvalue of the pencil A . the direction of both Ax and Bx are. In the first place if B is singular.1.1. eigenpairs of the generalized problem are assumed to be right eigenpairs unless otherwise qualified. The pair (A. This elementary fact will be useful in understanding the material to follow. On the other hand. provided we agree that the direction of the zero vector matches that of any nonzero vector. For reasons of space. THE THEORETICAL BACKGROUND In this subsection we will develop the theory underlying the generalized eigenvalue problem.

To see this. J?) for any A. Equivalence transformations We have already observed that the natural transformations for simplifying an ordinary eigenvalue problem are similarity transformations. In all these examples the determinant of A — XB is identically zero.XB) is a polynomial of degree m < n. However. B) is regular. B). URBV) is obtained from (A. However. infinite eigenvalues do present mathematical and notational problems. if A and B have a common null vector x. B) be a matrix pencil and let U and V be nonsingular. Then the pencil (UHAV. the pencil must have an infinite eigenvalue corresponding to the vector x.XB) is not identically zero. we will appeal to a generalization of the Schur form (Theorem 4. we say that(UH AV.XB) = 0. Definition 4. note that Ax = XBx (x ^ 0) if and only if det(A . B). what is the multiplicity of an infinite eigenvalue? To answer this question. We also say that the pair (A. They preserve the eigenvalues of a matrix and change the eigenvectors in a systematic manner. The effect of a equivalence transformation on the eigenvalues and eigenvectors of a matrix pencil is described in the following theorem. 4. It is therefore natural to regard oo = 1/0 as an eigenvalue of (A. A) and vice versa. It is possible for p(X) to be constant. then /x = A"1 is an eigenvalue of (5. which are eigenvalues of (A. An even more general example is the pencil in which A and B have no null vectors. For example. The possible existence of infinite eigenvalues distinguishes the generalized eigenvalue problem from the ordinary one. B). Hence for any X. B). Thus if p(X) is constant. in common. and zero is an eigenvalue of (J5. B) is REGULAR if det( A . Now if A / 0 is an eigenvalue of a pencil (A.SEC. in which case it appears that the pencil has no eigenvalues. Let (A. Definition 43. But first we must describe the kind of transformations we will use to arrive at this generalized Schur form. A). then (A. THE GENERALIZED EIGENVALUE PROBLEM 131 then Ae2 = Be^ — 0.2. UEBV) is said to be EQUIVALENT to(A. Ae2 = XBe%. Now p(X) = det(A . The corresponding transformations for the generalized eigenvalue problem are called equivalence transformations. A matrix pencil (A. Alternatively. If the pencil is regular. It should be stressed that infinite eigenvalues are just as legitimate as their finite brethren. A regular matrix pencil can have only a finite number of eigenvalues. among others. x) is an eigenpair of (A. then p(X) is not identically zero and hence has m zeros. Vol II: Eigensystems . this can only occur if B is singular. whose proof is left as an exercise. then Bx = 0 for some x ^ 0. If B is singular. This suggests the following definition. More generally. right or left. B) by an EQUIVALENCE TRANSFORMATION.5).

x) and (A. if we restrict ourselves to unitary equivalences. if Bv ^ 0 then it must be proportional to Av. Moreover. Let (A. THE QR ALGORITHM Theorem 4. Specifically. B) be a regular pencil. like the Jordan form. y) be left and right eigenpairs of the regular pencil (A. at least one of the vectors Av and Bv must be nonzero — say Av. For example. Let v be an eigenvector of (A. l)-block of the matrix on the right is zero because by construction Av is orthogonal to the column space of £/j_. so can we simplify a pencil by equivalence transformations. Equivalence transformations also leave the polynomial p(X) . which are well conditioned. and let (v V±) be unitary. the roots of p( A) and their multiplicity are preserved by equivalence transformations. Similarly. this form. Chapter 1). B) is regular. B) to triangular form.B). In fact. Generalized Schur form Just as we can reduce a matrix to simpler forms by similarity transformations. The proof is now completed by an inductive reduction of (A.XB) essentially unaltered. Let u — Av/||Av||2 and let (u U_L) be unitary. we have the following theorem. B) normalized so that || v\\2 = 1. Since the (A.132 CHAPTER 2. Then The (2.4. • We now turn to four consequences of this theorem. Fortunately.det(A . and the transformations that produce it can be ill conditioned (see §1. is not numerically well determined. we can reduce the matrices of a pencil to triangular form. Since U and V are nonsingular. Unfortunately. there is an analogue of the Jordan form for regular pencils called the Weierstrass form. Let (A. Proof.6.5 (Generalized Schur form). . then (A. In particular. Then there are unitary matrices U andV such thatS — UHAV andT = UHBV are upper triangular. Theorem 4. V~lx) and (A.UHBV).IfU and V are nonsingular. the equivalence transformation multiplies the p( A) by a nonzero constant. U~ly) are eigenpairs of (UEAV.

while the latter correspond to the infinite eigenvalues. i. Specifically. On the other hand. B) to appear in any order in a generalized Schur form. Projective representation of eigenvalues The Schur form has enabled us to solve the mathematical problem of defining the multiplicity of generalized eigenvalues—finite or infinite. B) is its multiplicity as a zero of the characteristic equation.. the degree of p^A^ would be too large. If the second eigenvalue in the sequence is finite. Then is the CHARACTERISTIC POLYNOMIAL of (A. B).6. then we say that (A. Thus any generalized Schur form will exhibit the same number of finite and infinite eigenvalues. Then where \u>\ = 1 (a. This argument justifies the following definitions. is the product of the determinants of the unitary transformations that produce the generalized Schur form). it must be an eigenvalue of (A. It also suggests how to put finite and infinite eigenvalues on the same notational footing. B). THE GENERALIZED EIGENVALUE PROBLEM Multiplicity of eigenvalues 133 Let (A. we have from (4.B)bea regular pencil of order n. finite or infinite. B).A B) = 0. Hence exactly m of the ba are nonzero and n—m are zero.7. Now the degree m of p is invariant under equivalence transformations [see (4. Definition 4. the eigenvalues of (A. for otherwise.2) and (4. The former correspond to the finite eigenvalues of (A.SEC. Corollary 4. then it must satisfy det(A . I f m is the degree of the characteristic polynomial and m < n. Another consequence of the above argument is that we can choose the eigenvalues of (A. After the first step of the reduction. Thus we have established the following corollary. In Theorem 4. B) has an INFINITE EIGENVALUE OF ALGEBRAIC MULTIPLICITY n-m. B) be a generalized Schur form of (A.e. B) can be made to appear in any order on the diagonals of(S.3) that where \u\ — 1. if the second eigenvalue is infinite. We say that the ALGEBRAIC MULTIPLICITY of a finite eigenvalue of (A. B) be a regular pencil and let (A. T). 4.5.1)]. we can choose any eigenvalue. B). In either case we can find an eigenvector corresponding to the second eigenvalue. An eigenvalue of algebraic multiplicity one is called a SIMPLE EIGENVALUE. B must be singular. to be in the first position. Let (A. VolII: Eigensystems .

/%. Thus when B is nonsingular.1} is a zero eigenvalue and (1. and it has computational difficulties when B is ill conditioned. In this nomenclature (0. B) be a regular pencil reduced to Schur form. fin) represents the same eigenvalue.-. B) be a pencil with B nonsingular. We are not restricted to shifting B alone.-). Moreover. Theorem 4. .. Then the equation Ax = \Bx can be written in the form B~lAx — \x. To remedy this difficulty we can work with the shifted pencil Ax = jj. this representation is not unique. we can reduce the generalized eigenvalue problem to an ordinary eigenvalue problem—a computationally appealing procedure. The reduction is not possible when B is singular.B}bea regular pencil and let be nonsingular. these eigenpairs have the same multiplicity. we can reduce the shifted problem to the ordinary eigenvalue problem (B + wA)~lAx = \LX.(B + wA)x.8. fin). Generalized shifting Let (-4. Otherwise the eigenvalue is infinite. An ordinary eigenvalue A can be written in the form (A. If /?t-. x). /?). 1). If we can determine w so that B + wA is well conditioned. We can shift A and B simultaneously. Then the eigenvalues of (A.is nonzero. D). We solve this problem by introducing the set We will call this set the projective representation of the eigenvalue. x) is an eigenpairof(A. We can erase the distinction between the two kinds eigenvalues by representing them by the pairs (o^-. as the following theorem shows. B) are determined by the pairs (a. THE QR ALGORITHM Let (A. since any nonzero multiple of the pair (an. whose eigenpairs are (A/(l + wA)..0) is an infinite eigenvalue. Let Then ((a. A = eta/Pa is a finite eigenvalue of A. Let (A. B) if and only if is an eigenpair of (C. This is equivalent to writing the generalized eigenvalue problem in the form Unfortunately.134 CHAPTER 2.

B). as the following theorem shows. Assume without loss of generality that (A. B) is in generalized Schur form with the pair (a. D} is also in Schur form with (7.Nonetheless.5) we may pass to the Schur form and assume that the pencil (A. D).D) back to (A. ((7. For otherwise the vector y consisting of the last n-\ components of y would be a left eigenvector of the pencil (A.0). Likewise. Ifx and y are right and left eigenvectors corresponding to (a. contradicting the simplicity of (a. the right and left eigenvectors of the pencil corresponding to the eigenvalue 1 are ei and 62. Hence t/H£ / 0.<$). Then (C. and it follows from (4. The statement about multiplicities follows from the fact that if an eigenvalue appears m times on the diagonal of (A. B). For example. so that the vector ei is the corresponding eigenvector. ft). ft) in the initial position. (3)W in the initial position. Since W is nonsingular (7. This allows us to calculate the eigenvalue in the form of a Rayleigh quotient yHAx/yHx. Let (a.ei) is an eigenpair of (C.8) that Vol II: Eigensystems . To establish (4. B) then by the nonsingularity of W the transformed eigenvalue appears exactly m times on the diagonal of (C. ft} be a simple eigenvalue of the regular pencil (A. Chapter 1]. then Consequently. ft}.6) ^ (0. there is an generalized form of the Rayleigh quotient. THE GENERALIZED EIGENVALUE PROBLEM 135 Proof. Consequently. Proof.12).6) = (a. B) has the form Then x is a nonzero multiple of ei. • Left and right eigenvectors of a simple eigenvalue We have seen that if A is a simple eigenvalue of a matrix A then its left and right eigenvectors cannot be orthogonal [see (3. B). 4. The left and right eigenvectors of a simple eigenvalue of a pencil can be orthogonal. The converse is shown by using the nonsingularity of W to pass from (C.9. D).SEC. Theorem 4. Now the first component of y is nonzero.

we have for the left eigenvector y From the perturbation theory of inverses and Theorem 3. since yH A is proportional to y^B. since B plays the role of the identity in the Rayleigh quotient for the ordinary eigenvalue problem. Let be the perturbed pencil. This theorem will also prove important a little later when we consider the perturbation of eigenvalues of matrix pencils.5) and the regularity of the pencil. we must have Ax = 0. we can assume that B is nonsingular. we can write the ordinary form of the eigenvalue as This expression is a natural generalization of the Rayleigh quotient. x) as e —» 0. B) that converges to the eigenpair ((a. ft). Again. we know that there exist perturbed eigenvectors x and y satisfying . ft). F) on the eigenvector x. y^Bx ^ 0. Hence the problem can be reduced to the ordinary eigenvalue problem Similarly. We begin with a qualitative assessment of the effects of the perturbation (E. • Since a — yHAx and ft = y^Bx. THE QR ALGORITHM To establish (4. we drop the Schur form and suppose that yH Ax = 0. we must have ?/H A — 0. /?}.136 CHAPTER 2.6). Perturbation theory We now turn to perturbation theory for the generalized eigenvalue problem. By shifting the pencil.7) is established similarly.x) of a regular pencil (A. Chapter 1. The implication (4.13. We also wish to know how close the perturbed eigenpair is to the original as a function of e. Since Ax is proportional to Bx.x) of (A. and specifically to the perturbation of a simple eigenpair ((a. Let us agree to measure the size of this perturbation by We wish to know if there is an eigenpair ({a. Then by (4. B).

Since the pencil is regular. we may write Similarly we may write Now Vol II: Eigensystems .B). Hence renormalizing x by dividing by 7 and setting e = f/c/7. (3} = (y^Ax. For defmiteness assume that UH = t/ H A / 0. x. Let x and y be simple eigenvectors of the regular pencil (A. Then by (4.SEC. Theorem 4. Now since x —>• x [see (4. we must have 7 —>• 1. Hence we can write x = -yx + Uc for some 7 and c.10. This implies that if U is an orthonormal basis for the orthogonal complement of u then the matrix (x U)is nonsingular. First-order expansions: Eigenvalues We now turn to first-order expansions of a simple eigenvalue.5) that [Here the notation (a.10)]. /? -f O(e)). (3) + O(e) is an abbreviation for the more proper (a + 0(e).9) implies that For the eigenvalue we have from (4. not both yH A and y^B can be zero. y.y^Bx} be the corresponding eigenvalue. THE GENERALIZED EIGENVALUE PROBLEM If we normalize x. 4.6) we have u^x ^ 0. The key to the proof lies in writing the perturbed eigenvectors x and y in appropriate forms.] Thus the projective representation of the eigenvalues converges at least linearly in e. y so that 137 then (4. Then Proof. and let (a. We will use the notation established above.

It is therefore natural to measure the distance between two eigenvalues (a.e. equation (4. Two observations on this theorem.||2 = \\y\\2 = 1.138 Similarly. ft) and (7. S) by the sine of the angle between them. CHAPTER 2. Chapter 1.11) can be written in the form Hence if A is finite. Denoting this angle by 0. THE QR ALGORITHM The expression (4. it is definite and satisfies the triangle inequality. /3) is the one-dimensional subspace of C 2 spanned by the vector (a ft) . We therefore turn to a different way of measuring the accuracy of generalized eigenvalues. Definition 4. The chordal metric From (4.4) we see that the projective representation (a. ft) and (7. that is.21).11. • If/? and S are nonzero and we set A = a/ft and p = 7/8. The CHORDAL DISTANCE between (a.11) says that up to second-order terms a and ft lie in the intervals [a . then . after some manipulation This justifies the following definition. a + c] and \J3 — e. • The function x is a metric. we have by the Cauchy inequality rp Hence. • Since ||a. ft + €\. This generalizes the Rayleigh-quotient representation of A in (3. S) is the number There are several comments to be made about this definition and its consequences. However. We can convert these bounds into bounds on the ordinary form A of the eigenvalue. the process is tedious and does not give much insight into what makes an eigenvalue ill conditioned.

and the chordal distance between them remains unchanged. the eigenvalue (a. Specifically. we can write Thus the chordal distance includes the point at infinity. oo) = 1. Vol II: Eigensystems .SEC. By a limiting argument. • The name chordal distance comes from the fact that x( A. x(0. • In Theorem 4. we saw how to shift a pencil and its eigenvalues by a nonsingular matrix W of order two. ft)W). the angle between two projective points is preserved. Hence for eigenvalues that are not too large. In general. By abuse of notation we will denote this distance by x( A. /u). the chordal metric behaves like the ordinary distance between two points in the complex plane. The condition of an eigenvalue Turning now to our first-order expansion. the chordal metric is not invariant under such shifts. ft) was shifted into {(a.8. THE GENERALIZED EIGENVALUE PROBLEM 139 Thus x defines a distance between numbers in the complex plane. we have from which we get But the numerator of this expression is Hence We summarize our results in the following theorem. In particular. in which we recapitulate a bit. //) is half the length of the chord between the projections of A and ^ onto the Riemann sphere. 4. But if W is a nonzero multiple of a unitary matrix.

where Let A ./?). Such are the regularizing properties of the chordal metric. • On the other hand.15) it is independent of the scaling of x and y.140 CHAPTER 2.12. This agrees with common-sense reasoning from the first-order perturbation expansion If both a and {3 are small.A + E and B = B + F. Let Xbea simple eigenvalue (possibly infinite) of the pencil (A. they are sensitive to the perturbation E and F. . then v is not large and the eigenvalue is well conditioned.I and F = 0. and set Then fore sufficiently small. there is an eigenvalue A the pencil (A. although A moves infinitely far in the complex plane. so that the eigenvalue A is infinite.B) and let x and y be its right and left eigenvectors. • If ||ar||2 = \\y\\2 = 1» the condition number v is large precisely when a and ft are both small. • The number v mediates the effects of the error £ on the perturbation of the eigenvalue and is therefore a condition number for the problem. However. it remains in a small neighborhood of infinity on the Riemann sphere. if one of a or j3 is large. By (4. J3) is ill determined. which means that the subspace (a. we may have j3 = 0. y). B) satisfying where There are five comments to be made on this theorem. THE QR ALGORITHM Theorem 4. Let the projective representation of A be (a. In particular. then |A| < 1 and ||ar||2|b||2 = sec Z(x. if we normalize A so that \\A\\2 = 1 and normalize x and y so that yHx — 1. In this case. • The bound does not reduce to the ordinary first-order bound when B . Hence.

2)]. 10 . It is obvious from the example that only a very special perturbation can undo a well-conditioned eigenvalue. B)(x X) must be of the form (4.16). and x must be an eigenvector. 4. B). If we perturb it to get the pencil then Thus the eigenvalues of the perturbed pencil are 1 ± ^/e. ife — 10 the "well-conditioned"eigenvalue 1 has changed by 10~5. The example shows that perturbations can cause an ill-conditioned eigenvalue to wander around. Conversely.13. then (u U) (A. by (4. let (x X) and (u U) be unitary matrices such that The fact that we can choose U so UHAx = UHBx = 0 is a consequence of the fact that x is an eigenvector. this problem should not be taken too seriously.SEC. Example 4. • Unfortunately. for the case \\A\\2 = 1 the results of ordinary perturbation theory and our generalized theory agree up to constants near one. IT Vol II: Eigensystems . if such a U exists. we appeal to the first step of the reduction to Schur form [see (4. bumping into other eigenvalues and destroying their accuracy. However. Perturbation expansions: Eigenvectors To derive a perturbation expansion for a simple eigenvector x of regular eigenpair (A. THE GENERALIZED EIGENVALUE PROBLEM Moreover. Specifically. as the following example shows. ill-conditioned eigenvalues can restrict the range of perturbations for which the first-order expansion of a well-conditioned eigenvalue remains valid. Consider the pencil where e is small.14). In particular. such as rounding or measurement error. Such perturbations are unlikely to arise from essentially random sources. 141 Hence.

where q is also to be determined. /?). B -f F). B}. Chapter 1. and we will drop it in what follows./3) would be an eigenvalue of (A. ||XH#||2 < 1. Hence p and q must satisfy the equations We may eliminate the auxiliary vector q from these equations by multiplying the first by /3 and the second by a. B) = (A+E. THE QR ALGORITHM Now let (A. and subtracting to get Now the matrix /3A . Thus where The condition of an eigenvector To convert (4. We also seek the corresponding matrix U in the form U = (U + uq^)(I + <jr# H )~ 2.16). But the last equality following from (4.12. differs from the identity by second-order terms in g.17) by a constant to make max{ | a |. Similarly. We seek the perturbed eigenvector in the form x — (x -f Xp). For x to be an eigenvector of (A.aB is nonsingular.18) to a first order-bound.142 CHAPTER 2. by Lemma 3. x) = ||J^Hx||2. up to second-order terms sin Z(z. contradicting the simplicity of the eigenvalue (a. B). J7H B x = UHFx+(3q+Bp. Finally. we may assume that p and q are small. Moreover. where p is to be determined. note that we can multiply (4. which makes f/ orthonormal. the factor (/+<?<?H)~ 2. |ft\} — 1. sinceX and U are orthonormal. Hence on taking norms we get: In the above notation suppose that and set Then . we must have /7H Ax = U^Bx — 0. In particular. For otherwise (a. Since we know that x converges to x as e = \j \\E\\p + \\F\\p —*• 0.



The number δ[(α, β), (A, B)] is analogous to the number sep(λ, M) for the ordinary eigenvalue problem [see (3.16), Chapter 1]. It is nonzero if (α, β) is simple, and it approaches zero as (α, β) approaches an eigenvalue of (A, B). Most important, as (4.19) shows, its reciprocal is a condition number for the eigenvector corresponding to (α, β).

4.2. REAL SCHUR AND HESSENBERG-TRIANGULAR FORMS
We now turn to the computational aspects of the generalized eigenvalue problem. As indicated in the introduction to this section, the object of our computation is a Schur form, which we will approach via an intermediate Hessenberg-triangular form. The purpose of this subsection is to describe the real generalized Schur form that is our computational goal and to present the reduction to Hessenberg-triangular form.

Generalized real Schur form

Let (A, B) be a real, regular pencil. Then (A, B) can be reduced to the equivalent of the real Schur form (Theorem 3.1); i.e., there is an orthogonal equivalence transformation that reduces the pencil to a quasi-triangular form in which the 2×2 blocks contain complex conjugate eigenvalues. Specifically, we have the following theorem.

Theorem 4.14. Let (A, B) be a real regular pencil. Then there are orthogonal matrices U and V such that

and

The diagonal blocks of S and T are of order either one or two. The pencils corresponding to the blocks of order one contain the real eigenvalues of (A, B). The pencils corresponding to the blocks of order two contain a pair of complex conjugate eigenvalues of (A, B). The blocks can be made to appear in any order.

The proof of this theorem is along the lines of the reduction to generalized Schur form, the chief difference being that we use two-dimensional subspaces to deflate the complex eigenvalues. Rather than present a detailed description of this deflation, we sketch the procedure.


1. Using Definition 4.6 and Theorem 4.8, argue that complex eigenvalues of a real pencil occur in conjugate pairs and that the real and imaginary parts of their eigenvectors are independent.
2. Let x = y + iz be an eigenvector corresponding to a complex eigenvalue and let X = (y z). Show that for some matrix L we have AX = BXL.
3. Let (V V⊥) be orthogonal with R(V) = R(X). Let (U U⊥) be orthogonal with R(U) = R(AV) ∪ R(BV).
4. Show that

where S and T are 2×2. By repeated applications of this deflation of complex eigenvalues along with the ordinary deflation of real eigenvalues, we can prove Theorem 4.14.

Hessenberg-triangular form

The first step in the solution of the ordinary eigenvalue problem by the QR algorithm is a reduction of the matrix in question to Hessenberg form. In this section, we describe a reduction of a pencil (A, B) to a form in which A is upper Hessenberg and B is triangular. Note that if A is upper Hessenberg and B is upper triangular, then AB^{-1} is upper Hessenberg. Thus the reduction to Hessenberg-triangular form corresponds to the reduction to Hessenberg form of the matrix AB^{-1}.

The process begins by determining an orthogonal matrix Q such that Q^T B is upper triangular. The matrix Q can be determined as a product of Householder transformations (Algorithm 1:4.1.2). The transformation Q^T is also applied to A. We then use plane rotations to reduce A to Hessenberg form while preserving the upper triangularity of B. The reduction proceeds by columns. Figure 4.1 illustrates the reduction of the first column. Zeros are introduced in A beginning at the bottom of the first column. The elimination of the (5,1)-element of A by premultiplication by a rotation in the (4,5)-plane introduces a nonzero into the (5,4)-element of B. This nonzero element is then annihilated by postmultiplication by a rotation in the (5,4)-plane. The annihilation of the (4,1)- and (3,1)-elements of A proceeds similarly. If this process is repeated on columns 2 through n−2 of the pencil, the result is Algorithm 4.1. Here are some comments.

• We have implemented the algorithm for real matrices. However, with minor modifications it works for complex matrices.

• The computation of Q is special. Since Q is to contain the transpose of the accumulated transformations, we must postmultiply by the transformations, even though they premultiply A and B.
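The reduction just described is easy to prototype. The following NumPy sketch (my own illustration, not the book's Algorithm 4.1; it does not accumulate Q and Z) triangularizes B by a QR factorization and then reduces A column by column with Givens rotations, restoring B's triangularity after each one.

    import numpy as np

    def givens(a, b):
        """c, s with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T."""
        r = np.hypot(a, b)
        return (1.0, 0.0) if r == 0 else (a/r, b/r)

    def hess_triangular(A, B):
        """Reduce the pencil (A, B) to Hessenberg-triangular form (sketch)."""
        A, B = A.copy(), B.copy()
        n = A.shape[0]
        Q, _ = np.linalg.qr(B)                   # B <- Q^T B is upper triangular
        A, B = Q.T @ A, Q.T @ B
        for j in range(n - 2):                   # columns of A, left to right
            for i in range(n - 1, j + 1, -1):    # zero A[i, j] from the bottom up
                c, s = givens(A[i-1, j], A[i, j])
                G = np.array([[c, s], [-s, c]])
                A[i-1:i+1, :] = G @ A[i-1:i+1, :]        # row rotation
                B[i-1:i+1, :] = G @ B[i-1:i+1, :]        # creates B[i, i-1] != 0
                c, s = givens(B[i, i], B[i, i-1])
                Z = np.array([[c, s], [-s, c]])
                A[:, i-1:i+1] = A[:, i-1:i+1] @ Z        # column rotation
                B[:, i-1:i+1] = B[:, i-1:i+1] @ Z        # restores triangular B
        return A, B

    rng = np.random.default_rng(4)
    A1, B1 = hess_triangular(rng.standard_normal((6, 6)), rng.standard_normal((6, 6)))
    print(np.linalg.norm(np.tril(A1, -2)), np.linalg.norm(np.tril(B1, -1)))  # both ~0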



Figure 4.1: Reduction to Hessenberg-triangular form




This algorithm takes a matrix pencil of order n and reduces it to Hessenberg-triangular form by orthogonal equivalences. The left transformations are accumulated in the matrix Q, the right transformations in the matrix Z.

Algorithm 4.1: Reduction to Hessenberg-triangular form



• The operation count for the reduction of B to triangular form is about (2/3)n^3 flam, and the rotation-based reduction of A takes O(n^3) flrot. Thus for real matrices the algorithm requires O(n^3) operations in all.

• If only eigenvalues of the pencil are desired, we need not accumulate the transformations. • The algorithm is backward stable in the usual sense.

4.3. THE DOUBLY SHIFTED QZ ALGORITHM
Like the QR algorithm, the doubly shifted QZ algorithm is an iterative reduction of a real Hessenberg-triangular pencil to real generalized Schur form. Since many of the details — such as back searching—are similar to those in the QR algorithm, we will not attempt to assemble the pieces into a single algorithm. Instead we begin with the general strategy of the QZ algorithm and then treat the details. Overview Let (A, B) be in Hessenberg-triangular form and assume that B is nonsingular. Then the matrix C — AB~l is Hessenberg and is a candidate for an implicit double QR step. The idea underlying the QZ algorithm is to mimic this double QR step by manipulations involving,only A and B. In outline the algorithm goes as follows. [Before proceeding, the reader may want to review the outline (3.6) of the doubly shifted QR algorithm.] 1. Let H be the Householder transformation that starts a double shift QR step on the matrix AB~l. 2. Determine orthogonal matrices Q and Z with Qei — GI such that (A, B} = Q^(HA, HB)Z is in Hessenberg-triangular form. To see why this algorithm can be expected to work, let C = AB~l. Then

Moreover, since Qe1 = e1, the first column of HQ is the same as that of H. It follows that the transformed C is the result of performing an implicit double QR step on C. In consequence, we can expect at least one of its subdiagonal elements c_{n,n-1} and c_{n-1,n-2} to converge to zero. But because the transformed pencil (A, B) is in Hessenberg-triangular form,

Hence if b_{n-1,n-1} and b_{n-2,n-2} do not approach zero, at least one of the subdiagonal elements a_{n,n-1} and a_{n-1,n-2} must approach zero. On the other hand, if either b_{n-1,n-1} or b_{n-2,n-2} approaches zero, the process converges to an infinite eigenvalue, which can be deflated.
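The observation that drives the QZ algorithm—that C = AB^{-1} is Hessenberg for a Hessenberg-triangular pair, with the generalized eigenvalues of (A, B) as its ordinary eigenvalues—is easy to check numerically. A quick sketch, assuming B nonsingular (the matrices are random and pushed away from singularity):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 6
    A = np.triu(rng.standard_normal((n, n)), -1)    # upper Hessenberg
    B = np.triu(rng.standard_normal((n, n))) + n*np.eye(n)   # upper triangular, well conditioned

    C = A @ np.linalg.inv(B)
    print(np.linalg.norm(np.tril(C, -2)))           # ~0: C is again upper Hessenberg

    lhs = np.sort_complex(np.linalg.eigvals(C))
    rhs = np.sort_complex(np.linalg.eigvals(np.linalg.solve(B, A)))   # B^{-1}A is similar to AB^{-1}
    print(np.max(np.abs(lhs - rhs)))                # small: same spectrum

The QZ algorithm performs the corresponding iteration on A and B separately, without ever forming B^{-1}.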



If repeated QZ steps force a_{n,n-1} to zero, then the problem deflates with a real eigenvalue. If, on the other hand, a_{n-1,n-2} goes to zero, a 2×2 block, which may contain real or complex eigenvalues, is isolated. In either case, the iteration can be continued with a smaller matrix.

Getting started

We now turn to the details of the QZ algorithm. The first item is to compute the Householder transformation to start the QZ step. In principle, this is just a matter of computing the appropriate elements of C and using Algorithm 3.1. In practice, the formulas can be simplified. In particular, if we set m = n−1, the three components of the vector generating the Householder transformation are given by

In EISPACK and LAPACK these formulas are further simplified to reduce the number of divisions. The formulas are not well defined if any of the diagonal elements 611, 622* bmrn, or bmm is zero. For the case of zero &n, the problem can be deflated (see p. 153). If the others diagonals are too small, they are replaced by eM||.Z?||, where B is a suitable norm. The above formulas do not use the Francis double shift computed from the matrix C. Instead they shift by the eigenvalues of the the trailing 2x2 pencil of (A, B): namely,

This approach is simpler and is just as effective. The QZ step Having now determined a 3x3 Householder transformation to start the QZ step, we apply it to the pencil (.4, B) and reduce the result back to Hessenberg-triangularform. As usual we will illustrate the process using Wilkinson diagrams. Our original pencil

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM
has the form

149

If we premultiply this pencil by the Householder transformation computed above, we get a pencil of the form

The next step reduces B back to triangular form. First postmultiply by a Householder transformation to annihilate the elements with superscripts 1. We follow with a plane rotation that annihilates the element with superscript 2. This produces a pencil of the form

We now premultiply by a Householder transformation to annihilate the elements of A with superscripts 1. This results in a pencil of the form

Once again we reduce B back to triangular form using a Householder transformation and a plane rotation:

Vol II: Eigensystems

150

CHAPTER 2. THE QR ALGORITHM

When we repeat the procedure this time, we need only use a plane rotation to zero out the (5,3)-element of A. The result is a matrix that is almost in Hessenberg-triangular form:

A postmultiplication by a plane rotation annihilates the (5,5)-element of B without affecting the Hessenberg form of A. Since the QZ algorithm, like the QR algorithm, works with increasingly smaller matrices as the problem deflates, the sweep will be applied only to the working matrix. Algorithm 4.2 implements a sweep between rows il and i2. In studying this algorithm, keep in mind that k—1 points to the column of A from which the current Householder transformation is obtained (except for the first, which is provided by the input to the algorithm). Also note that the Householder transformation that postmultiplies (A, B) reduces the three elements of B to a multiple of ej —not ei as housegen assumes. We therefore use the cross matrix

in statements 10 and 12 to reverse the order of the components. There are four comments to be made about this algorithm. • An operation count shows that the sweep requires

Thus a QZ sweep is a little over twice as expensive as a doubly shifted QR sweep. • The extra rotations in (4.21) are due to the necessity of eliminating 62 in (4.20). An alternative is to premultiply by a Householder transformation to eliminate the elements 61 in the pencil

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM

151

Given a real pencil (A, B) in Hessenberg-triangularform, and a vector u that generates a starting Householder transformation, qz2step implements a QZ sweep between rows il and /2. 1. <7z2ste/?(A, £, tt, i7, i2, Q, Z) 2. for jfc = il to /2-2 3. if(k^il) 4. fa?Msegew(egen A[fc:fc+2, fc—1], w, z/) 5. A[Ar,fe-lJ=i/;A[Jb+l:fc+2,Jt-l] = 0 6. end if
7.

8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30.

VT = uT*B[k:k+2,k:n]; B[k:k+2,k:n] = B[k:k+2,k:n] - UVT v = Q[:, Ar:fc+2]*w; Q[:, k:k+2] = Q[:, k:k+2] - VUT housegen(B[k+2,k:k+2]*f, tiT, i/) ,B[Ar+2,A;+2] = i/;5[Jfe+2,A?:fe+l] = 0 WT = wT*f v = jB[l:A?-fl, k:k+2]*u; B[l:k+l, k:k+2] = B[l:k+l, k:k+2] - vu1 Jbk = min{A?+3,/2} t; = A[l:Jbk,fe:A;+2]*w;A[likk, k:k+2] = A[l:kk,fc:fc+2]- W1 v = Z[:, A;:A;+2]*w; Z[:,fc:A;+2]= Z[:, A;:fc+2]-vttT ro/^/i(5[A:+l,A;+l],k+1], 5[fc+l,Ar],c,5) rotapp(c, s, B[l:k, k+I], B[l:k, k}) rotapp(c, s, A[l:kk, jfc+l], A[l:Jbfe, fc]) rotapp(c,s,Z[:,k+l],Z[:,k]) end for A; rotgen(A[i2-l,i2-2], A[i2,i2-2], c, s) rotapp(c, s, A[/2-l, i2-l:z2], (A[i2,i2-l:i2]) rotapp(c, s, 5[i2-l, /2-l:i2], (J?[i2, i2-l:i2]) w/flpp(c» 5' Qt,*2-l], Qt,*2]) rotgen(B[i2,i2], B[i2,i2-l]1c,s) rotapp(c, s, 5[l:i2-l,i2], 5[l:/2-l,i2-l]) rotapp(c, s, A[l:i2, i'2], A[l:/2, i2-l]) rotapp(c, s, £[:, i2], Z[:, i2-l]) end qz2step Algorithm 4.2: The doubly shifted QZ step o

VT = u T *A[fc:fc+2, Jk:n]; A[fc:fc+2, fc:n] = A[fc:fc+2, fc:n] - «VT


. then our problem is to choose positive numbers 7*1. the sweep can be performed only on the subpencil (A[il : i2..il : z'2j) and the transformations need not be accumulated. This problem is not well determined.. 2x2 problems. Balancing The natural class of transformations for balancing the pencil (A.. If we take that value to be one.il : i2].. But we can make it well determined by specifying a value around which the scaled elements must cluster. . rn and c\. • If only eigenvalues are desired. Thus we should like to find diagonal matrices DR and Dc with positive diagonals so that the nonzero elements of of DRADC and DRBDC are as nearly equal in magnitude as possible. if we set then the problem is equivalent to finding quantities p\ and 7^ so that If we solve this problem in the least squares sense. B) by a scalar does not change the balance. .. cn so that The problem (4. IMPORTANT MISCELLANEA Anyone attempting to code a QZ-style algorithm from the description in the last two sections will quickly find how sketchy it is.. rn and c\. If we write then we wish to choose positive numbers ri. • The algorithm is stable in the usual sense.4.B[il : i2. 4..1S2 CHAPTER 2.. infinite eigenvalues. since multiplying the pencil (A. For more see the notes and references. However.22) is nonlinear in the unknowns rt. B) are the diagonal equivalences. .and GJ.... cn so that all the elements riCj\aij\ and TiCj\bij\ of DRADC and DRBDC are nearly equal. THE QR ALGORITHM This gives an operation count of which is exactly twice the count for a double QR sweep. . The purpose of this subsection is to fill the gaps—at least to the extent they can be filled in a book of this scope. back searching. we end up with the problem . and eigenvectors. We will treat the following topics: balancing.

Consider the Wilkinson diagram in which the fourth diagonal element of B is zero. it can be solved by an iterative technique called the method of conjugate gradients. where the elements A[/7-l. Consequently. il] and A[i2—1.6..12] have been deemed negligible.3)-plane to annihilate the hatted element in B. The chief problem is to decide when an element is negligible. in the course of the QZ iteration.23) are very simple. it is recommended that A and B be normalized so that their norms are one in some easily calculated. Infinite eigenvalues If. the pencil has an infinite eigenvalue. This scaling also helps in controling underflow and overflow in the QZ step.2 is performed between rows il and 12 of A and B. THE GENERALIZED EIGENVALUE PROBLEM 153 Without going into details. The integers il and 12 are determined by a back-searching algorithm analogous to Algorithm 2.e. balanced norm. rB) as a and r vary. a negligible element occurs on the diagonal of B.SEC. it will produce different results for the pair (ovl. There are many ways of treating infinite eigenvalues. A similar criterion is used for the elements of B. Back searching The QZ step implemented in Algorithm 4. The result is the pencil VolII: Eigensy stems . Each iteration requires 0(n) operations. We can postmultiply this pencil by a rotation in the (4. 4. The usual procedure is to assume that the problem is reasonably balanced and declare an element aij negligible if in some convenient norm. The procedure sketched above is not invariant under change of scale of A and B — i. The following is a simple and efficient deflation procedure. the normal equations for (4. and the method converges so swiftly that the determination of the scaling factors is also an 0(n) process (remember that we do not have to determine the scaling factors very accurately). Although the system is too large to solve directly.

We may now premultiply by a rotation in the (4,5)-plane to return A to Hessenberg form. We can now repeat the process by postmultiplying by a rotation in the (3,2)-plane and then premultiplying by a rotation in the (3,4)-plane. One more step of post- and premultiplication, followed finally by premultiplication by a rotation in the (1,2)-plane, completes the reduction. As the lines in the above diagram show, the infinite eigenvalue has been deflated from the problem.

We can also deflate an infinite eigenvalue if there are two small consecutive diagonal elements. Specifically, consider the matrix in which the consecutive diagonal elements e1 and e2 of B are small. If we apply a plane rotation in the (5,4)-plane to annihilate the first of them, then the (5,4)-element becomes a quantity of order e1*e2/c. Consequently, if e1*e2/c is negligible, we can set the (5,4)-element to zero and deflate the zero (4,4)-element as described above.

The 2x2 problem

When the QZ iteration deflates a 2x2 pencil, it may have real eigenvalues. In that case, we will naturally want to deflate it further, so that the real eigenvalues appear explicitly on the diagonal of the Schur form. This amounts to computing the Schur form of a pencil of order two. This, it turns out, is not an easy problem when the effects of rounding error are taken into account. The details are too involved to give here, and we only warn the reader not to go it alone. Copy the EISPACK or LAPACK code.

Eigenvectors

Eigenvectors can be computed by a back substitution algorithm analogous to the ones for the ordinary eigenvalue problem. Unlike the generalized eigenproblem of order two, this algorithm contains no significant numerical pitfalls. However, its derivation is lengthy, and once again we refer the reader to the EISPACK or LAPACK code.

4.5. NOTES AND REFERENCES

The algebraic background

The generalized eigenvalue problem (regarded as a pair of bilinear forms) goes back at least to Weierstrass [293, 1868], who established the equivalent of the Jordan form for regular matrix pencils. Jordan [137, 1874] later gave a new proof, and Kronecker [152, 1890] extended the result to rectangular (singular) pencils. For modern treatments see [80, 141, 269, 281, 304]. The generalized Schur form is due to Stewart [251].

Perturbation theory

For a general treatment of the perturbation theory of the generalized eigenvalue problem along with a bibliography see [269]. The use of the chordal metric in generalized

eigenvalue problems is due to Stewart [255]. For a chordless perturbation theory and backward error bounds see [112].

The QZ algorithm

The QZ algorithm, including the preliminary reduction to Hessenberg-triangular form, is due to Moler and Stewart [178], the second of whom is happy to acknowledge a broad hint from W. Kahan.

The double shift strategy, of course, works when the eigenvalues of the trailing pencil are real. However, Ward [286] has observed that the algorithm performs better if real shifts are treated by a single shift strategy. This option has been incorporated into the EISPACK and LAPACK codes.

The balancing algorithm is due to Ward [287], who builds on earlier work of Curtis and Reid [50].

Ward [286] and Watkins [291] consider the problem of infinite eigenvalues. In particular, Watkins shows that the presence of small or zero elements on the diagonal of B does not interfere with the transmission of the shift. Moreover, both the reduction to Hessenberg-triangular form and the steps of the QZ algorithm move zero elements on the diagonal of B toward the top, where they can be deflated by a single rotation. Thus an alternative to deflating infinite eigenvalues from the middle of the pencil is to wait until they reach the top.

For the treatment of 2x2 blocks see the EISPACK code qzval or the LAPACK code SLAG2.

3 THE SYMMETRIC EIGENVALUE PROBLEM

We have seen that symmetric (or Hermitian) matrices have special mathematical properties. In particular, their eigenvalues are real and their eigenvectors can be chosen to form an orthonormal basis for Rn. They also have special computational properties. Because of their symmetry they can be stored in about half the memory required for a general matrix. Moreover, we can frequently use the symmetry of the problem to reduce the operation count in certain algorithms. For example, the Cholesky algorithm for triangularizing a symmetric matrix requires half the operations that are required by its nonsymmetric counterpart, Gaussian elimination (see Algorithm 1:2.2).

The first part of this chapter is devoted to efficient algorithms for the symmetric eigenvalue problem. In the first section we will show how to adapt the QR algorithm to solve the symmetric eigenvalue problem. The procedure begins as usual with a reduction to a simpler form—in this case tridiagonal form—followed by a QR iteration on the reduced form. We will treat this reduction in §1.1 and the tridiagonal QR iteration itself in §1.2. In §2 we will consider some alternative methods for the symmetric eigenvalue problem: a method for updating the symmetric eigendecomposition, a divide-and-conquer algorithm, and methods for finding eigenvalues and eigenvectors of band matrices.

We next turn to the singular value decomposition. It is included in this chapter because the squares of the singular values of a real matrix X are the eigenvalues of the symmetric matrix A = X^T X and the eigenvectors of A are the right singular vectors of X. Thus, in principle, the algorithms of the first section of this chapter can be used to solve the singular value problem. In practice, this approach has numerical difficulties (see §3.1), and it is better to solve the singular value problem on its own terms. Fortunately, the QR algorithm for symmetric matrices has an analogue for the singular value problem.

Finally, in a brief coda, we treat the symmetric positive definite generalized eigenvalue problem Ax = λBx, where A is symmetric and B is positive definite. This problem has appealing analogies to the ordinary symmetric eigenproblem, but it is mathematically and numerically less tractable.

In the ordinary eigenvalue problem, real and complex matrices are treated in es-

sentially different ways. For the symmetric eigenvalue problem, the singular value problem, and the definite generalized eigenvalue problem, the complex case differs from the real case only in the fine details. For this reason we will assume in this chapter that:

    A and B are real symmetric matrices of order n.

1. THE QR ALGORITHM

In this section we will show how the QR algorithm can be adapted to find the eigenvalues and eigenvectors of a symmetric matrix. The general outline of the procedure is the same as for the nonsymmetric eigenvalue problem—reduce the matrix to a compact form and perform QR steps, accumulating the transformations if eigenvectors are desired. Because orthogonal similarities preserve symmetry, the process simplifies considerably. First, the Hessenberg form becomes a tridiagonal form [see (1.1)]. Second, the final Schur form becomes diagonal. Finally, the columns of the accumulated transformations are the eigenvectors of A. Thus the algorithm computes the spectral decomposition

    A = V Λ V^T,

where V is an orthogonal matrix whose columns are eigenvectors of A and Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues.

1.1. REDUCTION TO TRIDIAGONAL FORM

If a symmetric matrix A is reduced to Hessenberg form by orthogonal similarity transformations, all the elements below its first subdiagonal are zero and hence by symmetry so are all the elements above the first superdiagonal. Thus the matrix assumes a tridiagonal form illustrated by the following Wilkinson diagram:

In this volume almost all tridiagonal matrices will turn out to be symmetric or Hermitian. Hence:

    Unless otherwise stated, a tridiagonal matrix is assumed to be symmetric or, if complex, Hermitian.

The purpose of this subsection is to describe the details of the reduction of a symmetric matrix to tridiagonal form. But first we must consider how symmetric matrices may be economically represented on a computer.

The storage of symmetric matrices

Since the elements aij and aji of a symmetric matrix are equal, a symmetric matrix is completely represented by the n(n+1)/2 elements on and above its diagonal. There are two ways of taking advantage of this fact to save storage.

First, one can store the matrix in, say, the upper half of a two-dimensional array of real floating-point numbers and code one's algorithms so that only those elements are altered. This approach has the advantage that it permits the use of array references like A[i, j] while leaving the rest of the array free to hold other quantities. The disadvantage is that the rest of the array is seldom used in practice because its n(n-1)/2 elements are insufficient to hold another symmetric or triangular matrix of order n.

Second, one can put the diagonal and superdiagonal elements of a symmetric matrix into a one-dimensional array—a device known as packed storage. The two usual ways of doing this are to store them in column order and in row order. Packed storage offers two advantages. First, unlike the natural representation, it uses all the storage allocated to the matrix. Second, on nonoptimizing compilers it may run more efficiently, since the code references only a one-dimensional array. The price to be paid for these advantages is that the resulting code is hard to read (although practice helps).

Because our algorithms are easier to understand in the natural representation, we will assume in this volume that the symmetric matrix in question is contained in the upper part of a two-dimensional array.

Reduction to tridiagonal form

The algorithm for reduction to tridiagonal form is basically the same as the algorithm for reduction to Hessenberg form described beginning on page 83. We partition A in the form

    A = | a11   a21^T |
        | a21   A22   |

and determine a Householder transformation H such that H a21 = γ e1. Then

then This formula can be simplified by setting so that Before we can code the algorithm. If we denote the vector defining E by r. say d. and the off-diagonal elements in another one-dimensional array.160 CHAPTER 3. since in our storage scheme traversing a row . • The computation of the product of A[k+1. The computation of EA^E can be simplified by taking advantage of symmetry. The reduction continues by recursively reducing EA^E to tridiagonal form. The usual convention is to store the diagonal of the matrix in a onedimensional array. Thus the final tridiagonal matrix has the form Algorithm 1. THE SYMMETRIC EIGENVALUE PROBLEM This matrix has the Wilkinson diagram (n = 6) from which it is seen that the transformation has introduced zeros into the appropriate parts of A. k+1] and the Householder generating vector must be done at the scalar level. Here are some comments. say e.1 implements the tridiagonalization described above. we must decide how to represent the final tridiagonal matrix.

1.n-l:n] = I 28. -f 5[z]*r[z] 16. end for fc. 5. V[fc+l:n. r[k+l:n]. 7.(w/2)*r[k+l:n] 18. end TriReduce Algorithm 1. 8. rf[n-l] -1] = A[n-l. THE QR ALGORITHM 161 Given a real symmetric matrix A of order n. VT = rTF[A. 15. end for k 34.n-l] 25.SEC. r = V[k+l:n1k] 30.r*vT 32. TriReduce reduces A to tridiagonal form.j]*r[. e. The transformations are accumulated in the array V.] end for j iu = i/. d[n] = A[n. 6. 24. 14.k+l:n]. A?+l:w] = V[A:+l:w.s[i]*r[j] 21.r[i]*s[j] . e[k+l]) V[k+l:n. 1. e[n-l] = A[n-I. F[l:n-2. The diagonal elements of the tridiagonal matrix are stored in the array d\ the offdiagonal elements.n] 26. 4. V[n-l:n.1: Reduction to tridiagonal form o Vol II: Eigensystems . 13. in the array e. A [ i . d*r[j] end for j for j = i to n sM = *[*] + 4[i. end for i 17.n-l:n] = 0. 12. d.k] housegen(A[k. V) for k ~ 1 to n-2 3. k+l:n] 31. 2. s[k+l:n] = s[k+l:n] .+l:n. 11. f o r j = Ar+1 to 7^ 19. for A. d(k) = A[k.A ? ] = c Jb 33. f o r i = 1 to j 20. V r [:. end for i 22.k] = r[k+l:n] w =0 for i = fc+1 ton s[i] = 0 9.fc+l:«]. j ] = A[iJ] . n] 27. TriReduce(A. 10. end for j 23. tor j = k+l to i-l s[i] = s[j] + A [7. = n-2 to 1 by-1 29.

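As a supplement to the listing above, here is a rough, self-contained Python/NumPy sketch of the same kind of reduction. It is not TriReduce: it stores and updates the full array rather than only the upper triangle, it constructs the Householder vectors inline instead of calling housegen, and all of its names are hypothetical. It does, however, follow the symmetric update s = A r, w = r^T s, s <- s - (w/2) r, A <- A - r s^T - s r^T used in the text.

    import numpy as np

    def tridiagonalize(A):
        # Householder reduction of a real symmetric matrix to tridiagonal
        # form, returning (d, e, V) with T = V^T A V, d the diagonal of T,
        # and e its superdiagonal. Illustrative full-array version.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        V = np.eye(n)
        for k in range(n - 2):
            x = A[k+1:, k].copy()
            alpha = np.linalg.norm(x)
            if alpha == 0.0:
                continue
            sgn = np.sign(x[0]) if x[0] != 0 else 1.0
            # Householder vector r with r^T r = 2, so that H = I - r r^T
            # maps x to -sgn*alpha*e1.
            r = x.copy()
            r[0] += sgn * alpha
            r *= np.sqrt(2.0) / np.linalg.norm(r)
            # Symmetric update of the trailing block:
            #   s = B r,  w = r^T s,  s <- s - (w/2) r,  B <- B - r s^T - s r^T.
            B = A[k+1:, k+1:]
            s = B @ r
            w = r @ s
            s -= 0.5 * w * r
            B -= np.outer(r, s) + np.outer(s, r)
            # The k-th row and column become gamma*e1 with gamma = -sgn*alpha.
            A[k+1:, k] = 0.0
            A[k, k+1:] = 0.0
            A[k+1, k] = A[k, k+1] = -sgn * alpha
            # Accumulate the transformation in V.
            V[:, k+1:] -= np.outer(V[:, k+1:] @ r, r)
        return np.diag(A).copy(), np.diag(A, 1).copy(), V

A quick check is to form T from the returned d and e and confirm that V^T A V - T is at the level of rounding error for any symmetric A.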
• The algorithm is backward stable in the usual sense (see Definition 2.4. The tridiagonal matrix T has the form . Chapter 2. can contain complex numbers. however. we can make the elements of e real by a sequence of diagonal similarity transformations. • The first column of the matrix V is GI .162 CHAPTER 3. • In spite of the fact that the final tridiagonal form requires only two one-dimensional arrays to store. This particular implementation uses the array containing the original matrix A. The process is sufficiently well illustrated for the case n — 3. and it is an interesting exercise to recast it so that it has uniform orientation.n]. the array e. • The computation of s. Specifically: The operation count for Algorithm 1. and r 1 A^r in (1. Chapter 2).1 is flam for the reduction of A flam for the accumulation of V This should be compared with an operation count of |n3 flam for the reduction of A in Algorithm 2. Making Hermitian tridiagonal matrices real Algorithm 1. This fact will prove important in deriving the implicit shift QR algorithm for symmetric tridiagonal matrices. This means that subsequent manipulations of the tridiagonal form will use complex rotations.2. This computation is neither column nor row oriented. which is expensive (see the comment on page 91). To keep the indexing consistent we store the Houselder vector in r[fc+l.2) is done in statements 8-17. its computation requires the upper half of a two-dimensional array. The computation of A^r requires two loops because only the upper part of the array A is referenced. Fortunately. we economize by saving the generating vectors for the Householder transformations and then accumulating them in a separate loop. • As in Algorithm 2. Chapter 2. • The algorithm is cheaper than the reduction of a general matrix to Hessenberg form. destroying A in the process. In this form the array d will necessarily contain real numbers.1 can easily be converted to compute the tridiagonal form of a complex Hermitian matrix.2. THE SYMMETRIC EIGENVALUE PROBLEM of A must be done by proceeding down the corresponding column to the diagonal and then across the row.

it is real.2 sketches this scheme. z/i.SEC. Then 163 If e\ — 0.3)]. The details are left as an exercise. the result of performing a QR step on T is symmetric and tridiagonal. the result of performing a QR step on T is upper Hessenberg. This problem can be overcome by adjusting the e^ (and the columns of V) as they are generated in the course of the reduction to tridiagonal form. Otherwise we take v\ — e-t l\e\ . As above. THE QR ALGORITHM The first step is to make e\ real. 1. THE SYMMETRIC TRIDIAGONAL QR ALGORITHM Since a tridiagonal matrix T is necessarily upper Hessenberg and the basic QR step preserves upper Hessenberg form. It is incomplete in an important respect. Since the QR step also preserves symmetry. speedy algorithm for computing the eigenvalues and optionally the eigenvectors of a symmetric tridiagonal matrix. When this fact is taken into account the result is a sleek. Algorithm 1. 1. where then Thus the original matrix has been reduced to real tridiagonal form. whereas at the end we require an array of real floating-point numbers. Let D\ = diag(l. Vol II: Eigensystems . Thus the QR algorithm preserves symmetric tridiagonal form. 1. and we do not need to do anything. 1). The array e is necessarily an array of type complex. so that Similarly if we define D-2 — diag(l. In this case we may take 1/1 = 1. v<£). we will assume that the tridiagonal matrix T in question is represented by an array d containing the elements of the diagonal of T and an array e containing the elements of the superdiagonal [see (1. if T is symmetric and tridiagonal.2.
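The claim that a QR step preserves symmetric tridiagonal form is easy to check numerically. The following few lines of Python/NumPy (an illustration only, not the book's code) perform one explicitly shifted step T -> RQ + κI on a random symmetric tridiagonal matrix stored in the arrays d and e, and report both the departure from symmetry and the largest element above the first superdiagonal; both should be at the level of the rounding unit.

    import numpy as np

    def explicit_qr_step(d, e, kappa):
        # One explicitly shifted QR step, T -> R*Q + kappa*I, on the
        # symmetric tridiagonal matrix with diagonal d and superdiagonal e.
        n = len(d)
        T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
        Q, R = np.linalg.qr(T - kappa * np.eye(n))
        return R @ Q + kappa * np.eye(n)

    rng = np.random.default_rng(0)
    d, e = rng.standard_normal(8), rng.standard_normal(7)
    T1 = explicit_qr_step(d, e, kappa=d[-1])
    print("departure from symmetry:", np.max(np.abs(T1 - T1.T)))
    print("largest element above the superdiagonal:", np.max(np.abs(np.triu(T1, 2))))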

Ar+1] = */*F[l:n. Since PI 2 is a rotation in the (1. end if 9. end for fc Algorithm 1.1)-element ofP-^T-K^iszero. F[l:n. 5.6). Determine an orthogonal matrix Q whose first column is a multiple of ei such that f = QT-Fj2 APi2Q is tridiagonal. 4. the matrix P^APu has the form . In broad outline the tridiagonal QR step goes as follows. the reasoning following (3.2)-plane. end if 11. 2. 1. &+1] 10. 2. An implementation of this sketch goes as follows. The transformations are accumulated in a matrix V. shows that T is the matrix that would result from an explicit QR step with shift AC.2: Hermitian tridiagonal to real tridiagonal form The implicitly shifted QR step The symmetric tridiagonal QR algorithm has explicitly and implicitly shifted forms. 1. Chapter 2. 3. As with the implicit double shift. Given a shift AC determine a plane rotation P12 such that the (2.164 CHAPTER 3. if (& / n-1) 7. of which the latter is the most commonly implemented. THE SYMMETRIC EIGENVALUE PROBLEM This algorithm takes a Hermitian tridiagonal matrix represented by the arrays d and e and transforms it to a real tridiagonal matrix. efc+i = £*efc+i 8. for k = 1 ton-1 a = \ek\ if(a/0) v — ek/a ek = a 6.

The first two statements of the loop complete the Vol II: Eigensystems . Consequently. we obtain a matrix of the form We see that / has been chased one row and column down the matrix.3 implements a QR step between rows il and 12 of T. 1. the matrix has the form where a hat indicates a final value.3)-plane to annihilate /. In the implementation of this algorithm we must. THE QR ALGORITHM 165 Figure 1. Algorithm 1.1 illustrates the entire procedure. On applying the similarity transformation. The application of the rotations is tricky. The element g is an intermediate value of di after the rotation Pi-ij has been applied.1: Implicit tridiagonal QR step We now choose a rotation P23 in the (2. take into account the fact that the problem may have deflated. This process can be repeated until the element / drops off the end. Figure 1. As one enters the loop on i. as usual.SEC.

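Before the formal listing, here is a rough Python/NumPy sketch of the chase just described. It forms the full matrix and applies the rotations explicitly, which is wasteful but makes the structure of the chase easy to see; the shift routine reflects the Wilkinson shift discussed later in this section, and all names are hypothetical.

    import numpy as np

    def _rot(x, z):
        # Rotation (c, s) with c*x + s*z = r and -s*x + c*z = 0.
        r = np.hypot(x, z)
        return (1.0, 0.0) if r == 0.0 else (x / r, z / r)

    def wilkinson_shift(d, e):
        # Eigenvalue of the trailing 2x2 block nearest its last diagonal element.
        a, b, c = d[-2], e[-1], d[-1]
        if b == 0.0:
            return c
        delta = (a - c) / 2.0
        denom = delta + np.copysign(np.hypot(delta, b), delta if delta != 0.0 else 1.0)
        return c - b * b / denom

    def implicit_qr_sweep(d, e):
        # One implicitly shifted QR sweep on the symmetric tridiagonal
        # matrix (d, e), done naively on the full matrix.
        n = len(d)
        T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
        kappa = wilkinson_shift(d, e)
        x, z = d[0] - kappa, e[0]          # first column of T - kappa*I
        for i in range(n - 1):
            c, s = _rot(x, z)
            G = np.eye(n)
            G[i, i] = G[i+1, i+1] = c
            G[i, i+1], G[i+1, i] = -s, s   # first column of the 2x2 block is (c, s)
            T = G.T @ T @ G                # similarity; creates a bulge at (i+2, i)
            if i < n - 2:
                x, z = T[i+1, i], T[i+2, i]
        return np.diag(T).copy(), np.diag(T, 1).copy()

By the implicit Q theorem the result agrees, up to the signs of the off-diagonal elements, with an explicit QR step on T - κI, and repeated sweeps drive the last off-diagonal element rapidly to zero.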
fi 9. AC. /. / = s*e[i] 6.«+!]) 15. 2. b = c*e[i] 1. 4. this algorithm performs one QR step between rows il and i"2.p = 0. end triqr Algorithm 13: The symmetric tridiagonal QR step o .166 CHAPTER 3. rotgen(g. i'2. 5. V[:.p 17. end for i 16. THE SYMMETRIC EIGENVALUE PROBLEM Given a symmetric tridiagonal matrix whose diagonal is contained in the array d and whose superdiagonal is contained in e. 1. 5 = 1. 13. e. c[t2-l] = ^ 18.K 3. g = c*v—b 14. triqr(d. fori = i7toi2-l 5. /o/flpp(c. F[:. /7. V) g = d[i7] . if(tV/7)e[2-l] = ^. d[i2] = d[i2] . v = (d[z+l] — w)*5 + 2*c*6 11. 5) 8. w = d[i] — p 10. p = 5*v d[i] = w+p 12. c. t]. c = l. The transformations are accumulated in V.

14). which is simply the diagonal element c/.2. Local convergence The local convergence analysis beginning on page 73 applies to the symmetric problem. If only eigenvalues are required. 1. Specifically. statement 14 can be eliminated with great saving in computations. then (2. The first is the Rayleigh quotient shift. so that in the presence of rounding error the statements do not produce the same results as the direct application of the plane rotations PI. because the multiple eigenvalues of a defective matrix VolII: Eigensystems . Chapter 2. The accumulation of the transformations in V requires mn flrot.. Thus the computation of eigenvectors is the dominant part of the calculation. THE QR ALGORITHM application of P{-\^ to bring the matrix in the form 167 Finally the subsequent statements bring the matrix into the form which advances the loop by one step. dn is converging to a simple eigenvalue.The second is the Wilkinson shift. It is worth noting that the derivation of this sequence requires the application of the identity c2 -f s2 = 1. Choice of shift Two shifts are commonly used with Algorithm 1. provided the Rayleigh quotient shift is used. In fact. then the transformation of T requires 6m fladd and 5m flmlt plus m calls to rotgen.2. Such "simplifications" based on exact mathematical identities can turn a stable algorithm into an unstable one.3.+I. this is not true of Algorithm 1. which is the eigenvalue of the matrix that is nearest c/. say. if.The Wilkinson shift is globally convergent and is generally the one that is used.SEC. implies that the values of e n _i converge cubically to zero. which is stable in the usual sense. It is easy to see that if m = 12 — il.3. Fortunately.

Here we will propose yet another solution adapted to tridiagonal matrices. we have the latter approximation holding when p is small. We will proceed informally by deriving a criterion for deflating a graded matrix of order two. If we wish the relative error in pd to be less than the rounding unit.168 CHAPTER 3. Chapter 2. we must have or . where we proposed two possible solutions. we can set it to zero to deflate the problem. We have already treated this problem in §2. The Wilkinson shift usually gives cubic convergence. Since A = pd. although in theory the convergence can be quadratic. one for balanced matrices and one for graded matrices. the matrix has eigenvalues d and pd. This possibility is so remote that it is not considered a reason to prefer the Rayleigh quotient shift over the Wilkinson shift. Differentiating this equation implicitly with respect to g gives where \g is the derivative of A with respect to g.4. The characteristic equation for T is where g = e2. Hence when e is small. the perturbation in the eigenvalue pd is approximately —e2/a. one ending with di and the other beginning with d{+i. Deflation If ei is zero. the tridiagonal matrix T deflates into two tridiagonal matrices. When e is zero. Thus we must consider what criterion we shall use to decide when a subdiagonal element of a tridiagonal matrix is effectively zero. Consider the matrix where p < 1 represents a grading ratio. THE SYMMETRIC EIGENVALUE PROBLEM are nondefective the convergence is cubic even for multiple eigenvalues. If et is negligible. We will now see how e perturbs the eigenvalue pd when e is small.

3 to realize that the formulas can be manipulated in many different ways.10. which is required reading for anyone interested in the subject. each with their advantages—real or imagined. Alternatively. See the notes and references.5). For more see the notes and references. then the criterion is stricter than (1. as does the Handbook series of algorithms [305]. 1. but it is too liberal for ordinary matrices.36). 1. if the matrix is graded as in (1. Chapter 2]. there is a variant of the QR algorithm. which tends to preserve the smaller eigenvalues. Not surprisingly.4). which is especially valuable when only the eigenvalues of a symmetric tridiagonal matrix are needed. Wilkinson's Algebraic Eigenvalue Problem [300] contains much valuable material. On the other hand. Variations This brings to an end our discussion of the QR algorithm for symmetric matrices.3. the grading should be downward [see (2. called the QL algorithm. Graded matrices It is important to remember that when one is using the tridiagonal QR algorithm to solve graded problems.SEC.6) always preserves the absolute accuracy of the eigenvalues. and the criterion reduces comparing et. which performs well on matrices that are graded upward. NOTES AND REFERENCES General references The title of this chapter is the same as that of Beresford Parlett's masterful survey of the field [206]. If the grading is upward we can flip the matrix about its cross diagonal to reverse the grading.to the rounding unit scaled by the size of the local diagonals. Because ^f\d^di^\\ < ||T||F. THE QR ALGORITHM 169 Note that the quantity y |a||pa| is the geometric mean of the two diagonal elements of T. Vol II: Eigensystems . This suggests that we regard a subdiagonal of a graded tridiagonal matrix to be negligible if This criterion is effective for graded matrices. Of these variants perhaps the most important is the square root free PWK method. But it by no means exhausts the subject. We therefore replace it with Note that if di and di+i are of a size. One has only to write down a naive version of the tridiagonal QR step and compare it with Algorithm 1. then their geometric mean is about the same size. Chapter 1) insures that the criterion (1. the Hoffman-Wielandt theorem (Theorem 3. people have proposed many versions of the basic tridiagonal algorithm.

For positive definite matrices.6) for setting an element to zero informally by considering a 2x2 matrix. which is based on the sequence defined by The QL step moves northwest from the bottom of the matrix to the top. that algorithm must also proceed from the bottom up. Ch. Householder [123] observed that a general matrix could be reduced to Hessenberg form by Householder transformations and by implication that a symmetric matrix could be reducted to tridiagonal form.26]. If the tridiagonal matrix was obtained by Householder tridiagonalization. although the method performs well in practice. and it is the first off-diagonal element that converges rapidly to zero. The tridiagonal QR algorithm Although Francis proposed his implicitly shifted QR algorithm for Hessenberg matrices.170 CHAPTER 3. However. 8]. He went on to find the eigenvalues of the tridiagonal matrix by Sturm sequences and the inverse power method (see §2. Chapter 2. The actual algorithm given here is due to Wilkinson [297]. An alternative is the QL algorithm. which in consequence are often called Givens rotations.1954] was the first to use orthogonal transformations to reduce a symmetric matrix to tridiagonal form. Jiang and Zhang [135] have proposed a way of alternating between the Rayleigh quotient and Wilkinson shifts in such a way that the algorithm is globally and cubically convergent. As mentioned in §2.3]. If a symmetric tridiagonal matrix is graded upward. Perhaps the most complete set of options is given by the LAPACK codes xSYTRD and xSTEQR.35]. For an error analysis of Householder's method see [300. The result is not quite global convergence.6.3). For a detailed treatment of the algorithm see [206.1. rather than top down as in Algorithm 1. and the method is called Givens' method. 4] gives a proof of the convergence of the QR algorithm with Rayleigh quotient shift that he attributes to Kahan. the extension to symmetric tridiagonal matrices is obvious. He used plane rotations. For a brief summary of the various implementations of the symmetric tridiagonal . The latter automatically chooses to use either the QR or QL algorithm based on the relative sizes of the diagonals at the beginning and ends of the current blocks. THE SYMMETRIC EIGENVALUE PROBLEM Reduction to tridiagonal form Givens [83. Theorem 2. For an error analysis of Givens' method see [300. Ch. the QR algorithm will generally fail to compute the smaller eigenvalues accurately. In 1968 Wilkinson [302] showed that the QR algorithm with Wilkinson shift (not his name for it) is globally convergent. the criterion can be rigorously justified [59. We have derived the criterion (1. and this has been the preferred shift ever since. The former reduces a symmetric matrix to tridiagonal form and can be coaxed into performing the reduction either top down or bottom up. Parlett [206. §5. §5.

3 we will show how to compute selected eigenvalues and eigenvectors of symmetric band matrices. RANK-ONE UPDATING In this subsection we will be concerned with the following problem. 2. and the central part of the computation—the diagonal updating problem—is useful in its own right. the Cholesky decomposition—updating can result in a striking reduction in computation. §§8. the updating algorithm is less expensive than computing the decomposition from scratch.1). This algorithm can in turn be used to implement a divide-and-conquer algorithm for the symmetric tridiagonal problem. Finally. in §2. In this section we will consider some important alternative algorithms. since a — ±1. Reduction to standard form We first turn to the problem of reducing A = A-fcrxx1 to the standard form D + zz1.14-15]. which is treated in §2. it might be hoped that we can use the spectral decomposition (2.2. A CLUTCH OF ALGORITHMS 171 QR algorithm see [206. Algorithm 2. Let where a — ±1. With some decompositions—e. We proceed in stages.10). For an implementation see the LAPACK routine xSTERF. which also contains a complete exposition of the PWK (Pal-Walker-Kahan) algorithm. At each stage we underline the currently reduced matrix First. Chapter 1]. to simplify the computation—a process called updating. updating the spectral decomposition requires 0(n3) operations. A CLUTCH OF ALGORITHMS The QR algorithm is not the only way to compute the eigensystem of a symmetric matrix. We will begin this section with the problem of computing the eigensystem of a diagonal matrix plus a rank-one modification. which we already know. we may write the updating problem in the form Vol II: Eigensystems . typically from 0(n3) to O(n2). Unfortunately.. 2.1. However. Nonetheless. 2. Let the symmetric matrix A have the spectral decomposition [see (1. Find the spectral decomposition A — VAV"T of A.1 outlines the basic steps of the updating algorithm.g.SEC. We will treat these steps each in turn. One way of solving this problem is to form A and use the QR algorithm to compute its eigensystem.

1: Rank-one update of the spectral decomposition This makes the sign of xx1 positive. Find all the remaining eigenvalues of D + zz1'.1). from the spectral decomposition (2. 3. and z is a vector with nonnegative components. Third. This is done by solving a rational equation whose roots are the required eigenvalues. Second. where <r = ±l. let 5 be a diagonal matrix whose elements are ±1.172 CHAPTER 3. which is diagonal. Fourth. 5. This algorithm sketches how to compute the spectral decomposition of A = A + axx^. such that Sy is nonnegative. Then This makes the elements of the vector defining the update nonnegative. we have rp The matrix to be updated is now <jA. 2. Transform the eigensystem of D + ZZT back to the original eigensystem of A. Use the eigenvalues to compute the eigenvectors of D -f zz^. let P be a permutation such that the diagonals of cr_PT AP appear in nondecreasing order. 1. Algorithm 2. THE SYMMETRIC EIGENVALUE PROBLEM Let A — FAFT be the spectral decomposition of the symmetric matrix A. Reduce the problem to that of updating the eigensystem of D + zz . Deflate the problem to get rid of all very small components of z and to make the diagonal elements of D reasonably distinct. where D is diagonal with nondecreasing diagonal elements. 4. Then If we now set then Thus if we can compute the spectral decomposition .

The computation of ( V S ) P is a permutation of the columns of VS. Instead one might construct a data structure that keeps track of where these quantities should be and reference them through that structure. Algorithmically. Any permutation of the diagonals must be reflected in a corresponding permutation of the columns of VS and the components of z. A CLUTCH OF ALGORITHMS then with A = ah the decomposition 173 is the spectral decomposition of A. Deflation: Small components of z If the ith component of z is zero. update . 2. By a permutation we can move this eigenvalue to the end of the matrix and bring it back at the end of the update. But still it is faster. and the components of z until the very end. When we speak of such permutations.1). • To complete the updating. • Unless the order n of the problem is small. • In an actual implementation it may be better not to permute the columns of VS.SEC. the diagonals of D. • We will have occasions to exchange diagonals of D later in the algorithm. The computation of VS is simply a column scaling. Here are some comments on the reduction. This has the consequence that our updating algorithm does not gain in order over the naive scheme of computing the decomposition of A from scratch. so that z has the form then D -f ZZL can be partitioned in the form In this case the element di is an eigenvalue of the updated matrix. what the equations say is that if a — -1. The computation of (VSP)Q is a matrix-matrix multiplication and takes about n3 flam. we must compute VSPQ (this is essentially step five in Algorithm 2. we will tacitly assume that VS and z have also been adjusted. about which more below. care should be taken to choose an efficient algorithm to sort the diagonals of D in order to avoid a lengthy series of column interchanges in the matrix VS. • The way a = ±1 keeps popping in and out of the formulas may be confusing.A and then change the sign back at the end. Vol II: Eigensystems .

Now in this case we can write Thus the effect of replacing <?. if this criterion is satisfied.. we get the bound Moreover we can estimate Hence our criterion becomes Here we have absorbed the constant \/2 into 7. The process is sufficiently well illustrated by the 2x2 problem Let c and s be the cosine and sine of a plane rotation that annihilates the component Z2 of z: . setting Z{ to zero will move each eigenvalue by no more that \\D-\. we can use a plane rotation to zero out one of the corresponding components of z.zz^ I^M.by zero is to replace D + zz1 by D + ZZT . Deflation: Nearly equal d{ It turns out that if two diagonals of D are sufficiently near one another. By Theorem 3.is small.Unless the original matrix has special structure.174 CHAPTER 3. this is the kind of perturbation we should expect from rounding the elements of A. we will seldom meet with components of z that are exactly zero. and we must decide when we can regard it as zero. thereby deflating the problem.E. Now by computing the Frobenius norm of E. Thus a reasonable criterion is to require Z{ to be so small that where eM is the rounding unit. Instead we will find that some component zt.8. THE SYMMETRIC EIGENVALUE PROBLEM In practice. and 7 is some modest constant (more about that later). Chapter 1.

The secular equation We turn now to the computation of the eigenvalues of D + ZZH .SEC. because the diagonal elements of d are nondecreasing it is only necessary to test adjacent diagonal elements. the accuracy of our final results will deteriorate. A CLUTCH OF ALGORITHMS where z\ = y1' z\ + z\. The conventional wisdom is that 7 should be a modest constant. Theorem 2. the eigenvalues ofD + zz1 satisfy the SECULAR EQUATION Moreover. the above discussion suggests that it should have a reasonable large value. If d\ < d% < • • • < dn and the components of z are nonzero.1) satisfy Vol II: Eigensystems . if the criterion is too liberal. the solutions AI < • • • < An of (2. Since 7 controls the stringency of the criterion. we will seldom deflate and miss the substantial benefits of deflation. the diagonal that remains must be tested again against its new neighbor. this is too liberal because the rounding-error analysis is necessarily conservative. Thus the deflation procedure can be seen as a preprocessing step to insure that the computation as a whole runs smoothly. As above. If it is too stringent. This favors a liberal criterion over a conservative one. A strict rounding-error analysis suggests that 7 ought to grow with n. When to deflate There is an important tradeoff in our criteria for deflation. 2. However. once a diagonal has been deflated. On the other hand. We begin with a theorem that shows where the eigenvalues lie and gives an equation they must satisfy. say 10. Of course. There is yet another tradeoff. and the problem deflates. we deem this quantity sufficiently small if In the general case. if (d^ — d\)cs is sufficiently small we can ignore it. The calculations of the eigenvalues and eigenvectors to be described later require that the Z{ be nonzero and that the d{ be strictly increasing. Then 175 Consequently.1. Saving accuracy at one stage is a poor bargain if it is lost at another.

. To derive the secular equation we will employ the useful fact that This can be proved by observing that any vector orthogonal to v is an eigenvector of / — UVT corresponding to one. /(A) becomes negative.. Hence / has a zero AI in (d\. Thus the characteristic polynomial is the product of Hi(^ ~ ^) anc* /(A) defined by (2.9). we assume that. Thus our root-finding algorithm must be specifically designed to work with the secular equation. as A approach dn from above. excluding overflows and underflows. This can be done by . /(A) must become negative. Thus in principle we could use an off-the-shelf root finder to compute the A.176 CHAPTER 3. oo) for An]. of/ are also zeros of the characteristic polynomial p and hence are the eigenvalues of D + zz^. The characteristic polynomial of D + zz^ is the last equality following from (2. Let us consider the zeros of the latter.7). The proof is completed by observing that the n zeros A. the interlacing inequality (2. However. while the vector v is an eigenvector corresponding to 1 — v*-u. But first a word on computer arithmetic.In the same way we can show that / has zeros \i £ (di. The first step is to determine whether A^ is nearer <4 or dk+i.1 shows that we can find the updated eigenvalues A z of A (up to the sign a) by solving the secular equation (2. to determine the eigenvectors the eigenvalues must be calculated to high accuracy. /(A) -*• 1. /(A) must become positive. The result now follows from the fact that the determinant of a matrix is the product of the eigenvalues of the matrix. Specifically.. and here we will merely sketch the important features... Moreover. As A approaches di from below. For more see the notes and references.8) shows that each root A z is simple and located in (d. The detailed description of such an algorithm is beyond the scope of this book. d. • Solving the secular equation Theorem 2. each floating-point operation returns the correctly rounded value of the exact operation. This assumption is critical for the algorithms that follow. As A approaches d\ from above. where k < n.7).e^). THE SYMMETRIC EIGENVALUE PROBLEM Proof. while as A —»• oo. Finally.+i) [or in (dn. Throughout these volumes we assume that floating-point computations are done in standard floating-point arithmetic.di+i) (i — 1.. Hence / has a zero A n satisfying An > dn. n .1). For definiteness we will consider the problem of computing A^.

the functions r(r) and s(r) can be evaluated to high relative accuracy whenever r is between 0 and (cfor+i — cfo)/2. Specifically. A& is nearer di\ if it is positive. Finally. Our stopping criterion is: Stop when the computed value of g(T) at the current iterate ios s than its error biound. For more. Then This shift has the advantage that we are now solving for the correction T^ to dk that gives Afc. For definiteness we will suppose that \k is nearer d^. the most effective schemes are based on approximating g ( r ) by a rational function g ( r ) and finding the root of of g as the next approximation. because g ( r ) has poles at zero and dk+i — dk. However. Moreover. In principle any good zero-finding technique applied to g will suffice to find the correction r^. Because of the simplicity of the function g(r). let and define g(r) = f(dk -f T). to which we now turn. 2. The second step is to shift the secular equation so that the origin is d^.D) lz. see the notes and references. Computing eigenvectors Let A j be an eigenvalue of D+zz1. Afc is nearer di+i.SEC. Then the corresponding eigenvector q must satisfy the equation (D + zz^}qj = \jq3 or equivalently Thus qj must lie along the the direction of (\jl . as it is calculated one can also calculate a bound on the error in its computed value. A CLUTCH OF ALGORITHMS 111 evaluating f[(dk -f dk+i )/2]. If the value is negative. It can be shown that this criterion returns eigenvalues that are sufficiently accurate to be used in the computation of the eigenvectors. and hence the normalized eigenvector is given by VolII: Eigensystems . one must decide when the iteration for T> has converged.

It is based on the following theorem. as it will be when d{ and di+i are very near each other. Theorem 2. Hence the z^ are well defined and given by (2.. we find that The interlacing inequalities imply that the right-hand side of this equation is positive. If Xj is very near di. replacing Xj by Xj in (2. r-r\ Proof. If we set z = (zi • • • zn) . eigenvectors computed in this way may be inaccurate and.10)]. Let the Quantities A. Instead we have approximations Xj obtained by solving the secular equation. One cure for this problem is to compute the eigenvalues Xj to high precision (it can be shown that double precision is sufficient).14) under the assumption that the required Zi exist. However. unless Xj is a very accurate approximation to Xj.. be far from orthogonal to one another.178 CHAPTER 3. . . and solving for zf. Now the ith component of qj is given by where rjj is the normalizing constant. A n . We compute the characteristic polynomial of D + ZZT in two ways: [see (2. satisfy the interlacing conditions Then the quantities are well defined. In particular.14). then the eigenvalues of D — zz1 are AI. we do not know the eigenvalues of D + zz1'. worse yet. Setting A = c/j. We will first derive the formula (2. there is an alternative that does not require extended precision. THE SYMMETRIC EIGENVALUE PROBLEM Unfortunately. .2. and then show that these values have the required property.12) will make a large perturbation in qij.

There remains the question of the stability of this algorithm. with the appropriate signs. it can be shown that if we use the stopping criterion (2.12). 2. However. the Z{ can be computed to high relative accuracy. The computation of the vectors requires approximately n2 fladd and n2 fldiv for the computation of the components of Q and n2 flam and n2 fldiv for the normalization (the divisions in the normalization can be replaced by multiplications). they must be within a small multiple of the rounding unit. after all. To apply the theorem. We summarize the computation of the eigenvectors in Algorithm 2. this does not imply that the computed eigenvectors are accurate.15) are equal at the n distinct points dj. they will be orthogonal almost to working accuracy. In particular. the matrix Q of these eigenvectors will be orthogonal to working accuracy.14).. we have not found the eigenvectors of D + zz1 but those of D -f ZZT. and then use them to compute the components of the eigenvectors from (2. But by Theorem 3. Hence the components QJ of the computed eigenvectors qj will be to high relative accuracy the components of the exact eigenvectors of D + ZZT . the Aj satisfy the secular equation for D + zz?'. • The computation of the Z{ requires approximately 2n2 flam and n2 fldiv. they must be identical. Now in standard floating-point arithmetic. the vectors corresponding to poorly separated eigenvalues will be inaccurate.. as we have pointed out. and. As we shall see in a moment. in the expression the Aj are the exact eigenvalues of D + ZZT. For if Z{ and Z{ have opposite signs. note that by the way the Z{ were defined the two expressions for the characteristic polynomial of D+z ZT in (2.2.SEC.14) does not quite specify 5t-. A CLUTCH OF ALGORITHMS 179 To show that the eigenvalues of D -f ZZT are the Aj. then where fi is a slowly growing function of n. • We have already noted that the algorithm is backward stable.11) in finding the approximate eigenvalues A.e. Since both expressions are monic polynomials of degree n. Chapter 1. The inequality (2. Here are two comments on this program. • Equation (2. since the right-hand side contains the factor sign(^). They will have small residuals [see (2. and hence insignificant. suppose that we compute the zt. Thus the computation of the eigenvectors of D + ZZT is backward stable. in our application it is sufficient to replace sign(ii) with sign(^).13. However.). Chapter 2]. In particular if we evaluate both expressions at the AJ.16) also justifies our replacing sign(^) by sign(z. Thus the computation VolII: Eigensy stems . Moreover.35).from (2. we discover that they are zeros of the second expression—i.

10. 5. 6.1 to n 13. Z{ = A n — u{ for j = 1 to i—1 4.11)] to the eigenvalues of D + zz1 that satisfy the interlacing inequalities (2. 8.180 CHAPTER 3. for i = 1 ton 2.j] = «M/IW[:.di)/(dj+i . 14. Si = Zi*(Xj . Q[iJ] = 2i/&-di) end for i 15.J]||2 16. 3. Zi = Zi*(Xj . 11. this algorithm returns approximations to the eigenvectors of D + ZZT in the array Q. end for j Algorithm 2.di) end for j Zi — sqrt(zi)*sign(zi) end Compute the eigenvectors for j — 1 ton for i . 9.2: Eigenvectors of a rank-one update o .di) end for j forj r j — i ton-1 7. flM:. Compute the modified update vector 1. THE SYMMETRIC EIGENVALUE PROBLEM Given sufficiently accurate approximations Xj [see (2.13). 12.di)/(dj .

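To tie the pieces together, here is a small Python/NumPy sketch of the whole D + zz^T computation done naively: each root of the secular equation is found by bisection on the interval guaranteed by the interlacing inequalities (a production code would instead use the rational approximations and the stopping criterion (2.11) described above), the update vector is recomputed as in Algorithm 2.2, and the eigenvectors are formed from the analogue of (2.12). It assumes the di are strictly increasing and the zi nonzero, and every name in it is hypothetical.

    import numpy as np

    def secular_roots(d, z, tol=1e-14, maxit=200):
        # Roots of f(lam) = 1 + sum_i z_i^2/(d_i - lam) by bisection; one
        # root lies in each interval (d_i, d_{i+1}) and one above d_n.
        d, z = np.asarray(d, float), np.asarray(z, float)
        f = lambda x: 1.0 + np.sum(z * z / (d - x))
        n = len(d)
        lam = np.empty(n)
        for i in range(n):
            lo = d[i]
            hi = d[i + 1] if i < n - 1 else d[-1] + z @ z
            for _ in range(maxit):
                mid = 0.5 * (lo + hi)
                if not (lo < mid < hi):
                    break                      # interval exhausted in floating point
                if f(mid) > 0.0:               # f is increasing across the interval
                    hi = mid
                else:
                    lo = mid
                if hi - lo <= tol * max(abs(lo), abs(hi)):
                    break
            lam[i] = 0.5 * (lo + hi)
        return lam

    def update_vector(d, z, lam):
        # The modified update vector of Theorem 2.2, signs taken from z.
        n = len(d)
        zhat = np.empty(n)
        for i in range(n):
            v = lam[-1] - d[i]
            for j in range(i):
                v *= (lam[j] - d[i]) / (d[j] - d[i])
            for j in range(i, n - 1):
                v *= (lam[j] - d[i]) / (d[j + 1] - d[i])
            zhat[i] = np.copysign(np.sqrt(v), z[i])
        return zhat

    def eigenvectors(d, zhat, lam):
        # Columns are the normalized vectors (lam_j I - D)^{-1} zhat.
        Q = zhat[:, None] / (lam[None, :] - np.asarray(d, float)[:, None])
        return Q / np.linalg.norm(Q, axis=0)

    # Small check against a dense solver.
    rng = np.random.default_rng(1)
    d = np.sort(rng.standard_normal(6)); z = rng.standard_normal(6)
    lam = secular_roots(d, z)
    Q = eigenvectors(d, update_vector(d, z, lam), lam)
    print(np.max(np.abs(np.linalg.eigvalsh(np.diag(d) + np.outer(z, z)) - lam)))
    print(np.max(np.abs(Q.T @ Q - np.eye(6))))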
Is it worth it? The answer is yes. The deflation also saves time. The updating algorithm has only one 0(n3) overhead: a single matrix-matrix multiplication to update the eigenvectors V. which allows small but meaningful elements in a graded matrix to be declared negligible. and so on. we solve two problems each at a cost of C(|) for a total cost of 2(7(|). Concluding comments Our sketch of the algorithm for spectral updating is far from a working program. For convenience. the deflation criteria—e. the count rises to O(n3).2. and in some applications there are many deflations.g. We can estimate the cost of a divide-and-conquer algorithm by counting the costs at each level of the recurrence. In particular. 2. Extra storage is required for the eigenvectors of the diagonalized problem D + zz1'. At the top level of the recurrence we solve one problem at a cost of C(n). The algorithm is not without its drawbacks. Now if PI and P2 are problems of the same type as P. Let P(n) denote a computational problem of size n. then on the subproblems of PI and PI.. The algorithm does not work well for graded matrices. (2. At the second level. we can repeat this dividing procedure on PI and P2.3) is taken into account. particularly the expensive accumulation of the plane rotations generated while solving the tridiagonal system. Thus for n large enough the updating algorithm must be faster. Thus the code for any implementation will be long and complicated compared with the code for forming A directly and applying the QR algorithm. However. and the routine to solve the secular equation is intricate. assume that n — 2m. it must be remembered that the diagonal updating is only a part of the larger Algorithm 2. there is a second aspect.6) — are couched in terms of the norms of D and z. We will return to this point in the next subsection. at least for sufficiently large matrices.SEC. The data structures to handle permutations and deflation are nontrivial. We begin with some generalities about the divide-and-conquer approach.1. Once the cost of computing VSPQ in (2. 2. However. Generalities The notion of a divide-and-conquer algorithm can be described abstractly as follows. At the third Vol II: Eigensystems . A DlVIDE-AND-CONQUER ALGORITHM In this subsection we will present a divide-and-conquer algorithm for solving the symmetric tridiagonal eigenvalue problem. A CLUTCH OF ALGORITHMS 181 of the eigenvectors is an O(n 2 ) process. and suppose that we can divide P(n) into two problems Pi(f) and ^(f) °f size f in such a way that the solution of P can be recovered from those of PI and PI at a cost of C(n). This recursion will stop after the size of the subproblems becomes so small that their solutions are trivial—after no more than about Iog2 n steps. We have seen that the O(n3} overhead in the QR algorithm is the accumulation of the transformation.

182 CHAPTER 3. after we have derived our algorithm. Then we can write Thus we have expressed T as the sum of a block diagonal matrix whose blocks are tridiagonal and a matrix of rank one. To show how to combine the eigensystems. We will return to it later. The idea behind our divide-and-conquer algorithm is to compute the eigensystems of TI and T<2 (which can be done recursively) and then combine them to give the eigensystem of T. and let T be partitioned in the form (Here s stands for splitting index.) We can also write this partition in the form Now let and so that TI is TI with the element ds replaced by ds — es and T-z is T% with the element ds+i replaced by ds+i — es. Summing up these costs. Derivation of the algorithm Let T be a symmetric tridiaeonal matrix of order n. let and . we get for the total cost of the divide-and-conquer algorithm Of course we must know the function C(n) to evaluate this expression. THE SYMMETRIC EIGENVALUE PROBLEM level. we solve four problems at a cost of C( J) for a total cost of 4C( J)—and so on.

Here are some comments. For example. the value of mindac should be the break-even point where the two algorithms require about the same time to execute. 2. Actually. In LAPACK it is set at 25.Then 183 The matrix D + zz1 is a rank-one modification of a diagonal matrix. whose spectral decomposition can be computed by the algorithm of the last section. It follows that is the spectral decomposition of T. Ideally. This reduces the work from n3 flam to |n3 flam. Then one can work upward combining them in pairs. so that for small matrices the QR approach is more competitive. as in (2. it can be shown that cuppen is also stable. • The recursive approach taken here is natural for this algorithm. bottom-up approach is possible.3 implements this scheme. a direct. • We have noted that the updating algorithm diagupdate may not work well with graded matrices. we abandon the divide-and-conquer approach and compute the decompositions directly via the QR algorithm. The code is not pretty. In practice. Algorithm 2. • In statements 16 and 17 we have taken advantage of the fact that V is block diagonal in forming VQ. • We have already observed that the method for updating a spectral decomposition has a great deal of overhead. for large matrices the ill consequences of using the wrong value are not very great. A CLUTCH OF ALGORITHMS be spectral decompositions of TI and T^. Vol II: Eigensystems . we would represent T in some compact form — e.. However. specified here by the parameter mindac. Consequently. one might initially partition the matrix into blocks of size just less than mindac.g. • For ease of exposition we cast the algorithm in terms of full matrices. We are now in a position to apply the divide-and-conquer strategy to compute the eigensystem of a symmetric tridiagonal matrix.SEC. • Because the algorithm diagupdate is backward stable.18). as we shall see. when the submatrices in the recursion fall below a certain size.3 inherits this drawback. Algorithm 2.

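The tearing at the heart of Algorithm 2.3 can be checked on its own. The following sketch (hypothetical helper, plain NumPy, 0-based indices) subtracts the off-diagonal element e[s] from the two adjacent diagonal elements to form the modified blocks and then verifies that adding back the rank-one matrix e[s] v v^T, where v has ones in positions s and s+1, reproduces T exactly.

    import numpy as np

    def tear(d, e, s):
        # Split the symmetric tridiagonal matrix (d, e) at the off-diagonal
        # element e[s], which couples rows s and s+1. Returns the two
        # modified blocks and (theta, v) with
        #     T = diag(T1, T2) + theta * outer(v, v).
        theta = e[s]
        d1, e1 = d[:s+1].copy(), e[:s].copy()
        d2, e2 = d[s+1:].copy(), e[s+1:].copy()
        d1[-1] -= theta                  # d[s]   -> d[s]   - e[s]
        d2[0] -= theta                   # d[s+1] -> d[s+1] - e[s]
        v = np.zeros(len(d)); v[s] = v[s+1] = 1.0
        return (d1, e1), (d2, e2), theta, v

    full = lambda dd, ee: np.diag(dd) + np.diag(ee, 1) + np.diag(ee, -1)
    rng = np.random.default_rng(2)
    n, s = 7, 3
    d, e = rng.standard_normal(n), rng.standard_normal(n - 1)
    (d1, e1), (d2, e2), theta, v = tear(d, e, s)
    rebuilt = np.zeros((n, n))
    rebuilt[:s+1, :s+1] = full(d1, e1)
    rebuilt[s+1:, s+1:] = full(d2, e2)
    rebuilt += theta * np.outer(v, v)
    print(np.max(np.abs(full(d, e) - rebuilt)))    # exactly zero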
end cuppen Algorithm 23: Divide-and-conquer eigensystem of a symmetric tridiagonal matrix o .:] .3+l] 12.:])T 15. 1. s= \n/1\ 7. T[s. THE SYMMETRIC EIGENVALUE PROBLEM Given a symmetric tridiagonal matrix T.l] = r 2 [l. Q) 16. else 6.A2) 14.s] = f[s. Cuppen(f2.:} 18. A. Cuppen computes its spectral decomposition VAVT. f2 2= T[s+l:n. z. If the size of the matrix is less than the parameter mindac (min divide-and-conquer) the QR algorithm is used to compute its eigensystem. Compute A and V using the QR algorithm 5. Q) that returns the spectral decomposition D + zz1 = QAQ^. V) 2. V[a+l:n.184 CHAPTER 3.Vi) 10.l]--r[3. &!.V2*Q[s+l:n. A.s+l:n] 11.%>) 13. n = the order of T 3.Ai.:] 17.l:s] 8. T1[s.s+I] C 9. fi = T[l:s. It uses the routine diagupdate(D. s+1]. o. end if 19. y[l:«. Cuppen(T. diagupdate(D. if (n < mindac) 4. A.:] = Vi*9[l:5.s]-T[s. D = diag(Ai. f 22 [l. Cuppen(Ti.:] F2[1. ^ = (Vi[5. z.

Of the total of f n3flamwork | (i. the choice of the point at which one shifts to the QR algorithm (parameterized by mindaq} is not very critical. Thus after a two or three stages.e.3.17)] for the total cost of Algorithm 2. So little work is being done at the lower stages that it does not matter if we use an inefficient algorithm there. which effectively make part of Q diagonal. It is still O(n 3 ). This is essentially the cost of Cuppen exclusive of its cost of the recursive calls to itself. We see that our divide-and-conquer algorithm has not really conquered in the sense that it has reduced the order of the work. 2. A CLUTCH OF ALGORITHMS Complexity To apply the formula 185 [see (2. we must first determine the function C(n). This has the consequence that for large matrices. leaving \ of the work for the remaining stages. Thus. so that the divide-and-conquer approach becomes dramatically faster than the QR approach. Vol II: Eigensystems . With some classes of matrices the number of deflations grows with n. An examination of the algorithm shows two potentially significant sources of work: the call to diagupdate in statement 15 and the matrix multiplication in statements 16 and 17. But if it has not conquered. the algorithm benefits from deflations in diagupdate. it has certainly prospered. leaving only ^ of the total. We have already noted that the updating requires only O(n2) operations. The second stage doe | of this work..SEC. The order constant | is quite respectable — compara Gaussian elimination and far better than the QR algorithm. the computation is dominated by the matrix multiplication. and we may take It is now an easy matter to get a sharp upper bound on Ctotai. In addition to the low-order constant. while the multiplication requires |n3 flam.3 requires about |n3 flam.Specifically. In general after the &th stage only -^ o the total work remains to be done. Thus: Algorithm 2. ^n3flam)is done in the veryfirstst the recursion. the remaining work is insignificant compared to the work done by the higher stages.
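The arithmetic behind these fractions can be checked in a few lines. Assuming that the cost of the combining step at size m is dominated by the block matrix multiplication, roughly C(m) = m^3/2 flam, the following sketch sums the per-level costs 2^k C(n/2^k) and shows the ratio to n^3 approaching 2/3, with the first level alone contributing half of n^3.

    def dc_total_cost(n):
        # Sum the per-level combine costs 2^k * C(n/2^k) with C(m) = m**3/2.
        cost, k, m = 0.0, 0, n
        while m > 1:
            cost += (2 ** k) * m ** 3 / 2.0
            k += 1
            m = n // 2 ** k
        return cost

    for n in (256, 1024, 4096):
        print(n, dc_total_cost(n) / n ** 3)    # ratio tends to 2/3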

2.3. EIGENVALUES AND EIGENVECTORS OF BAND MATRICES

We begin with a definition.

Definition 2.3. Let A be symmetric of order n. Then A is a BAND MATRIX of BAND WIDTH 2p+1 if

    aij = 0 whenever |i - j| > p.

Thus if p = 1, then A is a tridiagonal matrix. If p = 2, then A is pentadiagonal and has the form illustrated by the Wilkinson diagram.

Band matrices are economical to store. For example, a symmetric pentadiagonal matrix can be represented by 3n-3 floating-point numbers. This means that it is possible to store band matrices that are so large that the matrix of their eigenvectors, which is of order n, cannot be stored. However, in many applications involving band matrices we are interested in only a few eigenpairs, often a few of the largest or smallest. The computation of a selected set of eigenpairs of a large symmetric band matrix is the subject of this subsection.

In outline our algorithm goes as follows.

1. Reduce A to tridiagonal form T
2. Find the selected eigenvalues of T
3. Find the selected eigenvectors by the inverse power method applied to A

Note that we compute the eigenvalues from T but the eigenvectors from A. Another possibility is to store the transformations that reduce A to T, find the eigenvectors of T, and transform back. Unfortunately, if n is large, the transformations will require too much storage. Thus we are forced to find the eigenvectors of A directly from A.

For the most part our algorithms will be cast in terms of absolute tolerances relative to the norm of A. This means that small eigenvalues may be determined inaccurately. How to compute small but well-determined eigenpairs of a general band matrix is at present an unsolved problem.

A special case of our algorithm is when the matrix A is already tridiagonal. In this case it turns out that there are better algorithms than the ones proposed here. However, their complexity puts them beyond the scope of this book, and the reader is referred to the notes and references.
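The compact representation alluded to here, and described in detail in the next subsection, can be illustrated with a short NumPy sketch. It packs the upper triangle of a symmetric band matrix of half band width p into a (p+1)-by-n array using the index map AB[p+i-j, j] = A[i, j] (0-based); the exact layout used in the text may differ in detail, so the formula should be read as an assumption for illustration only.

    import numpy as np

    def to_band_upper(A, p):
        # Pack the upper triangle of the symmetric band matrix A (half band
        # width p) into a (p+1) x n array: AB[p + i - j, j] = A[i, j].
        n = A.shape[0]
        AB = np.zeros((p + 1, n))
        for j in range(n):
            for i in range(max(0, j - p), j + 1):
                AB[p + i - j, j] = A[i, j]
        return AB

    def band_get(AB, p, i, j):
        # Retrieve A[i, j] from the packed form, using symmetry.
        if i > j:
            i, j = j, i
        return AB[p + i - j, j] if j - i <= p else 0.0

    # A random pentadiagonal example (p = 2) of order 6.
    rng = np.random.default_rng(3)
    n, p = 6, 2
    A = np.zeros((n, n))
    for k in range(p + 1):
        v = rng.standard_normal(n - k)
        A += np.diag(v, k)
        if k > 0:
            A += np.diag(v, -k)
    AB = to_band_upper(A, p)
    print(all(band_get(AB, p, i, j) == A[i, j] for i in range(n) for j in range(n)))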

in practice we must come to grips with the fact that such storage is not economical. More generally if B is the array in which the matrix is stored. Tridiagonalization of a symmetric band matrix We will now show how to reduce a symmetric band matrix to tridiagonal form. This step of the reduction is illustrated below: Vol II: Eigensystems . Here we consider another alternative that uses plane rotations and preserves the band size. When A is complex and Hermitian. The asterisks denote unused elements. We will illustrate it for a matrix of band width seven (p = 3). We begin by choosing a rotation in the (3. Instead such a matrix is usually stored in an array just large enough to accommodate its nonzero elements. 2.7)-elements. the band width grows as the reduction proceeds. The first scheme stores a band matrix in a (p+l)xn array as illustrated below for a pentadiagonal matrix. this is not just a rearrangement of the same numbers. a symmetric band matrix of band width 2p-fl has approximately (p+l}n nonzero elements on and above its diagonal. since the corresponding upper and lower elements are not equal but are conjugates of one another. l)-element (and by symmetry the (1. One possibility is simply to apply Algorithm 1. The application of this rotation introduce new nonzeros outside the band in the (7. The current standard is the LAPACK scheme — actually two schemes. so that eventually one is working with a full dense matrix.4)-element) of A.and (3. Specifically.4)-plane to annihilate the (4.SEC. then The other scheme stores the elements according to the formula Thus one scheme stores the upper part of A. zeros and all. There are many ways of doing this. it is easy to see that even if one takes into account the zero elements in the band matrix. whereas the other stores the lower part.1 for tridiagonalizing a dense matrix to the full band matrix. A CLUTCH OF ALGORITHMS The storage of band matrices 187 Although we will present our algorithms for band matrices as if they were stored in full arrays. However.3). If p <C n it would be a waste of memory to store it naturally in an array of order n.

as illustrated below: Thus we have annihilated the (4. l)-element.188 CHAPTER 3.and (1. These elements can be chased out of the matrix as illustrated below: .6).2).2)-elements. However.and (6.1). no new elements are generated. but introduces nonzeros in the (6.4)-elements while preserving the band structure. when we eliminate this element. This places new nonzeros in the (10. We are now ready to work on the (3.and (6.3)-plane annihilates this element.7)-plane to annihilate the elements outside the band.10)-elements. THE SYMMETRIC EIGENVALUE PROBLEM We now choose a rotation in the (6. A rotation in the (2.

Thus the total number of annlications of rotations to nairs of elements is We can convert this to multiplications and additions as usual. and we will not give it here. and so on. But for both algorithms the accumulation of the transformations requires O(n3) operations. 2. It is true that the reduction itself requires O(n3) operations as opposed to O(n 2 ) for the banded reduction. and hence the eigenvalues of A are moved by no more than quantities of order ||A||2£M (Theorem3.8. Chapter 1).SEC. A CLUTCH OF ALGORITHMS 189 Thus we have reduced the first row and column to tridiagonal form. The actual code is tedious and not very informative. To get an operation count for this algorithm. each zero introduced requires roughly n/p rotations to chase the element outside the band to the end of the matrix. An important implication of these counts is that if we can afford n2 units of storage to accumulate transformations we are better off using the standard Algorithm 1. Thus the reduction is truly an 0(n2) process. However. and so on. The key observation is that when an element is being chased out of the matrix it moves down the matrix in steps of length p. and the extra work in accumulating plane rotations. Similarly it requires (n-l)(p-l)/p rotations to process the second row and column. in this case E does not share the banded structure of A. then the third row and column. The computed tridiagonal matrix is orthogonally similar to A + E. The basic tool is a method for counting how many eigenvalues lie Vol II: Eigensystems . we see that each rotation is applied to 2(p + 1) pairs of elements (here we count the adjustment of the 2 x 2 block on the diagonal as two applications).1. This can be inferred from our example and easily confirmed in general. (n—2)(p—l)/p to process the third. Hence when the first row and column are being processed. where ||J?||2/||j4||2 is of the order of £M. Since we must zero out p— 1 elements in the first row and column. we will begin by counting plane rotations. tips the scales against the band algorithm. But it is symmetric. The algorithm is stable in the usual sense. On the other hand. Inertia We now turn to the problem of computing some of the eigenvalues of a symmetric tridiagonal matrix. since the reduction generates 0(n 2 ) rotations we cannot afford to save or accumulate the transformations when n is very large. as opposed to Householder transformations. The reduction now continues by reducing the second row and column. Hence the total number of rotations for this algorithm is about Remembering that we only operate on the upper or lower part of the matrix. the total number of rotations is roughly n(p-l)/p.

where v is the number of negative eigenvalues of A. Definition 2. Let X be the space spanned by the eigenvectors corresponding to the positive eigenvalues of A. Let A be symmetric. Specifically. By hypothesis. we can show that the number of negative eigenvalues of A is the same as the number of negative eigenvalues of (7T AU and therefore that the number of zero eigenvalues is the same for both matrices.5 (Sylvester. THE SYMMETRIC EIGENVALUE PROBLEM to the left of a point on the real line. Hence they have a nonzero vector in common—a contradiction. The proof is by contradiction. • Inertia and the LU decomposition Turning now to the computation of inertia. i. Then inertia (A) is the triplet (f. and TT is the number of positive eigenvalues of A. the sum of these dimensions is greater than the order of A.190 CHAPTER 3.e.4. we begin with a constructive theorem for determining inertia. Thus to count the location of the eigenvalues of a symmetric matrix with respect to an arbitrary point. it is much easier to compute than the eigenvalues themselves. Although inertia is defined by the eigenvalues of a matrix. Then Now let y be the space spanned by the vectors corresponding to the nonpositive eigenvalues of U^AU. it is sufficient to be able to count their locations with respect to zero. Let A be symmetric and U be nonsingular. Then It follows that X H Z — {0}. we have the following theorem. C is the number of zero eigenvalues of A. We begin with the mathematical foundations of this method. and let Z = Uy. Let A be symmetric. . Then Proof. But the dimension of X is the number of positive eigenvalues of A and the dimension of Z is the number of nonpositive eigenvalues of UTAU. transformations of the form U^-AU..XI lying above zero. (. Similarly. which are more general than similarity transformations. This suggests the following definition. The reason is that inertia is invariant under congruence transformations. Suppose that A has more positive eigenvalues than U^AU. Theorem 2. The number of eigenvalues lying above the point A on the real line is the same as the number of eigenvalues of A . Jacobi). TT).

A CLUTCH OF ALGORITHMS 191 Theorem 2. then'meTtia. and hence by the uniqueness of LU decompositions. Since A is nonsingular. Now the LU decomposition of a matrix can be computed by Gaussian elimination (see Chapter 1:3). the diagonal elements of U are nonzero. But if A is tridiagonal. in principle. Hence. It is well known that if the leading principal submatrices of A are nonsingular then A has a unique factorization A = LU into a unit lower triangular matrix L and an upper triangular matrix U (Theorem 1:3. written in the usual form: Since we will later be concerned with the inertia of T — XI as a function of A. Let A be symmetric and suppose that the leading principle submatrices of A are nonsingular. and the usual pivoting procedures do not result in a U-factor that has anything to do with inertia. are just the pivot elements in the elimination)..SEC. we will work with the matrix VolII: Eigensystems . Gaussian elimination requires pivoting for stability. we can compute the diagonals of U in a stable manner. Let us see how. D^U — £T. Proof. Then there is a unique unit lower triangular matrix L and a unique upper triangular matrix U such that Moreover if DU = diag(wn.3). • It follows that if we have an LU decomposition of A. we can compute the inertia of A by performing Gaussian elimination and looking at the diagonal elements of U (which. then we can read off the inertia of A by looking at the diagonal elements of U. Hence we can write A = LDu(DylU).1. • • . By symmetry (D^1 U)T(LDU) is an LU decomposition of A.6. Unfortunately. Hence A and DU are congruent and have the same inertia. so that A — LDuL^. ^22. The inertia of a tridiagonal matrix Consider a symmetric tridiagonal matrix T.w nn ).(A) = inert!a(DU). 2. in fact.

In particular.X + e[i-l]2/u 14.(A) requires a division by w 4 -_i(A).LeftCount + 1 fi 15.20) and a constant A the function LeftCount returns a count of the eigenvalues less than A. u = d [ i ] . u = d[l] . THE SYMMETRIC EIGENVALUE PROBLEM Given a tridiagonal matrix T in the form (2. = 2 to n 6.192 CHAPTER 3. A) 2. if (|w| < umin) 7. end for i 16.4: Counting left eigenvalues If we perform Gaussian elimination on this matrix we get the following recursion for the diagonals of the U-factor: 11. We will return to this point later. • The algorithm requires 2n fladd n flmlt and n fldiv. LeftCount — 0 3. if (u < 0) LeftCount . e. if (u < 0) LeftCount = LeftCount + 1 fi 5. 1. In applications where the count must be repeated for many different values of A. It turns out that we can replace ui-i(X) by a suitably small number. u = —umin 13. Here are some comments. for i:.4 counts the eigenvalues to the left of A. and we must specify what happens when ^^(A) is zero. • The method is stable. if(w>0) 8. it will pay to modify LeftCount to accept a precomputed array of the numbers t\. Algorithm 2.21). else 10. 12. end if end if The generation of w. u = umin 9. The parameter umin > 0 is a minimum value on the absolute value of the current t*t-( A) defined by (2. end LeftCount Algorithm 2.A 4. The choice of such a number is a matter of some delicacy and depends on how the matrix is scaled. LeftCount(d. it can be shown that the the computed sequence Wj(A) is an exact sequence of a perturbed matrix T(A) whose elements satisfy .

Vol II: Eigensystems . after k iterations. if k > ra. so that LeftCount(a) = 0 and LeftCount(b) — n. These results do not imply that unpivoted Gaussian elimination applied to a tridiagonal matrix is stable. we have isolated A m in an interval that is half the width of the original. There are two cases. b] halves at each step. then A m < c. First. If we rename this interval [a. 6]. Second. A m £ [c. then Am > c.~k\bo — O.a\ < 2.SEC. Set c — (a + 6)/2 and k = LeftCount(c). we choose the sign so that it is consistent with our previous adjustment of LeftCount Calculation of a selected eigenvalue Suppose that the eigenvalues of T are ordered so that and we wish to compute the eigenvalue A m . then the method must converge in about Iog2( |&o . We can do so by combining LeftCount with a technique called interval bisection. If we demand that this bound be less that eps. so that at the ith step we take This choice assumes that the matrix has been scaled so that ei/umin will not overflow.-. Algorithm 2.5 uses this procedure to find A m . • In replacing u with ±umin. • Except perhaps at the end. c]. A CLUTCH OF ALGORITHMS and 193 This result implies that the count will be correct unless A is extremely close to an eigenvalue of T — so close that in the presence of rounding error the two can scarcely be distinguished. A m G [a. 6]. Here are some comments. Consequently.«o I jeps) iterations. We begin by using Gerschgorin's theorem to calculate a lower bound a on \i and an upper bound b of A n . The stability concerns only the w z . the size of the interval [a. we can repeat the procedure until the interval width is less than a prescribed tolerance. if k < m. Consequently. the difference between the current a and 6 is bounded by \b . • The choice of umin depends on the scaling of T and how accurately we wish to determine the count. 2. where ao and bo denote the initial values.Q\. not the LU decomposition as a whole. This procedure is called interval bisection. One possibility is to make the tolerance relative to e. Consequently. In either case.

m) 2. FindLambda(d.194 CHAPTER 3. 11.20) and let the eigenvalues of T satisfy AI < A 2 < • • • < A n . 5.6 = d(l)+|c(l)| for i — 2 to n—1 a = imn{a. 16. 1.4). Find upper and lower bounds on \m a = d(l)|e(l)|. 20. 10. 12.d(i] — e(i—l)\ — \e(i)\} b = max{6. 19.e/(n) — \e(n—1))} 6 = max{6. 3.d(n) + |e(n—1))} Find \m by interval bisection while (\b-a\ > eps) c = (a+b)/2 if (c = a or c = 6) return c fi k = LeftCount(d. 9. c) 13. 8. if(fc<m) a—c else b= c end if end while return (a+6)/2 end FindLambda Algorithm 2. 6. e. THE SYMMETRIC EIGENVALUE PROBLEM Let the tridiagonal matrix T be represented in the form (2. 4.5: Finding a specified eigenvalue o . 14. 7. The function FindLambda finds an approximation to \m. 17. 15. e. d(i) + \e(i-l)\ + \e(i)\] end for i a = min{a. The quantity eps is a generic convergence criterion. 18. It uses the function LeftCount (Algorithm 2.

The computation of eigenvectors requires very accurate eigenvalues. A CLUTCH OF ALGORITHMS 195 • Statement 11 takes care of an interesting anomaly. the secant method. then the eigenvector corresponding to \k will be sensitive to perturbations in A. • The exact choice of eps depends on the accuracy required by the user. Therefore. Chapter 1]. the product of the w's it generates is the value of the characteristic polynomial at c. c must be equal to one of a and 6. But this is inefficient for two reasons.. • If the eigenvalue being sought is simple. in the process of finding one eigenvalue. It can also be used to find all the eigenvalues in an interval. 6]. bounds that can be used to speed up the bisection process.g.28). First the computation of the Gerschgorin bounds is repeated with each call to FindLambda. then eventually it will be isolated in the interval [a. and in this case a choice like will result in low relative error for all nonzero eigenvalues. Selected eigenvectors We now turn to the problem of computing the eigenvectors corresponding to a set of eigenvalues of a symmetric matrix A. and we will not be able to calculate it accurately [see (3. Second. special code adapted to the task at hand should be used. we will assume that they are accurate to almost full working precision. For definiteness we will assume that these eigenvalues are and that they are approximated by the quantities We will make no assumptions about how the fjtk were obtained. when more than one eigenvalue is to be found. The first thing we must decide is what we can actually compute. one generates bounds on the other eigenvalues. If the eigenvalue Afc is poorly separated from the other eigenvalues of A. Instead we will attempt to find an approximation Xk such that the normalized residual Vol II: Eigensystems . Although LeftCount does not produce a function whose zero is the desired eigenvalue. it is possible to get endpoints a and b that differ only in a single bit.SEC. If eps is too small. In that case. The function FindLambda can obviously be used to find a sequence of several consecutive eigenvalues. and without statement 11 the algorithm would cycle indefinitely. 2. At this point it may pay to shift over to an interation that converges more quickly—e. but because we are going to use inverse iteration to compute their corresponding eigenvectors.

kI)v — u. let us start with a vector u normalized so that 11 u \ \ 2 = 1. where ||£||2/||A||2 is equal to the scaled residual norm. we have It follows from (2. . say near the rounding unit. note that (2. if we take u normalized so that \\u\\2 = 1 as a starting vector for the inverse power step and let v be the solution of the equation (A — }J.3. Ifu is chosen at random. we have from (2.23) that Thus the effectiveness of the inverse power method depends on the size of the component y^uofu along yk.24) by Thus if n = 10. the estimate is only 100 times larger that 7€M.kl)~lu.22) that Now suppose that ^ is a good relative approximation to A^. Chapter 2]. Then if v — (A — jj. THE SYMMETRIC EIGENVALUE PROBLEM is small. and consequently we can estimate the bound (2. There remains the question of whether the inverse power method is capable of producing a vector with a sufficiently small residual.196 CHAPTER 3. A nice feature of this criterion is that it is easy to test. see the discussion following the theorem. so that for some modest constant 7. which says that xk is the exact eigenvector of A+E.9).000. Specifically. However.25) is likely to be an overestimate since it ignores the contribution of the components of u along the n—1 remaining eigenvectors of A. For the implications of this result. Then since |A^| < ||A||2. Chapter 2. then we can accept x^ — v/ \\v\\2 as a normalized eigenvector. The justification for this criterion may be found in Theorem 1. Once again. we can expect this component to be of the order l/^/n [see (1. Let yk be the normalized eigenvector of A corresponding to \k. then Thus if the quantity ||w||2/||A||2||v||2is small enough.

Since this u has had its components associated with the eigenvalues near ^ enhanced. we find that The vectors x\ and x2 have lost half the orthogonality we expect from eigenvectors of a symmetric matrix. A CLUTCH OF ALGORITHMS 197 If it turns out that the scaled residual is not satisfactorily small. In addition to having small scaled residual norms. However. By (3. we can expect this second iteration to do the job. these perturbations will move the computed eigenvectors by about 10~8. Moreover. Specifically. Unfortunately. if ll^i \\2 = 1. A little perturbation theory will help us understand what is going on here.28). we can always try again with u = v/\\v\\2. we perform this orthogonalization. we find that x^x2 = 7-10~18. Since the errors in the eigenvectors are independent of one another. as the following example shows. the residual norm for x2 is 2-10~15. we would like our computed eigenvectors to be orthogonal. For safety one might allow additional iterations. where Thus A has two very close eigenvalues near one and an isolated eigenvalue at two. The cure for this problem is to force x2 to be orthogonal to x\. so that VolII: Eigensystems . so that x\ and x2 we orthogonal to working accuracy.7. the vector is easily seen to have 2-norm one and to be orthogonal to x2. Since the residual norms are of order 10~15. The eigenvalues AI and A 2 are very close—10~8 apart. when we compute the inner product z T #2. 2. Example 2. Let us approximate the first two eigenvalues by and perform inverse iteration with each Hk to approximate the corresponding eigenvectors. there is a corresponding deterioration in the orthogonality between the two vectors. Chapter 1. The residual norms for the approximations xi and x2 are about 1-10~15 and 2-10~15. which is extremely good. the computed vectors are exact vectors of A perturbed by matrices of order 10~~15. A matrix A was constructed by generating a random orthogonal matrix Q and setting A = <2AQT.SEC. in our example. If. it is easy to lose orthogonality when two of the \k are close.

1..4).. We divide the //& into clusters according to some tolerance. We will also assume that we have bounds on the residual norms of the or z .198 CHAPTER 3. the routine returns a normalized vector v that is orthogonal to the columns of X and satisfies Unfortunately.. As we pass through a cluster we use the Gram-Schmidt algorithm to orthogonalize each vector against the previously computed vectors in the cluster. If a stable method is used to calculate v. Xk-i. we leave the process to a routine that orthogonalizes a vector u against the columns of X returning the result in v (which can be identified with u). The orthogonalization process can be adapted to handle a sequence of close eigenvalues. . Mfc of <? approximate eigenvalues and have determined normalized approximate eigenvectors x^-g+i. Since the details of the orthogonalization are tricky (for the full story see §1:4. Now the approximate eigenvector Xk nroduced bv esreorthne satisfies Hence It follows that Hence . with Xk to be determined. the orthogonalization complicates our convergence criterion. suppose that we have a group ^k-g+i •>• • • . THE SYMMETRIC EIGENVALUE PROBLEM the quality of x2 as an eigenvector has not been compromised by the orthogonalization process. Now suppose that we have computed v by the inverse power method with shift //& applied to normalized vector u. To see why. it satisfies where E is of the order of the rounding unit. Specifically.

g = 1 if (fjtk <= /**-! + groupsep) g = g+1 else g . r. end for z 23. rik=:'nk + (*7.SEC.26)]. for k = 1 to ra 5. end for k Algorithm 2. + lA*i-A«)bl)*l r i-Jb+ J | 22. p) else u — u/\\u\\2 end if it = 1 while (frae) Solve the system (A—/*fc/)v = w if (p > 1) gsreorthog(X[:. f = k-g+ltok-l k-1 20. 14.&-!]. 9. This algorithm returns in the columns of the array X approximations to the eigenvectors corresponding to the nk. 18. if (it > maxit) error fi 25.The algorithm uses the routine gsreorthog to orthogonalize a vector against a set of vectors [see (2. 2. w. 1. 8. if(Ar>l) 6. 19. 13. groupsep= ||A||oo*10~3 tol = 100*5gr/(n)*||A||00*€M maxit — 5. end while 26. 12. 16.fc-(jr-f !. w. A CLUTCH OF ALGORITHMS 199 Let Pi < V2 < ' ' ' < Vm be a consecutive set of eigenvalues of a symmetric matrix A with relative accuracy approximating the rounding unit eM. 7. u . 10.6: Computation of selected eigenvectors o Vol II: Eigensystems .k—l].v/p end if rfk = \\u\\2 + ||v|| 2 *>/(n)*ll^ll<x>*€M 4. v. 15. r. 2. 17. 21. />) else p = ||v||2. 11. w. 3.k—g+l.1 fi end if it = 1 u — a vector of random normal deviates if (g > 1) gsr£0r?/i0g(J*C[:. // = it + 1 24.

the choice of a starting vector.200 CHAPTER 3. On the other hand..g. See the notes and references. • The starting vector for the inverse power method is a random normal vector. • The algorithm uses the residual bound (2. destroys band structure. (The routine gsreorthog insures that the vectors within a group are orthogonal to working accuracy. the algorithm is not a reliable way of computing a complete set of eigenvectors of a small perturbation of the identity matrix.6 can be expected to work only because we have aimed low. but the same can be said of other choices. The matrix A — ^1 will in general be indefinite. Even as it stands Algorithm 2. with L of bandwidth p+l and U of bandwidth 2p+l. There is a certain common sense to our choices.27) to test for convergence.2. But the L and U factors it computes are banded.. Moreover. It should be stressed that Algorithm 2. if the criterion for belonging to a group is too small. diagonal pivoting. Unfortunately. it can and does fail on groups of eigenvalues that are separated by up to 103cM. Limited experiments show that the bound tracks the actual residual norm with reasonable fidelity. It is orthogonalized against the other members of the group to purge it of components that will contribute to loss of orthogonality. Thus the natural choice is Gaussian elimination with partial pivoting. There are several comments to be made about it. see statements 1 and 2).) • The while loop can be exited with a new vector or with an error return when the number of iterations exceeds maxit. • We have not specified how to solve the system in statement 15.F°r example. the convergence criterion.6 pulls all these threads together. For more see §1:3. . Thus the storage requirements for the algorithm are four times the amount required to represent the original symmetric matrix. THE SYMMETRIC EIGENVALUE PROBLEM is a bound on the residual norm for XkAlgorithm 2. resulting in loss of orthogonality between groups. It is intended for use on balanced matrices eigenvalues and uses absolute criteria to determine when quantities can be ignored (e. when to orthogonalize. • An absolute criterion is used to group eigenvalues for orthogonalization (see statements 1 and 6). the distance between successive groups can be small. It will not work on graded matrices. This method does not respect symmetry.4. which preserves symmetry.g. If we use too liberal a grouping criterion.6 involves a number of ad hoc decisions—e. and some form of pivoting is required for stability. and it is not clear that it can be modified to do so. we will perform a lot of unnecessary orthogonalizations. the clustering criterion.

It was later modified by Dongarra and Sorensen [66]. Givens [83. Finally. suggested that the algorithm can be made to work provided extended precision is used at certain critical points and noted that for IEEE standard arithmetic the extended precision arithmetic could be simulated.1994] noted that by recomputing the Z{ so that the A z are exact eigenvalues of the corresponding matrix the orthogonality could be maintained in the eigenvectors without resorting to higher precision arithmetic. beginning with a survey of eigenvalue updating problems by Golub [87. and Sorensen [30. An implementation of this method can be found in the LAPACK code xLAEDl. Inertia and bisection The fact that the inertia of a matrix is preserved by congruence was discovered by Sylvester [272. who was the first to find eigenvalues of tridiagonal matrices by bisection. posthumous]. 1954]. in unpublished communications. Bunch.1991] showed that more was required and gave an algorithm that worked with simulated extended precision on all IEEE compliant machines.4 to count eigenvalues. a matrix of the form Reduction to tridiagonal form The technique of reducing a band matrix to tridiagonal form is due to Schwarz [235].4.1857.1978] formulated the algorithm treated here and noted that there were problems with maintaining orthogonality among the eigenvectors. For an error analysis see [13].1853] and independently by Jacobi [129. Sorensen and Tang [247. A CLUTCH OF ALGORITHMS 2. The algorithm has been implemented in the LAPACK routine SLAEDO. The divide-and-conquer algorithm The divide-and-conquer algorithm is due to Cuppen [49]. Kalian. Nielsen. Gu and Eisenstat [100. 1973]. The stopping criterion in the secular equation solver insures that the procedure is backward stable.SEC. 2. Gu and Eisenstat [101] propose a different divide-and-conquer algorithm in which the critical step is computing the spectral decomposition of an arrowhead matrix — that is. NOTES AND REFERENCES 201 Updating the spectral decomposition This updating algorithm has a long history. Instead he used closely VolII: Eigensystems . For an analysis of methods for solving the secular equation see [164]. did not use Algorithm 2.

jk)-element of Ak+i = R^A^Rk is zero. The Handbook program bisect [15] appears to be the first to use the diagonal elements of U to count eigenvalues. In 1846 [128]. Jacobi regarded the algorithm now named for him as a preprocessing step. For working code see the Handbook routine tristurm [305. given a current iterate A^ and a pivot element a\ '• . he applied the same idea to reduce the symmetric eigenproblem to a diagonally dominant one.202 CHAPTER 3. a rotation Rk in the (ik. In other words.1845]fr"8*proposed this method as a technique for reducing the normal equations from least squares problems to diagonally dominant form. There are bisection codes in EISPACK and LAPACK. In his Ph. if we choose (ik. If T — XI has the LU decomposition LU. not as a complete algorithm in itself.3. Jacobi's method Jacobi's method for the solution of the symmetric eigenvalue problem uses rotations to annihilate off-diagonal elements of the matrix in question. If we let r| be the sum of squares of the diagonal elements of Ak and <r| be the sum of squares of the off-diagonal elements. THE SYMMETRIC EIGENVALUE PROBLEM related Sturm sequences. after which it could be readily solved by what is now called the point Jacobi iteration [285. Thus. §2. The polynomials^(A) are called a Sturm sequence (see [122.1]. after which the eigenvectors could be found by an iteration based on a perturbation expansion. thesis Dhillon [65] has described an O(n2) to compute all the eigenvalues and eigenvectors of a tridiagonal matrix. The thesis also contains a review and critique of previous attempts to use the inverse power method to compute eigenvectors and an extensive bibliography. then and Hence. then the product pk( A) = wii • • • Ukk is the determinant of the leading principal minor of T — A/. although the authors justified their use by an appeal to Sturm sequences. j'^-plane is determined so that the (»*. then it is easy to show that r| -» \\A\\p. Specifically. Jacobi [127. Specifically. Unfortunately. they are prone to overflow and underflow.jk) so that o?^fc is maximal. .4] for definitions and properties).6 is included to illustrate the problems of implementing a power method. Ak approaches a diagonal matrix. Eigenvectors by the inverse power method Algorithm 2. §3. Clearly. the p/t(A) change sign when the u^ do. This program also saves information from previous counts to speed up the convergence of the bisection procedure.D. n/18] or the LAPACK routine SSTEIN.

First. 3.4. This cyclic Jacobi method considerably complicates the convergence proofs. In §3. its asymptotic convergence is at least quadratic. Much of this material has been treated in detail in §1:1. It has great natural parallelism. von Neumann. in which all off-diagonal elements greater than a given threshold are annihilated until there are none left above the threshold. a search for a maximal element is too expensive on a digital computer. The situation with multiple eigenvalues and clusters of eigenvalues is considerably more complicated. as we shall see. and at one time it and its variants were considered candidates for systolic arrays — a class of computers that never realized their promise. For more on this topic see [59. including the perturbation of singular values and vectors.SEC. Like other algorithms that proceed by rotations. and many results will be presented without proof.4. after which the process is repeated with a smaller threshold. 3. and Murray in 1949 (also see [85]). it was proposed to sweep through the elements of the matrix in a systematic order.172].1. Jacobi's method was the algorithm of choice for dense symmetric matrix. there is an analogue of the QR algorithm to solve the singular value problem. (For an elegant implementation by Rutishauser— always a name to be reckoned with — see the Handbook code [226]. The main purpose of this section is to describe this algorithm.110. The QR algorithm itself is treated in §3. In particular. THE SINGULAR VALUE DECOMPOSITION 203 According to Goldstine and Horowitz [84]. though it may be the algorithm of choice in special cases. THE SINGULAR VALUE DECOMPOSITION In this section we will treat methods for computing the singular values and vectors of a general nxp matrix X. This method. For more on these topics see [75.2 we will show how the singular value decomposition can be reduced to the solution of a symmetric eigenvalue problem.) But since then it has not fared well.298]. BACKGROUND In this subsection we will present the background necessary for this chapter. For definiteness we will assume that n > p and that X is real. Jacobi's method works well with graded matrices and matrices with small. We will begin with the mathematical background. The subsequent literature was dominated by two aspects of the convergence of the method.3 of this series. But no one can say for sure what the future holds for this bewitching algorithm.107.3 we will begin our development of the QR algorithm by showing how to reduce a general matrix to bidiagonal form. Consequently. has limitations that render it unsuitable as a general-purpose routine. Vol II: Eigensystems . if the matrix has no multiple eigenvalues. In §3.171. The singular value decomposition is closely allied to the spectral decomposition. the method was rediscovered by Goldstine. Until it was superseded by the QR algorithm. Second. An interesting alternative to the cyclic method is the threshold method [212]. 3. well-determined eigenvalues—at least when the matrix is positive definite.

• It often happens that n > p.2) is conventional and will be assumed unless otherwise stated.From this it is easily established that: . They satisfy and Moreover. As an alternative we can set and write This form of the decomposition is sometimes called the singular value factorization. • The columns of are called left and right singular vectors ofX.204 CHAPTER 3. \\X\\2 = \\Z\\z. • The numbers <jj are called the singular values of X. The nondecreasing order in (3. • Since the 2-norm is unitarily invariant. where n > p. in which case maintaining the nxn matrix U can be burdensome.1. LetX£Rn*p. Theorem 3. Then there are orthogonal matrices U and V such that where with Here are some comments on this theorem. THE SYMMETRIC EIGENVALUE PROBLEM Existence and nomenclature The following theorem gives the particulars of the singular value decomposition. We will use the notation &i(X) to denote the ith singular value of X.

SEC. Sometimes they cannot be ignored [see (3. the left singular vectors corresponding to nonzero singular values are uniquely determined by the relation Xv{ = OiU{. The right singular vectors are the eigenvectors of X^X. However. but the space they span is. The right singular vectors corresponding to a simple singular value (one distinct from the others) are unique up to a factor of absolute value one. We will denote it by • Since the Frobenius norm is unitarily invariant. From the definition of the 2-norm it follows that 205 • Later it will be convenient to have a notation for the smallest singular value of X. the matrix XXT has n—p additional zero eigenvalues.17)]. The right singular vectors corresponding to a multiple singular vector are not unique. THE SINGULAR VALUE DECOMPOSITION The 2-norm of a matrix is its largest singular value. it follows that This shows that: The squares of the singular values X are the eigenvalues of the CROSSPRODUCT MATRIX XTX. VolII: Eigensystems . Once the right singular vectors have been specified. and we will call them honorary singular values. An analogous result holds for XX1-. Uniqueness The singular value decomposition is essentially unique. when n > p. it follows that In other words: The square of the Frobenius norm of a matrix is the sum of squares of its singular values. Relation to the spectral decomposition From (3.1) and the fact that U is orthogonal. 3. The singular values themselves are unique.

206 CHAPTER 3.2. However. Let the singular values ofX be arranged in descending order: Let the singular values ofX = X + E also be ordered in descending order Then Thus singular values. Theorem 3. let us partition the matrices of left and right singular vectors in the forms . are perfectly conditioned. this does not mean that perturbations of small singular values have high relative accuracy.9. First-order perturbation expansions The perturbation of singular vectors is best approached via first-order perturbation theory.) Assuming that n > p. Theorem 33. Chapter 1. Let the singular values ofX be ordered in descending order: Then and An immediate consequence of this characterization is the following perturbation theorem for singular values. THE SYMMETRIC EIGENVALUE PROBLEM Characterization and perturbation of singular values Given the relation of the singular value and spectral decompositions it is not surprising that the min-max characterization for eigenvalues (Theorem 3. Chapter 1) has an analogue for singular values. like the eigenvalues of a symmetric matrix. See Example 3. Let <TI ^ 0 be a simple singular value of the nxp matrix X with normalized left and right singular vectors u\ and v\.5. (Here we have discarded the conventional downward ordering of the singular values.

their product must eventually become negligible.SEC. where 0i is assumed to go to zero along with E. and hi are also assumed to go to zero along with E. as above. Then we can write where (p\ = u^Eui. THE SINGULAR VALUE DECOMPOSITION where U^ and Vi have p—l columns.gs. we get The second row of (3.. eJ2 = u^EV2. Now consider the relation From the first row of this relation. 3.. Then 207 where £2 = diag(<j2.7) we have Ignoring higher-order terms. we have Since both f^2 and /i22 go to zero with E.7) gives us VolII: Eigensystems . We will seek the singular value of UTXV in the form a\ + 0i. etc. where E is presumed small. . crp) contains the singular values of X other than a\.where the g-2. The singular vectors corresponding to a\ in the matrix U^XV are both ei (of appropriate dimensions). and we can write From the third row of (3. Now let X — X + E.. We will seek the corresponding left and right singular vectors of t/T^V in the forms (1 g^ p3 > ) T and(l /iJ)T.

Hence the quantities Q\. g-z.. Let the matrices of left and right singular vectors be partitioned in the form where Ui and V<2 have p— 1 columns and let S2 . Then and .ui. . Theorem 3. If we transform these approximations back to the original system. and the products we have ignored in deriving our approximations are all 0(||^||2). THE SYMMETRIC EIGENVALUE PROBLEM or This expression is not enough to determine gi and hi.. and h^ are all 0(||E||). Let a\ > 0 be a simple singular value of X £ Rnxp (n > p) with normalized left and right singular vectors u\ and v\. we have the following theorem. </3. <rp).4..U?XV2 = diag(<7 2 .9) and (3.208 CHAPTER 3.10) can be readily solved for gi and hi to give the approximations and A simple singular value and its normalized singular vectors are differentiable functions of the elements of its matrix. Let a\.But from the second row of the relation we get The approximations (3. and vi be the corresponding quantities for the perturbed matrix X = X + E.

5. Since we will be concerned with angles between vectors and orthogonal transformations do not change angles.It follows that Vol II: Eigensystems . /i2. The problem is that singular values are not differentiable at zero. Now the normalized perturbed right singular vector is By Lemma 3. We will begin with the right singular vector. THE SINGULAR VALUE DECOMPOSITION Here are two comments on this theorem. the sine of the angle between this vector and ei is Thus to get a bound on the angle between ei and the vector (3.12) can be written in the form 209 In this form it is seen to be an analogue of the Rayleigh quotient for an eigenvalue (Definition 1. Chapter 2). and <foi represent corresponding components of /*2.12. The condition of singular vectors We can use the results of Theorem 3. • The expression (3.13)—or equivalently between v\ and v\ — we must obtain a bound on ||/&2 \\2Consider the approximation (3.5) and (3. Because £2 is diagonal we may write it out component by component: Here 77. and 02 is the corresponding diagonal element of £2. |0i2|}.11) for h^. and its absolute value is bounded by max{|</>2i|.6). For more on the behavior of zero singular values.We can rewrite this equation in the form Since a\ and a<2 are positive. the singular value of a scalar is its absolute value.4 to derive a condition number for a singular vector. 3.SEC. we may work with the transformed matrices in (3. see the notes and references. Chapter 1. <£i2. For example. the fraction in this expression is a weighted average of <j>i2 and -<(>2i • Hence it must lie between fa and -<foi. and /2i. • The hypothesis that 0\ be nonzero is necessary.

28). and it plays a role analogous to sep in the perturbation of eigenvectors of Hermitian matrices [see (3.2.e. THE SYMMETRIC EIGENVALUE PROBLEM where Since U<2 and v\ are orthonormal. the additional zero eigenvalues of XX^-. Chapter 1]. Moreover.21C CHAPTER 3. however.. Specifically: Let where 6 is defined by (3. Similarly. the quantity 11 h2 \ \ 2 is the sine of the angle between Vi and vi. The number 8 is the separation of o\ from its companions. 3. THE CROSS-PRODUCT ALGORITHM In (3. if n > p. if we know V we .8).14). Now up to terms of order 11E \ \ 2. which we seek in the form (1 g£ g% ) .15). We can do this by adjusting 6 to take into account the honorary zero singular values — i.15). Then Turning now to the left singular vectors.We can bound \\g-2\\2 as above. we see that the sine of the angle between HI and u\ is essentially x/H^lli + ll^slli. Hence: Let 6 be defined by (3. ||/12|| < \\E\\* Hence from (3. Then Thus 6 1 (or S0 1 ) is a condition number for the singular vectors corresponding to the singular value a\. we must take into account g3 defined by (3.4) we have seen that the eigenvalues of the cross-product matrix A = XTX are the squares of the singular values of X and that columns of the matrix V of the eigenvectors of A are the right singular vectors of X.

The computation of Up involves only matrix multiplication. which for defmiteness we will call the cross-product algorithm. Nonetheless. This algorithm has some attractive features. THE SINGULAR VALUE DECOMPOSITION 211 Let X G R n x p (n> p). and the first p left singular vectors of X.has fewer and fewer accurate digits. Inaccurate singular values The first problem is that we lose information about the singular values in passing from X to A. E = AJ FormA = Jf T *X Compute the spectral decomposition A = VAVT Up = X*V Normalize the columns of Up Algorithm 3.1: The cross-product algorithm o can recover the first p columns of U via the formula XV = UPE. 2.3).. Let us first consider what happens to the singular values when we round X. By Theorem 3. This amounts to replacing X by X — X + E. All this suggests Algorithm 3.SEC. to expect any accuracy in crt. is the ith singular value of-X t then the relative error in <j{ satisfies As <j{ gets smaller. Chapter 1]. let us now assume that we compute A — X^-X exactly and then round it to get A (we could hardly do better than that VolII: Eigensystems . 5.1. the algorithm has severe limitations. if <7. the perturbed singular value <7.we must have Turning now to the cross-product method. which is cheaper than accumulating transformations. «t becomes larger and cr. 1. Otherwise put. 3. 4. We already have algorithms to calculate the spectral decomposition. 3. By the time K.3. the right singular vectors. Specifically. this perturbation can change the singular values of X by as much as (?i€M. This algorithm computes the singular values. and it is important to understand what they are. The formation of A is cheaper than the reduction to bidiagonal form that we will propose later—much cheaper if X is sparse. where Here eM is the rounding unit for the machine in question [see (2. is of the order e"1. will be totally inaccurate.

-.212 CHAPTER 3. < e^1 to expect any accuracy in A.000009260909809e-06 1. if €M = 10~16 then any singular value satisfying ftz > 108 will have no significant figures when represented as an eigenvalue of A—even though it may have as many as eight significant figures as a singular value of X.000000000014129e-06 9.5.OOOOOOOOOOOOOOOe+00 1. THE SYMMETRIC EIGENVALUE PROBLEM in statement 1 of Algorithm 3. A matrix X was generated in the form X = UDVT. Example 3.18).999999970046386e-09 VA l.OOOOOOOOOOOOOOle-02 1.000000000000072e-02 1. An example will illustrate the above considerations. Thus we should not expect the cross-product algorithm to work well if the ratio of the largest to the smallest singular value of X is too large. when KP > eM 2 . a l. it is seen that the errors increase as predicted by the analysis above. In particular we must have Ai/A. We now note that A 2 = of.177076334615305e-08 Since €M = 10~16. it is possible for Xp to be negative. We can show as above that the eigenvalues of A and A satisfy This bound has the same interpretation as (3.1). .000000000000487e-04 1.OOOOOOOOOOOOOOOe+00 l. _i_ Although it is not illustrated by this example. For example.000000001992989e-04 1. Consequently the condition Ai/A z < c"1 is equivalent to This says that the range of singular values that the cross-product algorithm can compute with any accuracy has been severely restricted. where U and V are random orthogonal matrices and The singular values ofX and the square roots of the eigenvalues of A = X^X were computed with the following results. so that the cross-product algorithm produces an imaginary singular value.

The singular vector in A is more sensitive to rounding errors than the same vector in X. VolII: Eigensystems .20). then Since \\E\\2 < <7ieM» we have A similar analysis for A shows that where V{ is the ith eigenvector of A. From (3. so that right-hand sides of the bounds (3. 3. It is a curious and useful fact that the singular vector corresponding to the smallest singular value can be quite accurate. Note. = of. THE SINGULAR VALUE DECOMPOSITION Inaccurate right singular vectors 213 We can perform a similar analysis to show that the right singular vectors of X can be less accurately represented in A. the bound for the cross-product algorithm deteriorates more rapidly. Noting once again that A. Specifically.20) are approximately Since K P _I = 1. If Vi denotes the singular vector of the rounded matrix.20) has the smaller denominator.SEC. suppose v\ = ap-\ while ap is small. we find that KP is much greater than K P _I. that as ap-\ drifts away from a\.16). we see that the perturbation of Vi depends on the separation of O{ from its nearest neighbor—call it aj. the vector vp is well determined by both algorithms. (3. consider the effects of rounding X on the left singular vector vt. however. although the small singular value ap is ill determined. we see that when K{ and KJ are large.19) and (3.19) and (3. Then with i = p and j = p— 1. Specifically. The common sense of these bounds is that the subpace corresponding to the large singular values is well determined and hence its orthogonal complement—the space spanned by vp — is also well determined. even when the singular value itself has no accuracy. we can rewrite this bound in the form If we compare (3.

be very accurate. and assume that The cross-product algorithm computes the zth left singular vector by forming Wi = Xvi and normalizing it. Let V{ be the ith right singular vector computed by the cross-product algorithm. if KP is not too large.21) will be about 10~4. if V{ is computed from the cross-product algorithm.214 CHAPTER 3. although the cause of the loss of accuracy is different. For example. if £M = 10~16 and Ki = 106. However. e can be quite large and have a devastating effect. as we have seen. For more see the notes and references. Now W{ = Xvi + Xei + Evi = W{ + X e{ + Ev{. then the computed singular value ap may be satisfactory. the right singular vectors vp can. as previously. Normalizing it will not restore lost accuracy. THE SYMMETRIC EIGENVALUE PROBLEM Inaccurate left singular vectors The left singular vectors computed by our algorithm will also be less accurate. Perhaps more important. For example. Hence the normwise relative error in w^ is Thus. wz. we have \\Wi\\2 = &{.will be computed inaccurately if <rz is small. —call it Wi — satisfies where Here 7 is a constant that depends on the dimensions of X. We have not specified the size of e in our analysis because the technique for computing left singular vectors is sometimes used independently with accurate V{ (c = eM). and the computed w. Even when KP is large. then e in (3. Hence Since Wi = <J{Ui. Assessment of the algorithm The above analysis shows that the cross-product algorithm is not suitable as a generalpurpose algorithm. will have no accuracy at all. Now an elementary rounding error analysis shows that the computed w. . the bounds from the analysis suggest when the algorithm can be used.

we determine a Householder H transformation to introduce zeros in the hatted elements in the following Wilkinson diagram: Then H X has the form We next choose a Householder transformation K such that HXK has zeros in the hatted elements of (3. its efficient implementation requires that the matrix be reduced to a simpler form. 3. Like its counterparts for general and symmetric matrices. For the first step. THE SINGULAR VALUE DECOMPOSITION 215 3.and postmultiplying X by Householder transformations.23) to give a matrix of the form Vol II: Eigensystems . which we will write as The reduction As usual let X be an nxp matrix with n > p. REDUCTION TO BIDIAGONAL FORM The classical algorithm for computing the singular value decomposition is a variant of the QR algorithm. The reduction to bidiagonal form proceeds by alternately pre. In this case the form is an upper bidiagonal form.3.SEC.

17. 24. a.fi Accumulate U V=( /' ) \Un-p. V) Reduce X 2. k+l:p] = X[k+l:n. 13. fork = Hop if (k < porn > p) housegen(X[k:n. e. 1. 14. and the transformations from the right in the pxp matrix V. BiReduce reduces X to the bidiagonal form (3.:7i. k+l:p] = X[k:n. the array X contains the generators of the Householder transformations used in the reduction. 23.6*aT end for k end BiReduce Algorithm 3.22).k+l:p] = bT a = X[k+l:n. k+l:p] . k+l:p] . 26.?A:-M:ri aT = 6T*F[Ar+l:p. n—1} to 1 by —1 a = X[A. 4.k+l:p]*b X[k+l:n. The transformations from the left are accumulated in the n xp matrix U.py 9.Ar] 6T = aT*C/[^:n.fe+l:p]= V[A?+l:p. 11. 8. 5. 29. d. On return. 15. 18.ri. it(k<p-l) for k — min{p. 6.k+l:p] X[k:n. 28. 21. 22. 10. 30. 6T. 1. d[k}) X[k:n. THE SYMMETRIC EIGENVALUE PROBLEM Given an nxp matrix X with n > p. 16.p] if(n = p)rf[p] = A'[p. BiReduce(X. 27. Ar+l:p] . 6T = A"[A. A?:p] .a*6T end if housegen(X[k. 12.k] = a b1 = aT*X[k:n. 20.216 CHAPTER 3.^:p] ?7[jfc:ra.a*6T end if end for k e\p-l] = X\p-l.k].a*6T end for A. k:p] = C/"[/^:n. 3. k+I:p]. Accumulate V V = Ip 25.2: Reduction to bidiagonal form fork = p-2tolby-l . e[k]) X[k. U. A?+l:p] F[fc+l:p. 19.

It alternately scales the rows and columns of the bidiagonal matrix. Here are some comments. real. performs the reduction to real form. THE QR ALGORITHM In this subsection we are going to show how the QR algorithm can be used to compute the singular values of a bidiagonal matrix B. Instead it saves the generating vectors in X and accumulates the transformations after the reduction beginning with the last transformations.2. Algorithm 3. • If we require the effect of the full matrix U. and the operation count will be reduced if the rotations are real. We can insure this by transforming the complex bidiagonal form into a real form. we can save the transformations that are generated and apply them whenever we need to form a product of the form Ux. 3.4.2 requires 2np2 — -p3 flam for the reduction of X. • The algorithm is backward stable in the usual sense. 3. it is about |p3 flam. 2:p\. which is an analogue of Algorithm 1. o 2np2 — p3 fl flam for the accumulation of U. What distinguishes the bidiagonal singular value problem from the other problems we have considered is that small relative changes in the elements of B make only small relative changes in the singular Vol II: Eigensystems .2. Algorithm 3. When n >• p this part of the computation becomes overwhelmingly dominant. THE SINGULAR VALUE DECOMPOSITION 217 The reduction proceeds recursively on the matrix X[2:n. the tridiagonal form will also be complex. Real bidiagonal form When X is complex. Thus if n > p the total work is about 4np2 flam. the algorithm sets the stage for the subsequent computation of the singular value factorization (3.4.2 implements this reduction. the row scalings making the di real and the column scalings making the e. • As in Algorithm 2.3). • By accumulating only the first p columns of the transformations from the left.3. • The first column of the matrix V is ei. If we accumulate the entire U. If n = p. Chapter 2. This saves some operations. Our subsequent manipulations of this form will use plane rotations. the operation count rises to 2n2p—np2. Algorithm 3. • In deriving an operation count for the algorithm. it is convenient to divide the labor.SEC. p3 flam flam for the accumulation of V. the algorithm does not accumulate the transformations as they are generated. This fact will prove important in §3.

The transformations are accumulated in U and V. THE SYMMETRIC EIGENVALUE PROBLEM This algorithm takes the output of Algorithm 3.3: Complex bidiagonal to real bidiagonal form .218 CHAPTER 3. Algorithm 3.2 and transforms the bidiagonal matrix to real bidiagonal form.

SEC. this procedure is a variant of the cross-product algorithm. we can mimic a QR step on BTB while working with B.6). Chapter 2.3. 1. this column is easily computed: Thus the cosine and sine of the initial rotation PI 2 are determined by the numbers d\ a2 and d\ei.2)-element of (C . The solution provides us with the wherewithal to describe a back search in the spirit of Algorithm 3. Throughout this subsection: B will be an upper bidiagonal matrix represented in the form The bidiagonal QR step The matrix C = B^B is easily seen to be symmetric and tridiagonal. we need the first column of C. Consequently. Since B is bidiagonal. Unfortunately. in which case the problem deflates. To implement this procedure. determine a plane rotation PI 2 such that the (1.<72/)Pi2 is zero. 3. 2. THE SINGULAR VALUE DECOMPOSITION 219 values of B. some scaling should be done to prevent overflow and underflow. Given a shift <r. Determine an orthogonal matrix U and an orthogonal matrix V whose first column is a multiple of ei such that B = U^BP^V is bidiagonal. Since these numbers are composed of products. We then treat the problem of when the superdiagonal elements of B can be considered negligible. We begin with the basic QR step and then show how to compute the shift. which we have shown to have substantial drawbacks. This means that we must take care that our convergence and deflation procedures do not introduce large relative errors. Chapter 2. we could apply the tridiagonal QR algorithm to B^B to find the squares of the singular values of B. Since the repeated application of this algorithm (with suitable shifts) will cause C P _I )P to converge to zero. However. shows that C is the matrix that would have resulted from applying one step of the QR algorithm with shift cr2 to C. we may expect to see e p _i also converge to zero. Vol II: Eigensystems . the reasoning following (3. Since C = BTB = VTP?2BTBP12V = VTP?2CP12V. The procedure goes as follows.

Algorithm 3. However. • The accumulation of the transformations accounts for most of the work. In practice the QR step will not be applied to the entire matrix.2)-plane to get the matrix The bulge has now moved to the (1. Specifically: Algorithm 3. The entire procedure is illustrated in Figure 3. THE SYMMETRIC EIGENVALUE PROBLEM We could use by Algorithm 3. Here are some comments.220 CHAPTER 3. l)-element. at least when p is large.4 requires The step is backward stable in the usual sense.3)-element.3)-plane: The bulge is now at the (3. the matrix BPi2 has the form Note that BP bulges out of its bidiagonal form at the hatted (2. . We can remove this bulge by premultiplying by a plane rotation U\i in the (1.2) element.1. We can eliminate it by postmultiplying by a rotation ¥23 in the (2.4 implements a QR step between rows il and i'2.2 to reduce PuB to bidiagonal form. We can continue in this manner until the bulge is chased out the bottom of the matrix. Specifically. the fact that B is bidiagonal can be used to simplify the algorithm.

Algorithm 3. The transformations are accumulated in U and V. THE SINGULAR VALUE DECOMPOSITION 221 Given an upperbidiagonal matrix represented as in (3.24) and a shift cr.4: Bidiagonal QR step Vol II: Eigensystems .SEC. BiQR performs a QR step between rows il and i2. 3.

THE SYMMETRIC EIGENVALUE PROBLEM Figure 3. Theorem 3. Let Then the largest singular value ofT is given by the formula If T •£ 0.222 CHAPTER 3. then the smallest singular value is given by Proof. A reasonable choice is to take a to be the smallest singular value of this matrix.4 is generally computed from the trailing principal submatrix B[il—l:ilj il—l:il].6. The following theorem gives an explicit formula for this singular value.1: A bidiagonal QR step Computing the shift The shift a in Algorithm 3. The eigenvalues of the matrix .

are roots of (3.6 not only produce singular values to high relative accuracy. we must decide when we can set a superdiagonal element to zero.26) follows from the fact that <rmax<7min = \pq\. and hence 0"max and cTmin are the largest and smallest singular values of T. Fortunately. 3. there is a real danger of compromising the relative accuracy of the singular values. g. Since this is an absolute change. • In practice the formulas must be scaled and rearranged to avoid overflows and harmful underflows. The characteristic equation of this matrix If we set then it is easily verified that Thus cr^ax and cr^. q. and r. the singular values of T can be computed by inspection. Small relative perturbations in the elements of a bidiagonal matrix cause only small relative perturbations in its singular values. • If at least one of p. the following result can be used to tell when we can safely ignore superdiagonals. which is not greater than crmax. Unfortunately. and r are zero. THE SINGULAR VALUE DECOMPOSITION 223 are the squares of the singular values of T. This is a special case of an important fact about the singular values of bidiagonal matrices. Negligible superdiagonal elements The formulas in Theorem 3. • Here are three comments on this theorem.27). The formula (3.SEC. For more see the notes and references. but they are themselves insensitive to small relative perturbations in the quantities p. Let the quantities TV be defined as follows. VolII: Eigensystems . in testing convergence. • The formulas yield approximations to the singular values that have high relative accuracy.

Some of these are discussed in the notes and references.e. which searches back for a block in which the problem deflates. it works well on graded matrices. Nonetheless. If the singular value in question is isolated and e is sufficiently small. provided the grading is downward. A reasonable value is the rounding unit.224 It can be shown that CHAPTER 3. il = i'2). the sequence of rt. Back searching We can combine (3. Note that whenever the problem deflates (i.7 (Demmel-Kahan). is contained in the following theorem. there are further embellishments that can improve the algorithm.30) the parameter tol is approximately the relative error we can expect from setting the current superdiagonal to zero. By (3.is restarted. Let and consider the intervals If<?i lies in the connected union ofm intervals Ik that are disjoint from the other intervals. Let c?i and <j{ denote the singular values ofB and B arranged in descending order. the real significance of the number r. then To see what (3. Theorem 3.24). and for fixed j let B be the matrix obtained by setting ej to zero. then the relative error is bounded by e. then e±mc = 1 ± me. . Although no theorems can be proved. take the exponential to get If me is small..7 as it proceeds backward through the matrix.7 to get Algorithm 3. THE SYMMETRIC EIGENVALUE PROBLEM However.29) means. It follows that Thus the result of setting ej to zero can introduce a relative error of no greater than me in any singular value of B.28) and the results of Theorem 3. It generates the rt from Theorem 3. Final comments The algorithmic pieces we have created in this and the previous section can be assembled to produce a reliable algorithm for computing the singular values of a general matrix. Let B be a bidiagonal matrix in the form (3.5.

12 < t such that one of the following conditions holds.SEC. Algorithm 3.5: Back searching a bidiagonal matrix VolII: Eigensystems .24). a tolerance tol. THE SINGULAR VALUE DECOMPOSITION 225 This algorithm takes a bidiagonal matrix B represented as in (3. 3. and an index i (1 < t < p) and determines indices /7.

5.4.1. When n > p. THE SYMMETRIC EIGENVALUE PROBLEM Let X G R nxp .226 CHAPTER 3. one pays a high price for the accumulation of plane rotations in U.1874]. bringing the total operation count to 3np2 flam. Algorithm 3. however. which requires 0 (p3) operations. • The matrix Q will generally be computed as the product of the Householder transformations used to reduce X to triangular form (§1:4. • When n > p. Missing proofs in §3. • The computations can be arranged so that Q and then U overwrites X. we store the transformations themselves and retain the matrix W. Algorithm 3. Perturbation theory Parts of Theorem 3.2). An alternative is to first compute the QR factorization of A. where the expansion of the singular value is carried out to the second order. For ordinary singular values the second-order term makes no essential difference. Specifically. the QR decomposition of X can be computed in Inp2 flam while the generation of U requires np2 flam. The price to be paid is that one must access that basis through subprograms that manipulate the Householder transformations. statement 2. where n > p. 3.1 can be found in §1:1. A HYBRID QRD-SVD ALGORITHM The algorithm sketched in the last subsection is reasonably efficient when n is not much greater than p. NOTES AND REFERENCES The singular value decomposition The singular value decomposition was established independently and almost simultaneously by Beltrami [19. but it shows that small singular . Theorem 1. If instead.6 implements this scheme.6]. contributes only negligibly to the total operation count. which is 0(np2).4. For more historical material see [263] and §1:1.5. we save np2 flam and in the bargain have an implicit basis for both the columns space of X and its orthogonal complement.6: Hybrid QR-SVD algorithn 3. compute the singular value decomposition of R. and multiply the left singular vectors intoQ. tion^Y = UXVH.4. This algorithm computes the singular value factoriz.1873] and Jordan [137.6. Here are three comments.4 were presented in [269.

For matrices that are graded upward there is are variants of both the shifted and unshifted algorithm that proceed from bottom to top. For example. The authors present a forward recurrence that computes the quantity ||J9[fc:n. fcin]"1!]^1. et. though not foolproof. Much of the material on the implementation of the QR algorithm for bidiagonal matrices is taken from an important paper by Demmel and Kahan [58]. The derivation of the first-order bounds given here is new. From this. Reduction to bidiagonal form and the QR algorithm The reduction to bidiagonal form is due to Golub and Kahan [88. If the shift in the QR algorithm is zero.SEC. Whenever an analysis involves the eigenvalues and eigenvectors of C. The lesson of the analysis given here is that one should not condemn but discern. the computations can be arranged so that all the singular values retain high relative accuracy. and form a cross-product matrix C called the correlation matrix.is negligible if |e. Namely. • Additional tests for negligibility. This permits a cheap test for negligibility of the €i.| < tol-a. The derivation given here may be found in the Handbook contribution [91]. a natural. method for choosing among these methVolII: Eigensystems . • Zero shifted QR. much data is noisy enough that the errors introduced by passing to C are insignificant—at least when computations are performed in double precision. If the current block extends from il to /2. It is a curious fact that the QR algorithm for the singular values of bidiagonal matrices was first derived by Golub [86. This algorithm is used whenever the relative accuracy tol-g_ required of the smallest singular value is less than the error CTCM that can be expected from the shifted QR algorithm (here a is an estimate of the largest singular value). Our implementation of the shifted bidiagonal QR algorithm proceeds from the top of the matrix to the bottom. it is possible to compute a lower bound g_ for the smallest singular value. 3. Owing to limitations of space we cannot do the paper full justice in this chapter. although partial analyses and instructive examples may be found in most texts on numerical linear algebra— often with warnings never to form the cross-product matrix. which is fine for matrices graded downward. it amounts to an implicit use of the cross-product algorithm. THE SINGULAR VALUE DECOMPOSITION 227 values tend to increase in size under perturbation.1965]. The cross-product algorithm The cross-product algorithm is widely and unknowingly used by statisticians and data analysts. Subsequent analyses are cast in terms of C. a standard transformation is to subtract column means. But here are some additional details. normalize. 1968] without reference to the QR algorithm. The analysis of the cross-product algorithm appears to be new. where tol is the relative error required in the singular values. • Bottom to top QR. Fortunately. given a data matrix X.

The latter prob lem is called downdating and comes in several varieties. depending on what parts o: the singular value decomposition we have at hand. j).. Code based on these ideas may be found in the LAPACK routine xBDSQR. 202) can be generalized in two ways to find the singular value decomposition of a matrix X. as well as references to previous work. A typical iteration step goes as follows. An interesting problem is to update th( singular value decomposition after a row has been added or removed. 35]. For some * < j. THE SYMMETRIC EIGENVALUE PROBLEM ods is to use the top-down method if d[il] > d[i2] and otherwise use the bottom-up method. As with the Jacob algorithm. Higham [115] has shown that row and columi pivoting in the computation of the QR decomposition improves the accuracy of th( singular values. Fernando and Parlett [73] have devised a nev variant of the shifted qd algorithm that calculates them efficiently to high relative ac curacy. The differential qd algorithm When only singular values are desired.228 CHAPTER 3. Divide-and-conquer algorithms Gu and Eisenstat [101] give a divide-and-conquer algorithm for the bidiagonal singu lar value problem. i)-elements ol U^XV are zero—i. due to Kogbetliantz [149.e. Their paper also contains a summary of other divide-and-conque algorithms with extensive references. Gu and Eisenstat [102] give three algorithms for the downdating problem. . The QRD-SVD algorithm The idea of saving work by computing a singular value decomposition from a QR de composition is due to Chan [34. Jacobi methods Jacobi methods (see p. this transformation increases the sum of squares of the diagonal elements of X and decreases the sum of squares of the off-diagonal elements by the same amount. solve the singular value problem for the embedded 2x2 matrix applying the transformations to rows and columns % and j of X.and (j. let V and V be plane rotations in the (i. Dovvndating a singular value decomposition The singular value decomposition is useful in applications where X is a data matrb with each row representing an item of data.1955] works with square X. The algorithm has been implemented in the LAPACK routine SLASQ1. The two-sided method. j)-plane such that the (t.

Sec.2 we present Wilkinson's algorithm. The following theorem shows that we can diagonalize A and B by such transformations. There is an analogous two-sided method in which rotations are also applied from the left. 203]. which is due to Wilkinson. The problem arises naturally in many applications. We begin by developing the theory of the S/PD generalized eigenvalue problem. In §4. abbreviated S/PD. B) is said to be SYMMETRIC POSITIVE DEFINITE. 107. 37. The slash in S/PD is there to stress that two matrices are involved—the symmetric matrix A and the positive definite matrix B (which is by definition symmetric). 4. 4. transformations of the form ( X ^ A X ..109. 75. In transforming S/PD pencils it is natural to use symmetry-preserving congruences. The one-sided method does not require X to be square. then the result of a complete sweep is a lower triangular matrix. XTBX). it is the best we have.1. The techniques that are used to analyze the Jacobi method generally carry over to the variants for the singular value decomposition. Definition 4. all matrices are real. 4.1. As usual in this chapter. Although this reduction. i. THEORY Here we establish the basic theory of S/PD pencils. in fact. THE SYMMETRIC POSITIVE DEFINITE (S/PD) GENERALIZED EIGEN VALUE PROBLEM A generalized eigenvalue problem Ax = \Bx is said to be symmetric positive definite (S/PD) if A is symmetric and B is positive definite. Mathematically it is a generalization of the symmetric eigenvalue problem and can. and the purpose of this subsection is to describe the problem and its solution. 108. The fundamental theorem We begin with a definition. VolII: Eigensystems . does not yield an unconditionally stable algorithm.e. The S/PD Eigenvalue Problem 229 An interesting and work-saving observation about this method is that if X is upper triangular and elements are eliminated rowwise. Let A be symmetric and B be positive definite. Then thepencil(A. including some modifications to handle the case where B is ill conditioned with respect to inversion. be reduced to an equivalent symmetric eigenvalue problem. It is easy to see that the rotation V is the same as the rotation that would be obtained by applying a Jacobi step to X^X with pivot indices i and j. 67. Choose two columns i and j of X and determine a rotation V such that the ith and j>th columns of XR are orthogonal. For more see [8. where X is nonsingular.

But other normalizations are possible. Then there is a nonsingular matrix X such that and where Proof. • If we write AX = X~^D^ and denote the ith column of X by Xi. Let be its spectral decomposition. Let (A>B) be an S/PD pencil. There are four comments to be made about this theorem. In this sense the S/PD eigenvalue problem is a generalization of the symmetric eigenvalue problem. it is natural to say that X is ^-orthogonal. It follows that so that Aj = Cti/Pi is an eigenvalue of (A. the numbers fa are positive. • The proof of the theorem gives eigenvectors Xi normalized so that xjBxi = 1.B) has real eigenvalues and a complete set of jB-orthonormal eigenvectors. THE SYMMETRIC EIGENVALUE PROBLEM Theorem 4. Bxi = [3iX~iei. The matrix Z~TAZ~l is symmetric. Set Then by construction X'1AX is diagonal. B) and X{ is the corresponding eigenvector. Thus. Since B is positive definite and hence nonsingular.230 CHAPTER 3. then Similarly.2. Let Z be any matrix of order n satisfying For example Z could be the Cholesky factor of B. This implies that an S/PD pencil cannot have infinite eigenvalues. our theorem states that the S/PD pencil (A. we have proved that A can be diagonalized by a i?-orthogonal transformation. Z is also nonsingular. In this language. the theorem says A can be diagonalized by an orthogonal transformation. • Since X^ BX = /. • Whatever the normalization. When B — /. Moreover. . is also diagonal.

1) shows that the eigenvalues of an S/PD problem. then by Theorem 4.) On the other hand. The bound (4. The algorithm Wilkinson's algorithm is based on the proof of Theorem 4. Chapter 1.9.2.2)-element of B is perturbed to 2-10~5. lf(A+E. B) with ||x||2 = 1. see Example 4. R is upper triangular with positive diagonals (see §A. shows. 4. consider the problem If the (2. The S/PD Eigenvalue Problem Condition of eigenvalues 231 Let (A. However. for E and F sufficiently small there is an eigenvalue A of the perturbed pencil such that Thus (a2 + ft2) 2 is a condition number for the eigenvalue A in the chordal metric. This is generally true of the small eigenvalues. while perturbations of the same size can move the eigenvalue 1 anywhere in the interval [0. This happens when both a.13. the well-conditioned eigenvalues of an S/PD pencil are not necessarily determined to high relative accuracy. (However.5). Chapter 2. oo).B+ F) is a perturbed pencil. can be ill conditioned.Sec. unlike the eigenvalues of a Hermitian matrix. Let VolII: Eigensy stems . Let a = x T Axand/3 = xTBx so that (a. For example.12. and fti are small. WILKINSON'S ALGORITHM In this subsection we consider an algorithm due Wilkinson for solving the S/PD eigenvalue problem. or) be a simple eigenvalue of the S/PD pencil (A.e. Chapter 2.2. well-conditioned large eigenvalues can also be ill determined in the relative sense.. B) be an S/PD pencil and let be the Cholesky decomposition of B—i. in the problem the eigenvalue 2 is generally insensitive to perturbations bounded by 10 . We begin with the algorithm itself and then consider modifications to improve its performance in difficult cases. the eigenvalue 105 becomes |-105. 4. as Example 3. Let (A. For example./?) is a projective representation of A.

its upper triangle may be overwritten byR. B) of order ra.. the operation count is 0(n3) with a modest order constant. Here are some comments on this algorithm. Computation of R~T AR~l The computation of C = R~^ AR~l is tricky enough to deserve separate treatment. • The exact operation count for the algorithm depends on how long the method used to compute the spectral decomposition of C takes. • The Cholesky factor R can be computed by a variant of Gaussian elimination— e.2) is used. Algorithm 4. If B is not needed later. The one given here proceeds by bordering. Algorithm 1:3. THE SYMMETRIC EIGENVALUE PROBLEM Given an S/PD pencil (A. there is more than one algorithm. • The columns Xj of X may be formed from the columns Uj of U by solving the triangular systems The calculations can be arranged so that X overwrites U [cf.g. • We will treat the computation of C in a moment.1. (1:2. • Only the upper triangles of A and B need be stored. See page 159. .2. In fact.1 implements this scheme.1: Solution of the S/PD generalized eigenvalue problem be the spectral decomposition of R T AR l.232 CHAPTER 3. Then the matrix diagonalizes the pencil. which is determined in part by C. this algorithm produces a matrix X such that Algorithm 4. its upper triangle may be overwritten by the upper triangle of C. If A is not needed later. If the QR algorithm (§1) or the divide-and-conquer algorithm (§2.7)].2.

2. if ll^lbll-B" 1 !^ ls l^S6 (see §I:3-3)—then B~1A will be calculated inaccurately.e. For a scheme in which the bulk of the work is contained in a rank-one update see the LAPACK routine XSYGS2.2) in the form Then by direct computation we find that and Thus we can compute 7 and c by the following algorithm. Three comments. • The bulk of the work in this algorithm is performed in computing the product of the current principal submatrix with a vector.. Simply replace all references to C by references to A. The S/PD Eigenvalue Problem Specifically.2 implements this scheme. This will enable us to solve our problem by successively computing the leading principal majors of R~T AR~l of order 1. 4.3) may seem artificial. Algorithm 4.Sec.. Ill-conditioned B Wilkinson's algorithm is effectively a way of working with B~l A while preserving symmetry. since the large eigenvalues of (A... There is no complete cure for this problem. but they are designed to allow Algorithm 4. we will show how to compute c and 7. B) are ill determined. A partial cure—one which keeps the errors confined to the larger eigenvalues—is to arrange the calculations so that the final matrix C is Vol II: Eigensystems . • The algorithm takes |n3 fla • The arrangement of the computations in (4.. We begin by rewriting (4. A defect of this approach is that if B is ill conditioned with respect to inversion—i. n.2 to overwrite A by C. consider the partitioned product 233 Assuming that we already know C = R~TAR 1.

234 CHAPTER 3.5. then the diagonal elements of A are eigenvalues of (A. One way to produce a graded structure is to compute the spectral decomposition B = VMVT. THE SYMMETRIC EIGENVALUE PROBLEM Given a symmetric matrix A and a nonsingular upper triangular matrix R. B). much more can be said about the eigenvalues and eigenvectors of S/PD pencils. This produces a decomposition of the form where P i s a permutation matrix and R is graded downward. then the diagonal elements of A are eigenvalues of (A. As we have observed (p. and the corresponding eigenvectors are the columns of X = PR~lfU.3. this algorithms computes C = R~TAR~l. NOTES AND REFERENCES Perturbation theory We have derived the perturbation theory for S/PD pencils from the perturbation theory for general pencils.2: The computation of C = R~T AR~l graded downward. then C will be graded downward.If we define C = M _ 1 V T AVM — 2 and compute its spectral decomposi2 tion C = £/A{7T. where the diagonal elements of M are ordered so that ^i < ^ < — • • • < Mn. . and the corresponding eigenvectors are the columns of X = VM~^U. If we then compute the spectral decomposition C = t/AJ7T. 108ff) the reduction to tridiagonal form and the QR algorithm tend to perform well on this kind of problem. Algorithm 4.3). B). As might be expected by analogy with the Hermitian eigenproblem. If we set where f is the cross matrix. A less expensive alternative is to use a pivoted Cholesky decomposition of B (see §A. 4.

Ch. Eigenvectors can then be computed by the inverse power method suitably generalized. Code for both problems may be found in the Handbook [170]. that only a very few deflation steps are required. B) can be found by interval bisections. They also reduce the eigenproblem B Ax — Xx. The LAPACK routinesxxxGST reduce the S/PD eigenproblem Ax — XBxto standard form. To take advantage of the banded structure Wilkinson notes that Gaussian elimination on A — \B gives a count of the eigenvalues less than A. B) is definite if A definite pencil can in principle be reduced to an S/PD pencil.3].Sec. Wilkinson mentions. consider the set Vol II: Eigensystems . Wilkinson discusses the problem of ill-conditioned B and suggests that using the spectral decomposition of B instead of its Cholesky decomposition to reduce the problem will tend to preserve the small eigenvalues. Li and Mathias [163] have revisited this theory and. just as in Algorithm 2. the algorithm potentially requires 0(n4} operations. have given significantly improved bounds. Thus the eigenvalues of (A. however. Recently. giving a smaller pencil containing the remaining eigenpairs. Band matrices Wilkinson's method results in a full symmetric eigenvalue problem. It seems.4. The definite generalized eigenvalue problem A Hermitian pencil (A. Wilkinson's algorithm Wilkinson presented his algorithm in The Algebraic Eigenvalue Problem [300. an algorithm for problems of the form ABx — Ax. but does not derive. §§6872. starting from a different perspective. §IV. even when A and B are banded. Chandrasekaran [36] has noted that the large eigenvalues and corresponding eigenvectors of C (obtained from the spectral decomposition of B) are well determined. 4. 5]. He shows how to deflate these well-determined eigenpairs from the original pencil. The S/PD Eigenvalue Problem 235 This theory is surveyed in [269. where A is symmetric and B is positive definite. Specifically. where A is symmetric and B is positive definite. Since each deflation step requires the computation of a spectral decomposition. although he does not mention the necessary sorting of the eigenvalues.

3]. and V. For much the same reasons as given in §3. whereTO.] The above argument is nonconstructive in that it does not give a means for computing a suitable angle 0. and since it is convex. For an implementation see [43].4. Then there are orthogonal matrices Ux and Uy and a nonsingular matrix W such that AND where EX and Sy are diagonal. Crawford and Moon [44] describe an iterative algorithm for computing a 9 such that BO is positive definite (provided. [It is a curious fact that the above argument does not work for n = 2 and real symmetric A and B when the vector x in (4. with W~l on the right-hand sides. This generalized singular value decomposition was proposed.6). However. it can be rotated to lie in the upper half of the complex plane. let be a QR factorization of (JfT yT)T. If we define then the field of values TQ of Ag + Be is just the set jF rotated by the angle theta. and the algorithm can be 0(n4). It is easily seen that AND so that the generalized singular value decomposition is related to the generalized eigenvalue problem in the cross-product matrices just as the singular value decomposition is related to the eigenvalue problem for the cross-product matrix.236 CHAPTER 3. G) is called the field of values of A + iB. then f( A. UY. which is equivalent to Be being positive definite. Then the CS decomposition exhibits orthogonal matrices Ux. B) > 0.3. Each step requires a partial Cholesky factorization of the current B$.4) ranges only over 7£n. a computational approach to the generalized singular value decomposition through cross-product matrices is deprecated.258] or §1:1. The generalized singular value decomposition is related to the CS decomposition of an orthonormal matrix (see [202. For more on definite pairs see [269. of course.2. Now if 7( A. The generalized singular value and CS decompositions Let X £ E nxp and Y G R m X p . B) does not contain the origin. §§VI. that the original pencil was definite). THE SYMMETRIC EIGENVALUE PROBLEM The set f(A.1. by Van Loan [283.n > p.1981].1976] and in the above form by Paige and Saunders [202. and it can be shown to be convex. such that . VI. Specifically. For another approach see [40].

Paige [200] has given a method that works directly through X and Y. Algorithms for computing the CS decomposition and hence the singular value decomposition have been proposed by Stewart [259. Vollt: Eigensystems .7) and set J£T = V^Ry the result is the generalized singular value decomposition (4. 260] and Van Loan [284].5).Sec. (This last relation implies that the diagonal elements of E^ and Sy can be interpreted as cosines and sines.6) and (4. 4.) If we combine (4. The S/PD Eigenvalue Problem 237 where EX and Sy are diagonal matrices with S^ + Sy = /. The LAPACK routine xGGSVD uses Van Loan's approach. whence the name CS decomposition.

This page intentionally left blank .

We now shift focus to problems that are so large that storage limitations become a problem. Unfortunately. their zero elements greatly outnumber their nonzero elements. This limitation is not as burdensome as it might appear. Instead we must be satisfied with computing some subset of its eigenpairs. like eigenvectors. if the order of the matrix in question is sufficiently large. With appropriate data structures we can get away with storing only the latter. most algorithms for large sparse eigenvalue problems confine themselves to matrix-vector multiplications or perhaps the computation of an LU factorization. Consequently. Therefore: Unless otherwise specified A will denote a matrix of order n. and it turns out that algorithms for large matrices are best explained in terms of eigenspaces. Many large matrices arising in applications are sparse. Second. It will often help to think ofn as being very large. In §3 we treat one of the most popular ways of generating subspaces—the Krylov sequence—and in §4 we analyze the RayleighRitz method for extracting individual eigenvectors from a subspace. our chief concern is with large eigenproblems. It follows that if the matrix is large enough we cannot afford to store its entire eigensystem. in consequence. The space spanned by a set of eigenvectors is an example of an eigenspace. most algorithms for large problems proceed by generating a sequence of subspaces that contain increasingly accurate approximations to the desired eigenvectors. First. and in §2 we present the analytic theory—residual analysis and perturbation bounds. and. we will not be able to store all its elements. We begin with the basic algebraic theory of eigenspaces.4 EIGENSPACES AND THEIR APPROXIMATION The first half of this book has been concerned with dense matrices whose elements can be stored in the main memory of a computer. This problem has two aspects. 239 . cannot be computed directly. Although all of the results of this chapter apply to matrices of any size. the eigenvectors of a sparse matrix are in general not sparse. this chapter is devoted to the basic facts about eigenspaces and their approximations. Because eigenspaces. these data structures restrict the operations we can perform on the matrix. that is.

They are also important because one can use them to isolate ill-behaved groups of eigenvalues. Chapter 2). 1. the eigenvalues of L are A and A. DEFINITIONS We have already observed (p. Eigenspaces can be used to bring together combinations of eigenvalues and eigenvectors that would be difficult to treat separately. This construction illustrates the utility of eigenspaces. In the first subsection we will present the basic definitions and properties. while the expression (1. We have already noted that these invariant subspaces—or eigenspaces as we will call them — are useful in the derivation and analysis of algorithms for large eigenvalue problems.1). The notion of an eigenspace is a generalization of this observation. Moreover. x) is captured more economically by the relation (1. There we showed that if (A. Eigenbases The definition of eigenspace was cast in terms of a subspace of C n . Then X is an EIGENSPACE Or INVARIANT SUBSPACE of A if The space spanned by an eigenvector is a trivial but important example of an eigenspace.1) concerns a specific basis for the subspace in question.2 we will consider simple eigenspaces—the natural generalization of a simple eigenvector. EIGENSPACES 1.1. The information contained in the set of complex eigenpairs (A. In §1. x} and (A. The following theorem summarizes the properties of a basis for an eigenspace. A less trivial example occurs in the proof of the existence of real Schur decompositions (Theorem 3.1. . a matrix that does not have a complete system of eigenvectors may instead have a complete system of eigenspaces of low dimension corresponding to groups of defective eigenvalues. EIGENSPACES In this subsection we are going to develop the theory of subspaces that remain invariant when acted on by a matrix. which consists entirely of real quantities. or) = (\.240 CHAPTER 4. In computational practice we will always have to work with a basis. Let A be of order n and let X be a subspace ofCn.1. Definition 1. the space X — 7l[(y z)} is an eigenspace of A. then for some real 2x2 matrix L. 2) that the subspace spanned by an eigenvector is invariant under multiplication by its matrix.y + iz] is a complex eigenpair of a real matrix A. For example. Since the column space of (y z]L is a subset of the column space of (y z).

If we set L — (i\ • • • ^). i.2. then Now let (A. then (A. then so that (A. The theorem shows that L is independent of the choice of left inverse in the formula L = X1AX. Hence and the relation Lu = Xu follows on multiplying by X1. Vol II: Eigensystems . then (A. Since y\ 6 X and X is a basis for X. The theorem also shows that there is a one-one correspondence between eigenvectors lying in X and eigenvectors of L. Xlx) is an eigenpairofL. the relation between the eigensystem of A and that of L can be very complicated. however. ElGENSPACES 241 Theorem 1. then AX = XL. if (A. u] is an eigenpairofL. for some unique vector t{. yi can be expressed uniquely as a linear combination of the columns of X. Xu) is an eigenpair of A. Let be partitioned by columns. as required. Then there is a unique matrix L such that The matrix L is given by where X1 is a left inverse ofX —i. a matrix satisfying X1X — I. • We say that the matrix L is the representation of A on the eigenspace X with respect to the basis X.. As we will see. The existence of a unique representation L associated with a basis X for an eigenspace X justifies the following extension of the notion of eigenpair. In particular L has a complete set of eigenvectors if and only if the eigenvectors of A lying in X span X.e.7). Xu) is an eigenpairofA. for the important case of a simple eigenspace (Definition 1.. If L does not have a complete system of eigenvectors. Let X be an eigenspace of A and letX be a basis forX. 1.e. Conversely. x) is an eigenpairofA with x £ X. the relation is straightforward. Then there is a unique vector u such that x — Xu. #) be an eigenpair of A with x G X. if Lu = Xu. Proof. Conversely. If (A. If X1 is a left inverse of X. However. u = Xlx.SEC.

X) be an eigenpair of A. . it it natural to ask if any set of eigenvalues has a corresponding eigenspace.W~1X) is an eigenpair of"W~l AW. Proof Let be a partitioned Schur decomposition of A in which TH is of order k and has the members of C on its diagonal.XU) is also an eigenpair of A. The following theorem shows that indeed it has. LetC = { A i . Theorem 1. IfW is nonsingular. Y) is a LEFT EIGENPAIR of A. Then there is an eigenspace X of A whose eigenvalues are AI. V) is a left eigenpair of A. we say that (L. then y = T^(Y] is a left eigenspace of A in the sense that The following theorem shows how eigenpairs transform under change of basis and similarities. Existence of eigenspaces Given that every eigenspace has a unique set of eigenvalues. Its proof. Theorem 1. . X) is orthonormal. we say that the eigenpair (L. IfU is nonsingular. . An important consequence of this theorem is that the eigenvalues of the representation of an eigenspace with respect to a basis are independent of the choice of the basis. A^. . EIGENSPACES Definition 1.. we can speak of the eigenvalues of A associated with an eigenspace. IfY G Cnxk has linearly independent columns and YHA = LYH. which is simple and instructive. Computing the first column of this relation. .X) is an EIGENPAIR OF ORDER & Of RIGHT EIGENPAIR OF ORDER k of A if We call the matrix X an EIGENBASIS and the matrix L an EIGENBLOCK. then the pair (U~1LU. Note that if (L. ForX € Cnxk andL € Ck*k.. Let A beofordern. Let (L. Consequently.4. .3. then (L.A) be a multisubset of the eigenvalues of A. If X is orthonormal. is left as an exercise. A^} C A(. we get . we say that (L.5.242 CHAPTER 4.

Otherwise put: If X is an eigenspace of A and y is the orthogonal complement of X. This is a special case of a general technique.1. ElGENSPACES 243 Hence the column space of U\ is an eigenspace of A whose eigenvalues are the diagonal elements of the triangular matrix TH —i. if X is any basis for an eigenspace of A and (X Y)is nonsingular. Chapter 2) we essentially used the eigenspace corresponding to a pair of complex conjugate eigenvalues to deflate the matrix in question. can be treated as separate eigenvalue problems. however. it is preferable to use unitary transformations to deflate an eigenspace. Unfortunately. which. • The construction of Theorem 1.2) implies that Thus (M. the members of C. • The similarity transformation (1. See the discussion on page 10. Deflation In establishing the existence of the real Schur form (Theorem 3. M.6. • The relation (1. Vol II: Eigensystems .. Theorem 1.2 is not the most general way to deflate an eigenspace. 1. then for some matrices L.2) of A deflates the problem in the usual sense. and H.SEC. then y is a left eigenspace of A corresponding to the eigenvalues not associated with X. if A has multiple eigenvalues. Then where Here are three comments on this theorem. Y) is a left eigenpair of A. • Thus a matrix A of order n with distinct eigenvalues has 2n distinct eigenspaces — one corresponding to each subset of A( A). We will return to this point in the next subsection. We call y the COMPLEMENTARY LEFT EIGENSPACE OF X. Let(i. the eigenspaces of A are not as easily described. The proof of the following theorem is left as an exercise. Specifically. The spectrum of A is the union of spectra of L and M. For numerical reasons.e. X) bean orthonormaleigenpairofA andlet(X Y) be unitary. for computational purposes.

We will approach the problem by way of block diagonalization. Let (L\. SIMPLE EIGENSPACES Definition We have seen (Theorem 1.5) that if £ is a subset of the spectrum of A.Y<2) be unitary so that (Theorem 1.8.2). The above example suggests that the nonuniqueness in eigenspaces results from it not having all the copies of a multiple eigenvalue. then there is an eigenspace X of A whose eigenvalues are the members of L.244 CHAPTER 4. then it has two linearly independent eigenvectors.2. The problem is not as simple as the existence of left eigenvectors. if A is a double eigenvalue of A that is not defective. Block diagonalization and spectral representations We turn now to the existence of left eigenspaces corresponding to a right eigenspace. Hence we make the following definition. Then X is a SIMPLE EIGENSPACE of A if In other words. Let X bean eigenspace of A with eigenvalues C. Theorem 1. For example. an eigenspace is simple if its eigenvalues are. EIGENSPACES 1. this eigen space need not be unique. Unfortunately.7. since we cannot use determinantal arguments as in the second comment following Theorem 1. counting multiplicities. it is impossible to pin down an object that is not unique. Computationally. Definition 1. Chapter 1. and any nonzero vector in the plane spanned by these two vectors spans a one-dimensional eigenspace corresponding to A. disjoint from the other eigenvalues of A.2.X\) be a simple orthonormal eigenpair of A and let(Xi. Then there is a matrix Q satisfying the Sylvester equation such that if we set where then .

6).6) we can write We will call this formula a spectral representation of A.X<2) is an eigenpair of A. If. • Here are some observations on this theorem.4) and (1. In establishing the spectral representation we assumed that Xi and Y% were orthonormal. then there is a left eigenvector y corresponding to x.5) result from accumulating the transformation in the successive reductions to block triangular form and block diagonal form.5). For example. we have the following corollary. Let X be a simple eigenspace of A.9. and LI by S~lLiS. Saying that (1. 1. YI by Yi5~ H . Chapter 1].6) in the form we see that AX? = X^L^ so that (L<2. so that 7£( YI) is the left eigenspace corresponding to Tl(Xi). The eigenbases in a spectral representation are not unique. EIGENSPACES 245 Proof. • From (1. The eigenspace 7Z(X2) is complementary to the eigenspace 1Z(Xi) in C n . • If (A. S is nonsingular.SEC. We will return to this point later when we consider canonical angles between subspaces (see Corollary 2.8 uses a single simple eigenspace to reduce a matrix to 2x2 block diagonal form. and we call it the complementary right eigenspace. to reduce A to block diagonal form. • Theorem 1.3). The orthogonal complement ofX is the complementary left eigenspace. The formulas (1. We say that the bases Xi and YI are biorthogonal. x) is a simple eigenpair of A. for example. Corollary 1. The fact that the eigenbasis Y\ for the left eigenspace corresponding to Xi is normalized so that Y^Xi = I generalizes this result. then we may replace X\ by XiS. in which the first Vol II: Eigensystems . an obvious induction shows that if we have k simple eigenspaces then we can reduce v4. However.18.12). which can be normalized so that yHx — I [see (3. apply Theorem 1. Then X has a corresponding left eigenspace and a complementary right eigenspace.to(fc-fl)x(fc+l) block diagonal form. YI ) is a left eigenpair of A. which gives the formula for X^ in (1.5). • If we write the second equation in (1. Likewise (Li.7) is a spectral representation of A is a convenient shorthand for (1. Chapter 1. We will call any such spectral representation a standard representation. Combining these facts with the observation that the orthogonal complement of 7l(X\) is the complementary left eigenspace of A. Beginning with (1.

M). It will be sufficient to prove the uniqueness for a simple eigenspace corresponding to a single eigenvalue. M) are both similar to A. then so is the original space. Chapter 1).0. Similarly Yi2 .246 CHAPTER 4. it can be decomposed by diagonalization into eigenspaces corresponding to its individual eigenvalues.13. Since diag(Z/. M). For if a simple eigenspace has more than one eigenvalue. EIGENSPACES k blocks are eigenblocks of the eigenspaces and the last eigenblock is the eigenblock of the complement of their union.7). Thus X is block diagonal. there is a corresponding spectral representation of the form Uniqueness of simple eigenspaces We turn now to the question of whether a simple eigenspace is uniquely determined by its eigenvalues. M). Since M cannot have zero eigenvalues. Let X be a simple eigenspace corresponding to a single eigenvalue. .2)-block of this relation. they are similar to each other. M) and diag(Z/. and write the similarity transformation in the partitioned form Let k be an integer such that Lk = Lk — 0.8) to get Since MkXM is nonsingular. we get Since M is nonsingular.2)-block of (1. Then Computing the (2. the result is not easy to prove. Although it seems obvious that it is. Then we can diagonalize A as in Theorem 1. Let Y = X~l. so are X^ and ¥22 • Now compute the (1. Let Jf~1diag(Z. In analogy with (1. where again L is nilpotent and M is nonsingular. Now let X be a second eigenspace corresponding to the eigenvalue zero.8 to get the matrix diag(Z. Since L has only zero eigenvalues it is nilpotent (Theorem 1. If the latter are unique. M )X — diag(i. which without loss of generality we may assume to be zero. F2i = 0. and the spaces X and X are the same. We can then use this eigenspace to diagonalize A to get diag(Z/. it is nonsingular.

Vol II: Eigensystems . The generalization of the notion of eigenpair as well as the terms "eigenbasis" and "eigenblock" (Definition 1. For a systematic development of this approach see [146]. It is easily verified that Pi is idempotent—i. and for that reason the Pi are called spectral projectors. For example.e.SEC. An alternative approach is to use the resolvent defined by For example. Such matrices project a vector onto its column space along its null space. such spaces are usually called invariant subspaces. it can be shown that the spectral projection corresponding to a set £ of eigenvalues is given by the complex integral where F£ is a closed path in the complex plane that contains £ and no other eigenvalues. Uniqueness of simple eigenspaces That a simple eigenspace is determined by its eigenvalues seems to be taken for granted in the literature on matrix computations. A trivial reason for preferring "eigenspace" over "invariant subspace" is that the latter need not be truly invariant under multiplication by A — an extreme example is A = 0. Spectral projectors can be used to decompose a matrix without transforming it. Spectral projections From the spectral representation we can form the matrices Pi = XiY^.. Pf = Pi. 2. The proof given here is new. it is easy to verify that an equation that corresponds to our reduction to block diagonal form. NOTES AND REFERENCES Eigenspaces 247 Although the term "eigenspace" can be found in Kato's survey of perturbation theory [146]. where there is no one matrix to operate on the subspace. For more on spectral projections see [146].3.3) are new. A more substantial reason is that we can use the same terminology for generalized eigenvalue problems. PERTURBATION THEORY 1. which collapses any subspace into {0}. The resolvent Our approach to eigenspaces through block triangularization and diagonalization reflects the way we compute with eigenspaces.

Let X and y be subspaces ofCn of dimension p and letX and Y be orthonormal bases forX and y. we will first present the analysis of residuals and then go onto perturbation theory proper.11). 2. Definitions Let X and y be subspaces of the same dimension.1. For if X and y are other bases.1] and can be regarded as cosines of angles. We then turn to the analysis of residuals. In §2. where V = XHX and W = YEY are unitary. Specifically. Definition 2. we will restrict the dimensions of our subspaces to be not greater than ^. Then the CANONICAL ANGLES between X and y are the numbers where 7. we begin with preliminary material on the canonical angles between subspaces. and consider the matrix C = Y^X.H2/|/(||z||2||t/2||) [see(3. EIGENSPACES 2. CANONICAL ANGLES We have defined the angle between two nonzero vectors as the angle whose cosine is |a. Let X and Y be orthonormal bases for X and y. we can write X = XV and Y = YW. in §2. . We have Hence all the singular values of C lie in the interval [0. In this subsection we extend this definition to subspaces in C n . To avoid certain degenerate cases. This justifies the following definition.4 we present some residual bounds for Hermitian matrices.. because of the important role played by residuals in algorithms for large problems. PERTURBATION THEORY In this subsection we will treat the perturbation of eigenspaces and their associated eigenblocks. however. y) for the ith canonical angle in descending order and define We will also write Q(X.248 CHAPTER 4. Much of this theory parallels the perturbation theory in §3.5. Chapter 1]. Hence C = YHX = WUCV and C are unitarily equivalent and have the same singular values. Y) for 6(*.are the singular values of YUX. We write Oi(X. These angles are independent of the choice of bases for X and y.3 we use these residual bounds to derive a perturbation theory for simple eigenpairs. Finally. Chapter 1. For a more complete development of this subject see §1:4. y).1.

and let Y_\_ be an orthonormal basis for the orthogonal complement ofy. the largest canonical angle has the following characterization: Thus for each x in X we look for the vector y in y that makes the smallest angle 9(x) with x. Subspaces of unequal dimensions We will occasionally need to compare subspaces X and y of differing dimensions. For details. The largest such angle 9(x) is the largest canonical angle. Specifically. IfO < 10~8. Hence the singular values of S — Y±X are the sines of the canonical angles between X and y. Proof. A cure for this problem is to compute the sine of the canonical angles. The following theorem. But It follows that E2 is diagonal and its diagonal elements are 1 .2 as a mathematical device for bounding canonical angles But it also provides the means to compute the sines of canonical angles numerically. The statement that X is near y makes sense only when dim(A') < dim(J^). Let X and Y be orthonormal bases for X and y. Let VH(CHC)V = T2 be the spectral decomposition of CHC. provides the wherewithal.7?—the squares of the sines of the canonical angles. Computing canonical angles In principle we could compute canonical angles from the characterization in Definition 2. see the notes and references. and we will conclude that 9 = 0.SEC. PERTURBATION THEORY 249 Geometrically. For otherwise there would be a vector # in A" that is orthogonal to every vector in y and the Vol II: Eigensystems .1. Let Since this matrix is orthonormal. then cos 9 will evaluate to 1 in IEEE double-precision arithmetic. Then the singular values of Y^X are the sines of the canonical angles between X and y.12. Since the eigenvalues of CHC are the squares of the singular values of C. this procedure will give inaccurate results.2. the diagonal elements of F are the cosines of the canonical angles between X and y. if the canonical angle is small. cos 9 = 1-|02. for small 6. Chapter 1. 2. • In this chapter we will use Theorem 2. which is a generalization of Lemma 3. However. Theorem 2.

• Combinations of subspaces In Theorem 1.250 CHAPTER 4. EIGENSPACES angle between that vector and y could only be ^. Let y^y and consider the transformed vector In this new coordinate system we must have y± — 0. The following theorem does the job. by Theorem 2. When dirn(A') < dim(3. Let o~i and ^ denote the nonzero singular values ofZ and Q arranged in descending order. which. LetOi denote the nonzero canonical angles between 7Z(X) andlZ(Z). and the norm at the minimum is \\x_i ||2. Let (Y Y±) be unitary with 7£(F) = y. also arranged in descending order. Then Proof.2. Theorem 2.). Let x be a vector with \\x\\2 — 1 and let y be a subspace. and the squares of the nonzero singular values of Z are the nonzero eigenvalues of .1. In this case. where we will have to know the relations between the singular values of Q and X -f YQ. Hence This norm is clearly minimized when y — x. y). We will meet such expressions later. the number of canonical angles will be equal to the dimension of X.8 we expressed the left eigenspace corresponding to a simple eigenspace in the form X + YQ. The following theorem contains a useful characterization of the the angle between a vector and a subspace. Then and Proof. we can define canonical angles as in Definition 2. is sin Z(z.4. Let X and Y be ortho-normal matrices with XHY = 0 and let Z = X + Y Q. Theorem 2. The squares of the nonzero singular values of Q are the nonsingular eigenvalues of Q^Q.2.3. These canonical angles can be calculated as in Theorem 2.

2.1). 2. we obtain the following corollary. One such basis is where (/ + QHQ) 2 is the inverse of the positive definite square root of / + QHQ. Chapter 2. where Lk is suitably chosen.SEC. Then the singular values ofY are the secants of the canonical angles between X and y. then (Lk. the only way we have for determining convergence is to look at the residual Rk = AXk — XkLk. Chapter 2. • If we combine this theorem with Theorem 1. First we will show how to compute an optimal block L — a generalization of Theorem 1. or sec 0. We will then show that a small residual implies a small backward error— a generalization of Theorem 1.8.12). which generalizes (3. Optimal residuals Most of our algorithms will furnish an approximate eigenbasis X but will allow us the freedom to choose the eigenblock. The relation tan 0t = <r. In practice. it is hoped.XL as small as possible. Thus the cosines of the canonical angles are the singular values of which by the above observation are the reciprocals of the singular values of Z. It is natural to choose this block to make the residual R = AX . Let X bean orthonormal basis fora simple eigenspace X of A and let Y be a basis for the corresponding left eigenspace y of A normalized so that YHX = I.2.. Hence cos &i — (T~l. converge to an eigenspace of A. In this subsection we will consider the implications of a small residual.5. Chapter 1. The following theorem shows how to make this choice. X) as an approximate eigenpair. It therefore seems reasonable to stop when Rk is sufficiently small. Xk) is an eigenpair of A. Vol II: Eigensystems .3. PERTURBATION THEORY Hence 251 By definition the cosines of the canonical angles between K(X) and 7£( Z) are the singular values of XHZ. where Z is any orthonormal basis for K(Z). Finally. If Rk is zero. RESIDUAL ANALYSIS Most methods for computing eigenspaces of a matrix A produce a sequence Xk of bases whose spans. We will actually prove a little more. In particular. we will develop residual bounds for the error in (Z. now follows from (2.= 0{.4. Corollary 2.

Consequently. but in fact the division is implicit in X1. then the Rayleigh quotient is an eigenblock. To make it explicit. Set Then The Rayleigh quotient The quantity L = XHAX is a generalization of the scalar Rayleigh quotient of Definition 1.2. If X is an eigenbasis of A. Let(X XL) beunitary. if X is near an eigenbasis of A.suggests the following definition. Moreover.2 we may expect that if u is an eigenvector of L then Xu will be near an eigenvector of A. where we consider the Rayleigh-Ritz method. Then in any unitarily invariant norm \\R\\ and \\S\\ are minimized when in which case Proof. It may seem odd to have a quotientless Rayleigh quotient. Theorem 1. The matrix X^AX is not the most general matrix that will reproduce an eigenblock when X is an eigenbasis. Then the quantity X1 A X is a RAYLEIGH QUOTIENT of A. by Theorem 1. LetR = AX-XLandSH = XHA-LXH. the proof for S being similar.6. . We will prove the result for J?.5. let X be of full rank and let YHX be nonsingular.7. Let X be of full column rank and letX1 be a left inverse ofX. Then (Y^X)'1 YH is a left inverse of X and (YHX)-1YH AX is a Rayleigh quotient. Chapter 2.252 CHAPTER 4. we may expect the Rayleigh quotient to contain good approximations to the eigenvalues of A. We will give substance to these expectations in §4. Definition 2. EIGENSPACES Theorem 2.

3. 2. The following theorem generalizes this result.Ax. This does not guarantee that (I.3. then (L. but for it not to be (£.3. The proof is a generalization of the proof of Theorem 1.6. Briefly. sensitive to small changes in A + E. The singular values of this matrix are ±<T. X) must be ill conditioned—that is. Moreover. x) is an exact eigenpair of a perturbed matrix A + E. Proof. which can be effected by observing that in the notation of the proof of Theorem 2. Let X be orthonormal and let Then there is a matrix such that an/"f If A is Hermitian and then there is a Hermitian matrix satisfy ing (2 A}. X) is an exact eigenpair of a sightly perturbed matrix A -f E. VolII: Eigensystems . X) is a good approximation to the eigenpair. • The import of this theorem is discussed just after Theorem 1. PERTURBATION THEORY Backward error 253 In Theorem 1. Chapter 2. Since the singular values of G and R are the same. then (A. Theorem 2. and the square of the Frobenius norm is twice the sum of squares of the singular values of R. where the az are the singular values of G. Thus basing convergence on the size of the residual guarantees that wellconditioned eigenpairs will be determined accurately..8. we saw that if r = Ax . The only tricky point is the verification of (2. the 2-norm of this matrix is the largest singular value of R. Chapter 2. Chapter 2.SEC.6). if the residual is small. where E is of a size with r.

EIGENSPACES Residual bounds for eigenvalues of Hermitian matrices By combining Theorems 3. We will see in §2. A(Z) C A(A+ E).9. of A such that and The factor \/2 in (2.7) can be eliminated (see the notes and references)..10 in Chapter 1 with Theorem 2.8 we can obtain residual bounds on the eigenvalues of a Hermitian matrix. A.. Specifically.8) is almost block triangular. consider the similarity transformation For the (2. . Corollary 2. . Let (X X±) be unitary and let Then by Theorem 2.254 CHAPTER 4. We will derive bounds by nudging it into full block triangularity. we must have PL — MP = G . Block deflation Although the small backward error associated with a small residual is a powerful reason for using the residual norm as a convergence criteria. ||J?|| = ||^4^— XL\\ .l\ be the eigenvalues of L.8 and 3. Then there are eigenvalues A 9 1 .1) block of the right-hand side to be zero. which can be written in the form . if R is small. theright-handside of (2. Specifically. . Consequently. Since the eigenvalues of A + E and A. arranged in descending order.PHP.. We will derive such bounds by a process of block deflation.6 we know that in any unitarily in variant norm ||-||. Let i\. . cannot be separated by more than \\E\\2. we have the following corollary. the question remains of how accurate the approximate eigenspace is. .\\G\\.4 that if we know more about the eigensystem of A then we can get tighter residual bounds..

Let Then ( X .SEC. The theorem shows that (L + HP. In the matrix let where S is defined by If there is a unique matrix P satisfying PL — MP = G — PHP and such that The spectra of L + HP and M — PH are disjoint. PERTURBATION THEORY 255 where S is the Sylvester operator Equation (2. L) is an approximate eigenpair of B with residual norm \\G\\. Theorem 2.4 the singular values of P are the tangents of the canonical angles between X and X. However. the largest canonical angle Q\ between X and X satisfies Vol II: Eigensystems . X) is a simple eigenpair of B with complementary eigenblock M . We will first discuss this theorem in terms of the matrix B and then. For the proof of this theorem see the notes and references.10. By Theorem 2.9) is nonlinear in P and has no closed form solution. under appropriate conditions its solution can be shown to exist and can be bounded. Consequently.8). in terms of the original matrix A in (2. if || • || is the spectral norm. later. Let || • || denote a consistent norm. 2.PH.

.M) > 0. It is useful in assessing the separation of diagonalizable matrices.15. . M + F) . 2.Z/. Finally. . 4. 1. Item 2 is a continuity property. It says that small changes in L and M make like changes in sep. which are stated without proof in the following theorem.and L = diag(ii . . Ifthenorm \\ • \\ is absolute (i. where S is the Sylvester operator defined by (2.10). Chapter 1. £ < ) then and M = diag(MI . as Example 3. Ifsep(Z. Item 1 provides an alternative definition of sep.11. .. which appears in (2. M} can be much smaller than the physical separation. . By Corollary 1. Then the following statements are true. M m ).256 The quantity sep CHAPTER 4. Thus the poorer the separation.11). . The quantity sep has a number of useful properties. Chapter 1. \\A\\ = |||A|||. M).t It is a (not entirely trivial) exercise in the manipulation of norms to establish these properties. .16. If L and M are Hermitian.17. Unfortunately. Theorem 2. Chapter 1. so that sep is a lower bound for the physical separation 6 of the spectra of L and M. |sep(£ + £. Items 3 and 4 exhibit sep for matrices of special form. By Theorem 1. item 5 shows how far sep may be degraded under similarity transformations. M) = US"1]]"1. it is nonzero if and only if the spectra of L and M are distinct. This fact will be important later. Let sep be defined by a consistent norm \\-\\. shows. EIGENSPACES The quantity sep(.e. 3.11. the larger the canonical angle between X and X. Chapter 1. then in the Frobenius norm 5. . when we derive perturbation bounds for eigenspaces. N) introduced in Theorem 3. is a generalization of the quantity sep(A.sep(£. then sep(£. sep(i. M)| < \\E\\ + \\F\\.

and let If in any unitarily invariant norm then there is a simple eigenpair (£.6 \\R\\ = \\G\\ and \\S\\ = ||#||. PERTURBATION THEORY 257 Residual bounds We now return to our original matrix A and derive residual bounds for the approximate subspaces. 2. X) of A satisfying and Proof. which is large and generally dense. The bound (2. We shall see later that we can approximate a condition number for an eigenblock.X^AX± be the Rayleigh quotients corresponding to X and X±. X + A^P). • From a mathematical standpoint. Theorem 2.10 accumulating transformations.SEC. Let L = XHAX and M . M). provided we have an approximation to the corresponding left eigenbasis. Let X be an approximate eigenbasis for A and let(X X±_) be unitary. If we now apply Theorem 2. one must have additional information about the disposition of the eigenvalues to use the theorem. we see that if (2. the size n of the matrix will be very much greater than the number k of columns of the matrix X.X) approximates an actual eigenpair of the matrix in question. It provides conditions under which a purported eigenpair (L. this theorem does exactly what we need.14) follows from Theorem 2. Write Then by Theorem 2. where The bound (2. however. we cannot compute M. Although. we can compute L. Vol II: Eigensystems . and S easily enough. R.13) follows on taking the norm of the difference of L and L + HP. In large eigenproblems. or the quantity sep(£.4.12.12) is satisfied then there is an eigenblock of A of the form (L + HP. but when it comes to eigenspaces of very large matrices.

we can do this by regarding the exact eigenspace of A as an approximate eigenspace of A and applying Theorem 2.11 the quantity 6 is a lower bound on sep(L+Fu.12.20) follow upon accumulating the transformations that reduce A to block triangular form. EIGENSPACES In the last subsection we considered the problem of bounding the error in an approximate eigenspace. In this subsection we will treat the problem of bounding the perturbation of an exact eigenspace when the matrix changes. simple.18) that block triangularizes the matrix (2.16). By item 2 of Theorem 2. The spectra ofL and M are disjoint Proof.3.258 2.10 guarantees the existence of a matrix P satisfying (2. The expressions (2.19) and (2. Theorem 2. Let A have the spectral representation Let A = A + E and set Let || • || be a consistent family of norms and set If then there is a matrix P satisfying such that and are complementary. • We will now examine the consequences of this theorem. PERTURBATION THEORY CHAPTER 4.Hence if (2. right and left eigenpairs of A. The perturbation theorem Given a simple eigenspace of a matrix A. In practice. Theorem 2. In principle. we wish to determine when there is a nearby eigenspace of a perturbation A = A + E of A and to bound the angles between the two spaces. M-\^22).17) is satisfied. we obtain better bounds by proceeding indirectly through a spectral decomposition of A. .13.

13. Specifically.13) actually represents a continuum of results. M. we will seldom have access to these matrices. There is. since the matrices L. 2. We will set Bounds in terms of E The bounds of Theorem 2. these bounds can be conveniently written in the terms of 0xy defined in (2.4). 245). Vol II: Eigensystems .SEC. In practice. the matrices Xi and ¥2 are assumed to be orthonormal. since spectral representations are only partially unique (see p. no single choice of spectral representation that is suitable for all applications. With these bounds. however. unfortunately. we may redefine and replace the condition (2.17) by so that the bound (2. In the 2-norm. and Fij also change. In consequence (Theorem 2. In Theorem 2. but the one that comes closest is the the standard representation in which X\ and Y2 are orthonormal.18) becomes Thus how X\ and y\ are situated with respect to one another affects the condition for the existence of the perturbed eigenspace. In this case we must weaken the bounds by replacing \\Fij\\ by ||y^H||||£'||||Xj||.21).13 are cast in terms of the matrices Fij because they are the primary quantities in the proof of the theorem. but it does not affect the final bound on \P\\2. What we are likely to have is a bound on some norm oi E. The variants do not say the same things. In the remainder of this subsection we will therefore make the following assumptions.10) there is a matrix Q such that The singular values ofQ are the tangents of the canonical angles between X^ = H(Xi) andyi = n(Yi) (Theorem 2. PERTURBATION THEORY Choice of subspaces 259 Theorem (2.

4 to obtain this relation.260 The Rayleigh quotient and the condition of L Let CHAPTER 4. Angles between the subspaces In the relation Xi = Xi -f X^P. from the above bounds Thus the difference between the Rayleigh quotient and the exact representation goes to zero as the square of ||£"||2We also have Hence as E —» 0 Thus the secant of the largest canonical angle between X^ and y\ is a condition number for the eigenblock L that determines how the error in E propagates to the representation L. It is natural then to ask how this measure relates to the canonical angles between the two spaces. the matrix L is a Rayleigh quotient in the sense of Definition 2. Chapter 1. From (2. Thus we have the following generalization of the assertion (3. the size of P is a measure of the deviation of X\ = K(Xi) from Xi.7.21) we have . Unfortunately. But with a little manipulation we can find a basis for X\ to which Theorem 2. This Rayleigh quotient is an especially good approximation to the eigenblock of A. EIGENSPACES Since Y^Xi = /. we could apply Theorem 2. If Xi were orthonormal and Y^Xi were zero. neither condition holds. In fact.4 does apply.24).

13. It follows that / -}.SEC. Vol II: Eigensystems . M) with sep(L. so that when E is small the bound is effectively 11E \ \ 2 / 8. Since X\ and Y2 are orthonormal and Y^Xi = 0. M) is a condition number for X\. • The denominator of (2. 6 itself approaches sep(i. • As E. approaches 6 as E —»• 0. Moreover. M) we can generalize the proof of Theorem 3. Chapter 1. Thus sep(Z.23) and (2.27) we have the following corollary.28). which is necessarily positive. it follows from Theorem 2.22) (remember sec2 0xy = 1 + tan2 #xy). In summary: Let A have the standard spectral representation X\LY^ + X^MY^. M)""1 is a condition number for the eigenspace K(Xi).14. Then in the 2-norm the quantity sep(jL. since for small 9 we have 6 = tan 0. In fact. Then where Here are two comments on this corollary. —»• 0 the matrix P(I + QP)~l —> P. On taking norms and using (2. Let 0xy be the largest canonical angle between X\ and 3^i and let Q(X\. PERTURBATION THEORY 261 the last inequality following from (2. 2. • If we are willing to replace sep(L.QP is nonsingular and We can therefore write Now the subspace spanned by Xi(I+QP) l is still X\. Corollary 2. the singular values approximate the angles themselves. for E small enough the singular values of P are approximations to the tangents of the canonical angles between X\ and X\.4 that the tangents of the canonical angles between X\ and X\ are the singular values of P(I + QP}~1. this approach has the disadvantage that it assumes rather than proves the existence of X\. Consequently. M). AI) be the largest canonical angle between X\ and X\. to obtain the following bound: However.

Let the Hermitian matrix A have the spectral representation where (X Y) is unitary. For Hermitian matrices we can often say a priori that the required eigenspaces exist.8. y) will denote the the diagonal matrix of canonical angles between the subspaces X and y. Residual bounds for eigenspaces We begin with the simpler of the two results on eigenspaces treated here. The fact that we know these eigenspaces exist before we start enables us to get direct bounds that are simpler and more useful than the ones developed above.2 addresses two logically distinct problems: the existence of certain eigenspaces of A and the bounding of the error in those eigenspaces. As usual. There we required that the spectra of two Rayleigh quotients corresponding to the approximate subspaces be disjoint. The proofs.15. If ||G||2 is less than f.4. Here the separation is between the certain eigenvalues of A (those of L) and eigenvalues associated with the . Theorem 2. suppose that the Hermitian matrix A has the form and that 6 = min |A(£) . will be omitted. EIGENSPACES 2. which are not needed in this work. Chapter 1. The eigenspaces corresponding to these two groups of eigenvalues are candidates for bounding.12. the eigenvalues of A must divide into two groups—those that are within ||G||2 of A(L) and those that are within ||<7||2 of A(M). ©(A'. RESIDUAL BOUNDS FOR HERMITIAN MATRICES The residual bound theory of §2.262 CHAPTER 4. then by Theorem 3. Let the orthonormal matrix Z be of the same dimensions as X.A(M)| > 0. If then The separation hypothesis (2. The purpose of this subsection is to state theorems containing such direct residual bounds.30) is different from that required by the residual bounds of Theorem 2. For example. and let where N is Hermitian.

For individual angles the bound will generally be an overestimate. 7£(Z)) — i.30) requires only that the eigenvalues values of M and N be decently separated. PERTURBATION THEORY 263 approximating subspace (those of N). The price we pay is that we can only bound the Frobenius norm of sin 0(7£(JQ. The condition (2.15 than that of Theorem 2.9 we showed that if R — AX — XL is small then there are eigenvalues of A lying near those of L. Residual bounds for eigenvalues In Corollary 2. In the notation of Theorem 2.32) bounds the largest canonical angle between the two spaces. they are free to interlace with eigenvalues of M lying between those of N and vice versa. Let (Z Z_\_] be unitary. where Z has k columns. the separation criterion is less stringent in Theorem 2.. The theorem reflects an important tradeoff in bounds of this kind. in which case (2. The bound is of order \\R\\. Theorem 2. Let Let VolII: Eigensystems . We can resolve this problem by restricting how the eigenvalues of M and N lie with respect to one another.The following shows that if we can specify a separation between the eigenvalues of L and a complementary set of eigenvalues of A then we can reduce the bound to be of order \\R\\2Theorem 2.SEC. the square root of the sum of squares of the sines of the canonical angles.17. Moreover. Otherwise. and the final bound is smaller—the benefits of being Hermitian. The natural norm to use is the 2-norm. 2.e.16.15 suppose that and for some 8 > 0 Then for any unitarily invariant norm The conditions on the spectra of N and M say that the spectrum of N lies in an interval that is surrounded at a distance of 6 by the spectrum of M.12. The bound is now in an arbitrary unitarily invariant norm.

R|| < 6 and righthand side of (2. Optimal residuals and the Rayleigh quotient Theorem 2. then for R small enough. But if we know that (in the notation of the last paragraph) there is a set C± of eigenvalues of A with rnin{|A(M) .6 for Hermitian matrices is due to Kahan [142. However. Wilkinson [300. NOTES AND REFERENCES General references For general references on perturbation theory see §3. in sparse applications it has the disadvantage that we do not know N. Earlier. Stewart [252. EIGENSPACES If then The theorem basically says that if A has a set £ of eigenvalues that are close to A(£) and that are removed from A(M) by 6.1875].1973] stated the result explicitly for nonHermitian matrices and gave the formula (2. Chapter 1. In this case Yi is very large.2).31) on the spectra. given a basis for y one can compute an orthogonal matrix whose first k columns span y as a product of k Householder transformations (Algorithm 1:4. For a more complete treatment of this subject see Volume I of this series or [269. and has often been rediscovered.2) for the optimal residual R. In our applications n will be very large compared with the common dimension k of the subspaces. and it is impossible to compute Y±X directly. who recommend computing the sines of the angles. 2. 264] proved the result for vectors.264 CHAPTER 4. These transformations can then be applied to X to give Y±X in the last n—k rows of the result.5.2|| J2||2).1. . However. Canonical angle The idea of canonical angles between subspaces goes back to Jordan [138. p. the eigenvalues £ and A(i) can be paired so that they are within O(||J?||2) of each other. I]. 1967] (his proof works for non-Hermitian matrices). This theorem has the great advantage that we do not have to put restrictions such as (2.£_L|} = <!>.1965. Ch. The definitive paper on computing canonical angles is by Bjorck and Golub [26]. then we can write Hence the condition \\R\\2 < # can be replaced by the condition 3||.3.33) by \\R\\%/(6 .

3. The greater accuracy of Y^AX\ is due to the fact that we use approximations to both the left and right eigenbases in its definition. is accurate only up to terms of order \\E\\. Perturbation theory The passage from residual bounds to perturbation bounds was given in [254]. and Jiang [144] produce perturbations that are optimal in the spectral and Frobenius norms.5) is optimal (also see [64. and of course they have trivial variants that deal with left eigenspaces. 244]. Residual bounds for eigenspaces and eigenvalues of Hermitian matrices The residual bounds for eigenspaces are due to Davis and Kahan [54. for example. in which the transforming matrix is triangular. Residual eigenvalue bounds for Hermitian matrices Corollary 2. is simpler than the original.SEC. If we have small right and left residuals R = AX .11]).29) is due to Ipsen [126]. 1977] shows that (2. we can generate a backward error and apply a perturbation theorem to get a residual bound. The theorems stated here all have to do with approximate right eigenspaces. For more on the history of such bounds see [269.9 is due to Kahan [142. KRYLOV SUBSPACES Backward error 265 For Hermitian matrices in the vector case. who prove much more in their ground-breaking paper.4.1970].14]. Vol II: Eigensystems . we can regard the greater accuracy as due to the fact that the matrix (2. The Rayleigh quotient X^AXi. A treatment of other theorems of this kind may be found in [206]. Kahan. The approach taken here.7).1967]. It is worth noting we can go the other way. Theorem 9. 1971]. The result can be generalized to unitarily invariant norms [269.16) is an off-diagonal perturbation of the block diagonal matrix consisting of the two Rayleigh quotients Y^AXi and F2H AX2The bound (2. Block deflation and residual bounds The use of block triangularization to derive residual bounds is due to Stewart [250. The fact that the Rayleigh quotient (2. p. a result of Dennis and More [63. who showed that the factor \/2 can be removed from the right-hand side of (2.24) is accurate up to terms of order \\E\\2 shows that not all Rayleigh quotients are equal. The residual bounds for eigenvalues are due to Mathias [174] and improve earlier bounds by Stewart [261]. Parlett. which used an orthogonal transformation. who also introduced the quantity sep. Given a residual. Theorem FV. Alternatively.XL and 5H = YUA ~ Z H y H it is natural to ask if there is a perturbation that makes X a right eigenbasis and Y a left eigenbasis.

Let A be of order n and let u ^ 0 be an n vector. In the first subsection we will introduce Krylov sequences and their associated subspaces. The solid line in Figure 3.. Aui..) The swift convergence of an approximation to the dominant eigenvector in the subspace spanned by ui. The first stage generates a subspace containing approximations to the desired eigenvectors or eigenspaces. .. since the ratio of the subdominant eigenvalue to the dominant eigenvalue is 0. EIGENSPACES 3...1... KRYLOV SEQUENCES AND KRYLOV SPACES Introduction and definition The power method (§ 1. Definition 3. Chapter 2) is based on the observation that if A has a dominant eigenpair then under mild restrictions on u the vectors Aku produce increasingly accurate approximations to the dominant eigenvector. A diagonal matrix A of order 100 was generated with eigenvalues 1. in §3. The convergence is much more rapid. It turns out that this information is valuable.3 we will introduce block Krylov sequences.95". of which more later. as it must be... KRYLOV SUBSPACES Most algorithms for solving large eigenproblems proceed in two stages. The dashed line gives the tangent between the dominant eigenvector and the space spanned by «i. We begin with a definition.. In this section we will be concerned with the most widely used method for computing subspaces containing eigenvectors—the method of Krylov sequences. (The dash-dot line shows the convergence to the subdominant eigenvector. The second stage extracts approximations to the eigenvectors from the subspace generated in the first stage. reducing the error by eight orders of magnitude. — Aku\ were generated. Starting with a random vector ui.. Then the sequence is a KRYLOV SEQUENCE BASED ON A AND u. the vectors uj..266 CHAPTER 4. Ak~lu\ justifies studying such subspaces.2. We call the matrix . as the following example shows..95. If the approximations are not satisfactory. Uk.952. Finally. they may be used to restart the first stage. It is seen that the convergence is slow.1 shows the tangent of the angle between Akui and the dominant eigenvector of A. However..953. which amounts to throwing away the information contained in the previously generated vectors. at each step the power method considers only the single vector Aku. 3.1. Example 3..95.2. We will then consider the accuracy of the approximates produced by the method in §3.

1: Power method and Krylov approximations the kth KRYLOV MATRIX. 3. The proof is left as an exercise.3. When A or u are known from context. Elementary properties The basic properties of Krylov subspaces are summarized in the following theorem. Thus we may write ICk(u) or simply /C^. I f t r ^ O .SEC. Let A and u ^ 0 be given. we will drop them in our notation for Krylov matrices and subspaces. Theorem 3. Vol II: Eigensystems . Then 1. KRYLOV SUBSPACES 267 Figure 3. The space is called the kth KRYLOV SUBSPACE. The sequence of Krylov subspaces satisfies and 2.

if v — p(A)u for any polynomial of degree not exceedingfc—1. and In other words the Krylov sequence terminates in the sense that the Krylov vectors after the first one provide no new information.k(A.( A. then The second item in the above theorem states that a Krylov subspace remains unchanged when either A or u is scaled.4. then Aku — Xku. we will say that a Krylov sequence terminates at t if I is the smallest integer such that The following theorem states the basic facts about termination. The space ICk(A. The polynomial connection Krylov subspaces have an important characterization in terms of matrix polynomials. In particular if A is Hermitian or nondefective. For any K. u).268 3. u] can be written in the form Hence if we define the matrix polynomial p( A) then v = p(A)u. we may diagonalize it—a strategy that sometimes simplifies the analysis of Krylov sequences. u) is an eigenpair of A. Finally. . EIGENSPACES 4. the fourth says that the space behaves in a predictable way under similarity transformations. CHAPTER 4.then v G K. If W is nonsingular. By definition any vector v in £*. Theorem 3. The third says that it is invariant under shifting. More generally. Conversely. u) can be written in the form Termination If (A. Thus we have proved the following theorem.

. Let u be a starting vector for a Krylov sequence.. (3. which is impossible. it stops furnishing information about other eigenvectors. If the sequence terminates at I. KRYLOV SUBSPACES 269 Theorem 3. On the other hand. = x^u. then for some I < m. Am~l u. If u is in an invariant subspace X of dimension ra. n). The general approach In what follows.SEC.1 that a sequence of Krylov subspaces can contain increasingly accurate approximations to the certain eigenvectors of the matrix in question.£ is an invariant subspace of A. Experience has shown that these eigenvectors have eigenvalues that tend to lie on the periphery of the spectrum of the matrix.. By Theorem 3.2. then so that K. if a Krylov sequence terminates..2) is satisfied. In this subsection we will give mathematical substance to this observation—at least for the Hermitian case. 3. is the smallest integer for which If the Krylov sequence terminates at I. we will assume that A is Hermitian and has an orthonormal system of eigenpairs (A. it contains exact eigenvectors of A. where p(X) is a polynomial of degree not greater than k. Xi) (i = 1. Au. 3. CONVERGENCE We have seen in Example 3. Since p(A)x{ = p(\i)x{. that the Krylov subspace converges to the vector. the sequence terminates at I. somewhat illogically.. then /Q is an eigenspace of A of dimension t.. and write u in the form where a.4 any member of fCk can be written in the form p(A)u. On the other hand.5. the sword is especially difficult to wield in the presence of rounding error..1). As we shall see. then so are the members of the sequence w.. ifu lies in an eigenspace of dimension m. • Termination is a two-edged sword. we have Vol II: Eigensystems . In this case we will say.2) follows from (3..1) follows from the fact that £/ C /Q+i. A Krylov sequence terminates based on A and u at I if and only iff. Proof. On the other hand if (3. Now if the sequence terminates at I > m it follows that m = dim(/Q) > dim(/Cm) = dim(A'). Equation (3. On the one hand.

We begin by observing that if u has the expansion (3. In the above notation. assume that . we will have a bound on the accuracy of approximations to X{ in the sequence of Krylov subspaces.270 CHAPTER 4.6. X{) is an upper bound for tanZ(xj. The following theorem quantifies this observation. then p(A)u £ fCk and tan /. Hence Thus if we can compute as a function of k. Hence It follows from (3.3) then |aj|/||w||2 is the cosine of the angle 9 between u and Xi and -y/Sj^i |aj |2/||w||2 is the sine of 0. We can therefore assume that p(Xi) = 1 and write If deg(p) < k — 1.(p(A)u.4) that To apply this theorem note that the right-hand side of (3. EIGENSPACES Thus if we can find a polynomial such that p(\i) is large compared withp(Aj) (j ^ i). As it stands.5) does not depend on the scaling of p. To simplify it. Theorem 3. then Proof./C&). then p( A)u will be a good approximation to X{. this problem is too difficult. if x^u / 0 andp(Xi) / 0.

Then the following assertions are true. Thus we are lead to the problem of calculating the bound This is a well-understood problem that has an elegant solution in terms of Chebyshev polynomials. to which we now turn. Theorem 3. Chebyshev polynomials The Chebyshev polynomials are defined by Chebyshev polynomials are widely used in numerical analysis and are treated in many sources. Then 271 is an upper bound for (3. KRYLOV SUBSPACES and that our interest is m the eigenvector x\. we collect them in the following theorem.7. Vol II: Eigensystems . Rather than rederive the basic results. 3. Let Ck(i) be defined by (3.8).6).SEC.

we have the following theorem. the bound hp. xi). /C^).(x\.rnmps .1].8. Theorem 3.272 CHAPTER 4. For if 77 is small. EIGENSPACES Convergence redux We now return to the problem of bounding tan /.9).k and write • As with the power method. It is easily verified that the change of variables is The value of/^ at AI is Hence the value of a\^ is l/Cfc_i(//i). However. Let theHermitian matrix A have an orthonormal system ofeigenpairs (Af. which by (3. For k large we can ignore the term with the exponent 1 .t Then Three comments. In view of (3. in this case the bound gives us a bonus. with its eigenvalues ordered so that T.10) to compute <7jt. provided we change variables to transform the interval [A n .7) is equivalent to finding We can apply (3.p. a nearby subdominant eigenvalue will retard convergence. • The bound appears to be rather complicated. AI] to [-1. but it simplifies—at least asymptotically.

1.33-10~4. There 77 = 0. Multiple eigenvalues The hypothesis AI > A 2 in Theorem 3.053. Hence Theorem 3.23) = 0.95 = 0. it is important to keep in mind that the smallest eigenvalues of a matrix—particularly a positive definite matrix—often tend to cluster together. suppose that Aj = \2 > AS.(t > 2) can be derived by considering a polynomial of the form where A is the Lagrange polynomial that is zero at AI ._i and one at A.5) that where Bounds for A. .23. — For example. to derive a bound for x 2 .. The square root of this number is about 0. To see this. the value of 77 corresponding to the smallest eigenvalue is about 0. Note thatp(Ai) = 0 andp(A 2 ) = 1-It: tnen follows from (3. 3.. For example. consider the matrix in Example 3.3. let where q(t) is a polynomial of degree not greater than k — 2 satisfying q( A 2 ) — 1. Let u be given. However.69.97.SEC. and expand u in the form Vol II: Eigensystems .1.5/0..95 for the power method. so that the bound will be unfavorable. and the bound on the convergence rate is 1/(1 + 2-0. In Example 3.8 can be used to bound convergence to the eigenvector corresponding to the smallest eigenvalue of A. £3.. so that the bound on the convergence rate is about 0. • By item 2 in Theorem 3.8 can be relaxed. the Krylov subspaces of A remain unchanged when A is replaced by —A. A. The general approach can be used to derive bounds for the remaining eigenvectors z 2 . KRYLOV SUBSPACES 273 The square root increases the influence of rj on the bound.. Thus the square root effect gives a great improvement over the rate of 0.

The same procedure applied to the dot-dash line yields an optimal bound of about 6-10~3. we may not have a complete system of eigenvectors to aid the analysis. if the matrix is nonnormal. the graphs show that any exponentially decreasing bound of the form apk must be weak. the bound (3. Example 3.9.8 continues to hold when AI is multiple provided we adopt the convention that only one copy of AI is listed in (3.2-10"6. Actually. Thus part of the problem is with the mathematical form of our bounds. Simply take a ruler and draw a line from the common initial point that is tangent to the dashed line (at about k = 12).0-10"3.274 CHAPTER 4. a poor prediction. Non-Hermitian matrices The theory of Krylov subspaces for non-Hermitian matrices is less well developed. Immediately after the first step they range from 0. the bounds provide support for the empirical observation that Krylov sequences generate good approximations to exterior eigenvalues. while a glance at the graph in Figure 3.0-10~3.1 shows that the convergence ratios are steadily decreasing. I* is just as though (Ai. The following example shows that a change in geometry can have a major effect on the rates of convergence. To summarize. which is the best we can expect and compares favorably with the bound 2. The bound predicts a value for tan Z. = 25isabout7-10~ 4 . Its value at A. The last three convergence ratios are 0.55 to 0. where the actual value is 1. but then they drop off. The largest eigenvalue was recomputed . which does not reflect the true behavior of the Krylov approximations. £20) °f 2. This phenomenon is not accounted for by our theory. u) contain only approximations to ai^i -f 0:2X2. the geometry of the eigenvalues is generally more complicated.5-10"7. Then This shows that of all the vectors in the two-dimensional invariant subspace that is spanned by xi and #2. The dot-dash line in Figure 3. The problem is that the theory predicts a convergence ratio of 0. The reason is that the bounds must take into account the worst-case distribution of the eigenvalues./C 20 ).0. the spaces ICk(A.11).1. Second.22.31.12) continues to hold when A! and A 2 are multiple. Thus Theorem 3.1 is tan Z(£ 2 . if we retain only one copy of AI and A2.67.0-10"2 — once again. otiXi + #2X2) were a simple eigenpair of A. A normal (actually diagonal) matrix was constructed whose eigenvalues were those of a random complex matrix.(x\. Similarly. whereas the bound predicts a value of 4.75. First. and 0. We will return to this point in a moment. Assessment of the bounds It is instructive to apply the bounds to the Krylov sequence generated in Example 3.26. The final value is 5. EIGENSPACES where the Xi are the normalized eigenvectors corresponding to A ? . but they can underestimate how good the approximations actually are — sometimes by orders of magnitude. There are two reasons.

the eigenvalues were altered so that the ratio of the absolute values of the successive eigenvalues became 0. The convergence.2: Power method and Krylov approximations so that the absolute value of its ratio to the second largest eigenvalue was 0. In a third experiment. Recall that the bound failed to predict the swift convergence to xi for the matrix from Example 3.95 white their arguments remained the same. Vol II: Eigensystems .95 (as in Example 3. There is a polynomial connection for non-Hermitian matrices. the bounds cannot predict swifter convergence. illustrated by the dashed line. To vary the geometry.SEC. It turns out that the third experiment in Example 3. Since it converges more slowly.1.1. has exactly the same bounds. KRYLOV SUBSPACES 275 Figure 3. so that the matrix is nowHermitian. The solid line in Figure 3. is intermediate.1.2 shows the tangent of the angle between the kth Krylov subspace and the dominant eigenvector.8.1). The example does have something to say about the convergence bounds from Theorem 3. the eigenvalues from the first experiment were replaced by their absolute values. in which the matrix is Hermitian. 3. The convergence is swift as in Example 3. The convergence is not much better the power method in Example 3. It would be foolish to conclude too much from these examples.9. The dash-dot line shows how well the Krylov subspace approximates the eigenvector. It is clear that geometry matters and perhaps that exponentially decaying eigenvalues are a good thing.
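Before taking up that connection, we note that experiments in the spirit of Examples 3.1 and 3.9 are easy to run. The sketch below (NumPy assumed; the function names are ours, and the random spectrum — hence the exact numbers — will differ from those behind Figure 3.2) compares convergence to the dominant eigenvector for a complex spectrum and for the spectrum of its absolute values.

```python
import numpy as np

def krylov_basis(A, u, kmax):
    """Orthonormal Q with span(Q[:, :k]) = K_k(A, u) (Arnoldi-style orthogonalization).

    Assumes the Krylov sequence does not terminate before kmax steps.
    """
    n = A.shape[0]
    Q = np.zeros((n, kmax), dtype=complex)
    Q[:, 0] = u / np.linalg.norm(u)
    for k in range(1, kmax):
        v = A @ Q[:, k - 1]
        v -= Q[:, :k] @ (Q[:, :k].conj().T @ v)   # orthogonalize against K_k(A, u)
        Q[:, k] = v / np.linalg.norm(v)
    return Q

def tan_angles_to_e1(A, u, kmax):
    """tan of the angle between e_1 and K_k(A, u), for k = 1, ..., kmax."""
    Q = krylov_basis(A, u, kmax)
    e1 = np.zeros(A.shape[0]); e1[0] = 1.0
    c = np.array([np.linalg.norm(Q[:, :k].conj().T @ e1) for k in range(1, kmax + 1)])
    return np.sqrt(np.clip(1.0 - c**2, 0.0, None)) / c

rng = np.random.default_rng(1)
n, kmax = 100, 25
lam = np.linalg.eigvals(rng.standard_normal((n, n)) / np.sqrt(n))
lam = lam[np.argsort(-np.abs(lam))]            # dominant eigenvalue first
u = rng.standard_normal(n) + 0j

print(tan_angles_to_e1(np.diag(lam), u, kmax)[-1])          # complex spectrum
print(tan_angles_to_e1(np.diag(np.abs(lam)), u, kmax)[-1])  # same moduli on the real axis
```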

Let A be a simple eigenvalue of A and let be a spectral representation.3 Hence for any polynomial p of degree not greater than k-1 with p(\i) = 1. Actually. EIGENSPACES Theorem 3. This is not. /j ^ A is an eigenvalue of A with a Jordan block of order ra. in which case the bound becomes (after a little manipulation) where A 2 .10. it is not sufficient to simply make p small at //. . Suppose. What results exist suggest that under some conditions the Krylov sequence will produce good approximations to exterior eigenvectors. for example. Let where Then Proof. an easy task.276 CHAPTER 4. it is a continuum of theorems. . By Theorem 2. in general. if A is nondefective we can take L to be diagonal and the columns of X to have norm one. The theorem also illustrates the difficulties introduced by defectiveness and nonnormality. since spectral representations are not unique. For example. Then for p(L) to be zero. The theorem suggests that we attempt to choose polynomials p with p(\) = 1 so that p(L) is small. For more see the notes and references.n)m. In other words.6. . . the polynomialp(r) must have the factor (r . An are the eigenvalues of A other than A. we have This theorem is an analogue of Theorem 3. since the geometry of the eigenvalues can be complicated. This implies that we cannot generalize the approach via Chebyshev polynomials to defective matrices. .
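To make the Hermitian convergence theory of §3.2 concrete, the bound of Theorem 3.8 can be evaluated directly. The following self-contained sketch (NumPy assumed; the helper names are ours, and the bound is coded in the form tan∠(x_1, u)/C_{k-1}(1 + 2η) used above) compares it with the observed angles for the matrix of Example 3.1. As noted in the assessment of the bounds, the prediction is typically a considerable overestimate.

```python
import numpy as np

def cheb(k, t):
    """Chebyshev polynomial C_k(t), evaluated for t >= 1 via the cosh formula."""
    return np.cosh(k * np.arccosh(t))

def tan_angle(A, u, k):
    """tan of the angle between e_1 and K_k(A, u)."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    Q[:, 0] = u / np.linalg.norm(u)
    for j in range(1, k):
        v = A @ Q[:, j - 1]
        v -= Q[:, :j] @ (Q[:, :j].T @ v)     # orthogonalize against the previous basis
        Q[:, j] = v / np.linalg.norm(v)
    c = np.linalg.norm(Q[0, :])              # cosine of the angle
    return np.sqrt(max(1.0 - c * c, 0.0)) / c

# Example 3.1: a diagonal matrix of order 100 with eigenvalues 1, .95, .95^2, ...
n = 100
lam = 0.95 ** np.arange(n)
A = np.diag(lam)
rng = np.random.default_rng(2)
u = rng.standard_normal(n)

eta = (lam[0] - lam[1]) / (lam[1] - lam[-1])   # about 0.053, as in the text
tan0 = tan_angle(A, u, 1)                      # tan of the angle between x_1 and u
for k in (5, 10, 15, 20):
    print(k, tan_angle(A, u, k), tan0 / cheb(k - 1, 1 + 2 * eta))
```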

BLOCK KRYLOV SEQUENCES We saw above [see (3. A cure for this problem is to iterate with a second vector.13)] that when it comes to multiple eigenvalues. Theorem 3. say.Since the sequence in u produces approximations to otiXi -}.U) spanned by the first k members of this sequence is called the kth block Krylov subspace. Second. it has to be retrieved only once to form AU. if the matrix is stored on disk. The foregoing suggests that we generate a block Krylov sequence by starting with a matrix U with linearly independent columns and computing The space JCk(A. what we would like to approximate is the entire invariant subspace corresponding to that eigenvalue. if AI is double and the generating vector is expanded in the form the Krylov sequence can only produce the approximation a\x\ -f 0:2^2 to a vector in the eigenspace of AI. with the eigenvalues ordered so that and assume that the multiplicity of AI is not greater than m. ra vectors takes less than ra times the time for a matrix-vector product. there are two compensating factors (aside from curing the problem of multiple eigenvalues). a Krylov sequence has tunnel vision.2X2. For example.SEC. Specifically. passing to block Krylov sequences improves the convergence bound. 3.11. say Then so that ICk(A. If VolII: Eigensy stems . Let A be Hermit/an and let (Az-. Xi) be a complete system oforthonormal eigenpairs of A. v) can approximate the linear combination faxi + /32^2. First. Of course. Although it requires more storage and work to compute a block Krylov sequence than an ordinary Krylov sequence. as the following theorem shows.3. KRYLOV SUBSPACES 277 3.0. it frequently happens that the computation of the product of a matrix with a block U of. unless we are unlucky the two approximations will span the required two-dimensional invariant subspace.

In general the latter will be smaller.) not tanZ(xi. v) contains no components along #2... xm. However.11 for non-Hermitian block Krylov sequences. so that the eigenvalue following AI is A ra+ j.12). Thus Theorem 3.16) is larger than the value of 77 in Theorem 3. m). there is no convergence theory analogous to Theorem 3. then 77 defined by (3.{7). An alternative to block Krylov methods is to deflate converged eigenpairs. so that initially the bound may be unrealistic. • • • . • By the same trick used to derive the bound (3. xt-) (i = 1. we can get convergence rates for the vectors #2. . The two approaches each have their advantages and drawbacks. But the reader should keep in mind that there are important block variants of many of the algorithms we discuss here. EIGENSPACES where Proof. it is as if we were working with a matrix in which the pairs (A. the order constant in the bound istanZ(xi. • If the multiplicity of A! is less than m. By construction v€K(U). m) had been deleted.11 predicts faster convergence for the block method.15).. Since deflation is vital in other ways to many algorithms. so that the Krylov sequence can converge to a second eigenpair corresponding to a repeated eigenvalue. x3-> — The argument concerning multiple eigenvalues suggests that we can use block Krylov sequences to good effect with non-Hermitian matrices.. Hence JCk(A.278 is nonsingular and we set then CHAPTER 4.-.. Hence the Krylov subpace /Cjt(A. • Three comments. we will stress it in this volume.8 to get (3. . We may now apply Theorem 3..8.t. However.U)so that From the definition of v it is easily verified that xfv = 0 (i = 2.v) C ICk(A..
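A block Krylov basis can be generated in essentially the same way as an ordinary one, orthogonalizing each new block against all of its predecessors. The sketch below (NumPy assumed, real A for brevity; the function name is ours) ignores the possibility of rank loss within a block, which a production code would have to handle by deflation.

```python
import numpy as np

def block_krylov_basis(A, U, kmax):
    """Orthonormal Q with span(Q[:, :k*m]) = K_k(A, U) for an n-by-m starting block U.

    A sketch only: columns that become (nearly) dependent are not deflated,
    and conjugate transposes should be used in the complex case.
    """
    n, m = U.shape
    Q = np.zeros((n, kmax * m))
    Q[:, :m], _ = np.linalg.qr(U)
    for k in range(1, kmax):
        V = A @ Q[:, (k - 1) * m : k * m]            # one multiplication by a block
        V -= Q[:, : k * m] @ (Q[:, : k * m].T @ V)   # orthogonalize against K_k(A, U)
        V -= Q[:, : k * m] @ (Q[:, : k * m].T @ V)   # reorthogonalize for safety
        Q[:, k * m : (k + 1) * m], _ = np.linalg.qr(V)
    return Q
```

With the block size m equal to the multiplicity of a desired eigenvalue, the subspace can capture the whole eigenspace, in line with Theorem 3.11.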

NOTES AND REFERENCES Sources 279 Three major references for Krylov sequences are The Symmetric Eigenvalue Problem by B. Chatelin [39]. if the Krylov sequence terminates at i [see (3. he states Moreover. The solution of eigenvalue problems by forming and solving the characteristic equation is a thing of the past.1]. these books contain historical comments and extensive bibliographies. Parlett [206]. N. the focus of this section has been on the spectral properties of Krylov sequences. §6. in which he proposed the algorithm that bears his name (Lanczos himself called it the method of minimized iterations). then there are constants ao. The first hint of the power of Krylov sequences to approximate eigenvectors is found in Lanczos's 1950 paper [157]. In much of the literature the eigenvectors of the matrix in question are denoted generically by u (perhaps because u is commonly used for an eigenfunction of a partial differential equation). In the conclusion to his paper.1)]. • • • . Specifically. Krylov [153.SEC. 3. Krylov sequences are based on u. What makes the Krylov sequence important today is that it is the basis of the Lanczos algorithm for eigenpairs and the conjugate gradient method for linear systems as well as their non-Hermitian variants. Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms^ Y. A word on notation. In addition to most of the results treated in this subsection. To maintain consistency with the first half of this book. 1931] introduced the sequence named for him to compute the characteristic polynomial of a matrix. the good convergence of the method in the case of large dispersion makes it possible to operate with a small number of iterations. Since this book is concerned with eigensy stems. KRYLOV SUBSPACES 3. <^-i such that It can then be shown that the polynomial divides the characteristic polynomial of A. Convergence (Hermitian matrices) It seems that Krylov had no idea of the approximating power of the subspaces spanned by his sequence. Saad [233]. to avoid underusing a good letter. eigenvectors are denoted by x. and Eigenvalues of Matrices by F. The term "Krylov sequence" is due to Householder.4. obtaining ra successive eigenvalues and principle axes by only m+1 iterations. and. and Krylov sequences are often based on a vector v. Vol II: Eigensystems . Krylov sequences According to Householder [121.

11. RAYLEIGH-RITZ APPROXIMATION We have seen how a sequence of Krylov subspaces based on a matrix A can contain increasingly accurate approximations to an eigenspace of A.10).8 are possible. Block Krylov sequences in their full generality were introduced by Underwood in his Ph. First there is a best approximation to the desired eigenspace in the larger subspace. who gave a direct proof later [233. 1966]. Theorem 3. The purpose of this subsection is to describe and analyze the most commonly used method for extracting an approximate eigenspace from a larger subspace—the Rayleigh-Ritz method.14) was stated without proof by Saad in [230. EIGENSPACES It is unfortunate that the first convergence results for Krylov spaces were directed toward the Lanczos algorithm. Convergence (non-Hermitian matrices) The inequality (3. p. This bound. In 1980 Saad [229] showed that more natural bounds could be derived by first bounding the eigenvector approximations contained in the Krylov sequence itself and then analyzing the Rayleigh-Ritz procedure separately. But it is one thing to know that an approximation exists and another to have it in hand. 1975] to generalize the Lanczos algorithm (also see [92]). Thus when an eigenvalue lies outside an ellipse containing the rest of the spectrum. 4. Theorem 3.D.1971] followed Kaniel's approach. shows that there is nothing sacrosanct about Chebyshev polynomials. correcting and refining his eigenvector bounds. The Rayleigh-Ritz method approximates that best approximation. The result is Theorem 3. using a technique related to the construction in the proof of Theorem 3. who gave the first analysis of the method. In his important thesis Paige [196. Theorem 3. incidentally. It is known that Chebyshev polynomials defined on a suitable ellipse asymptotically satisfy an inequality like (3. One is free to use them—or not use them—in combination with other polynomials. These results raise the possibility of using polynomials to get eigenvalue bounds.12). started by using Chebyshev polynomials to bound the eigenvalues produced by the Lanczos method and then using the eigenvalue bounds to produce bounds on the eigenvectors. It is important to keep in mind that there are two levels of approximation going on here.6. The result requires that A be diagonalizable.11 itself is due to Saad [229]. which is a combination of Krylov sequences and the Rayleigh-Ritz method to be treated in the next subsection. and the related bound (3. For more see [229]. Kaniel [145. 1974] in connection with a restarted Lanczos algorithm. Con- .280 CHAPTER 4. This is the approach we have taken here. thesis [280. Underwood established eigenvalue bounds in the style of Kaniel. 204]. Block Krylov sequences Block Krylov sequences — limited to small powers of A — were introduced by Cullum and Donath [45.10 removes this restriction. bounds analogous to those of Theorem 3. 1980].8.

By continuity. then there is an eigenpair (L. Then from the relation we obtain so that (L. Let (L. the analysis of the method is somewhat complicated.1. In this section we will develop the theory of the method and examine its practical implications.1. the last two subsections are devoted to ways to circumvent this problem. Vol II: Eigensystems . AW) is an approximate eigenpair of A with Tl(AW) = X. No other assumptions are made about U. We begin with a description of the method and then turn to its analysis. but that is irrelevant to what follows. Thus we can find exact eigenspaces contained in U by looking at eigenpairs of the Rayleigh quotient B. This suggests that to approximate an eigenpair corresponding to X we execute the following procedure. which treats the case when U contains an exact eigenspace of A. W) ofB such that (L. X) be an eigenpair corresponding to X and let X — UW.SEC. RAYLEIGH-RlTZ METHODS The essential ingredients for the application of the Rayleigh-Ritz method is a matrix A and a subspace U containing an approximate eigenspace of A. Proof. although its use in practice is straightforward. W) of 7? such that (L. 4. To keep the treatment simple. RAYLEIGH-RlTZ APPROXIMATION 281 sequently. The analysis shows that unless a certain regularity condition is satisfied. 4. LetU be a subspace and let U be a basis for U. the method may not produce satisfactory approximations. we will confine ourselves to the approximation of eigenvectors and will point out where the results generalize to eigenspaces. It may and often will be a Krylov subspace. Theorem 4. We will motivate the method by the following theorem. Let V be a left inverse ofU and set If X C It is an eigenspace of A. Consequently. UW) is an eigenpair of A with U(UW) = X. we might expect that when U contains an approximate eigenspace X o f A there would be an eigenpair (!•. W) is an eigenpair of B.
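In the orthogonal case, where V = U, one possible realization of such a procedure is sketched below. This is only an illustration of the idea — not a reproduction of the displayed algorithm (4.1) — and the selection function that picks out the desired part of the spectrum is an assumption of the sketch.

```python
import numpy as np

def rayleigh_ritz(A, U, select):
    """Orthogonal Rayleigh-Ritz: Ritz values and Ritz vectors drawn from R(U).

    `select` picks the wanted part of the spectrum of the Rayleigh quotient,
    e.g. select = lambda mu: np.argsort(-mu.real)[:3] for the three rightmost values.
    """
    Q, _ = np.linalg.qr(U)             # orthonormal basis for the subspace
    B = Q.conj().T @ (A @ Q)           # Rayleigh quotient
    mu, W = np.linalg.eig(B)           # primitive Ritz pairs
    idx = select(mu)
    return mu[idx], Q @ W[:, idx]      # Ritz values and Ritz vectors

# example: three rightmost Ritz values from a random subspace
rng = np.random.default_rng(3)
A = rng.standard_normal((200, 200))
U = rng.standard_normal((200, 20))
vals, vecs = rayleigh_ritz(A, U, lambda mu: np.argsort(-mu.real)[:3])
print(vals)
print(np.linalg.norm(A @ vecs - vecs * vals, axis=0))   # residual norms
```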

Two difficulties The procedure listed in (4. Of course. When the concern is with eigenvalues and eigenvectors so that the Ritz pair can be written in the form (A. W) (see Theorem 1. Thus the method can fail. ei). — 1) and suppose we are interested in approximating the eigenpair (0. But nearly zero is bad enough. Uw a Ritz vector. W) is chosen by looking at the eigenvalues of B.g. W) in statement 3.1) has two difficulties. M its Ritz block'. The space 1Z(UW) will be called the Ritz space. we T might end up with p = (1 1) . and UW its Ritz basis. even though the space U contains an exact approximation to the desired eigenvector. Second. For example. however. we have not specified how to choose the eigenpair (M.5). Uw)..1.2. It is therefore natural to collect the eigenvalues of B lying in the same chosen part of the spectrum and use them to compute (M. The second difficulty has no easy answer. in which case our approximate eigenvector becomes (1 \/2 -\/2) » which is completely wrong. a group of eigenvalues with largest real part. we obtained a matrix B = (UTU)~l UTAU of the T . the eigenpair (M. Instead we must specify the part of the spectrum that we are interested in—e. We will call W itself the primitive Ritz basis. This procedure. Example 4. Recall that in large eigenproblems we cannot expect to compute the entire spectrum.282 CHAPTER 4. Assume that by some method we have come up with the basis Then U is its own left inverse and we may take Since B is zero. We will call the pair (M. any nonzero vector p is an eigenvector corresponding to the part of the spectrum near zero. Let A = diag(0. Let us examine these difficulties in turn. we will call A a Ritz value. as the following example shows. In practice. UW) a Ritz pair. Since B itself gives no clue as to which vector to choose. First. and w a primitive Ritz vector. by perturbing U by a matrix of random normal deviates with standard deviation of 10~4. in real life we would not expect B to be exactly zero. will work only if we can insure that there are eigenvalues in B that approximate those of A in the the desired part of the spectrum. EIGENSPACES We will call any procedure of this form a Rayleigh-Ritz procedure. we have no guarantee that the result approximates the desired eigenpair of A.

4). 4 approximation to GI that we The problem arises because the process of forming the Rayleigh quotient creates a spurious eigenvalue of B near zero. It turns out that such a postulate combined with the convergence of the eigenvalues in the Rayleigh quotient allows us to prove that the Ritz spaces converge as the approximations in U converge. 4. In Vol II: Eigensystems . The orthogonal Rayleigh-Ritz procedure In its full generality the algorithm (4.1) is called an oblique Rayleigh-Ritz method because the subspaces spanned by U and V can differ. w) is a primitive Ritz pair. This suggests that we may have to postulate a separation between the approximations to the eigenvalues we desire and the other eigenvalues of the Rayleigh quotient. This form of the method has important applications (e. provided of course that VHU is nonsingular. If it is not.1) presupposes that V is a left inverse of U. to the bi-Lanczos algorithm). RAYLEIGH-RITZ APPROXIMATION form 283 The eigenvector corresponding to the smallest eigenvalue ofB (the primitive Ritz vector) is (9.4110e-01 3. we can convert it to a left inverse by replacing it with V = V (FH f/)~ H . then and hence Thus the primitive Ritz pairs are the eigenpairs of the generalized eigenvalue problem (4..SEC.3813e-01)T and the corresponding Ritz vector is This comes nowhere near reproducing the order 10 know lies in Tl(U}. But in the overwhelming majority of applications U is taken to be orthonormal and V is taken equal to U. If (//. Rayleigh-Ritz and generalized eigenvalue problems The algorithm (4.g. Symbolically: We will flesh out this equation in the next subsection.

284 CHAPTER 4. ask it to grind out another UQ with a smaller value of 0. x) of A. . Since X = UW.2. Theorem 4. so that the Ritz basis X — UW is also orthonormal. Let (M.Q that converges to A. the orthogonal Rayleigh-Ritz method has the obvious advantage that it preserves symmetry throughout the procedure. We will treat the orthogonal variant and for simplicity will only consider the case where we are attempting to approximate a simple eigenvalue and its eigenvector—by far the most common use of the method. we return to our Umachine. We apply the Rayleigh-Ritz process to obtain a Ritz pair (fj. we must show that the process in some sense converges as 0 -»• 0. Proof. We want to approximate the eigenpair (A. then M = WHBW (see Theorem 1. We will treat this problem in two stages. We call this procedure the orthogonal Rayleigh-Ritz method.6 all we need to show is that M = XHAX. the following theorem shows that for any matrix the orthogonal method also gives minimum residuals. By some method—call it the U-machine—we obtain an orthonormal basis U0 for which 0 — Z(ar. X) be an orthogonal Rayleigh-Ritz pair with respect to U.2). But if W is the (orthonormal) primitive Ritz basis. CONVERGENCE This subsection is concerned with the accuracy of the approximations produced by the Rayleigh-Ritz procedure. To justify this process. Uepe) • If the pair is unsatisfactory. First we will show that the Rayleigh quotient contains a Ritz value /J. However. and try the RayleighRitz process again. UQ] is small. we have 4. We will then treat the more delicate question of when the Ritz vector x$ converges. the primitive Ritz basis W is taken to be orthonormal.g. Then is minimal in any unitarily invariant norm. Setting the scene Consider the use of the Rayleigh-Ritz procedure in real life. By Theorem 2.3. EIGENSPACES addition. When A is Hermitian.

There is an eigenvalue U. Then there is a matrix EQ satisfying such that A is an eigenvalue of Be + EQ. Let (Ue U±_) be unitary and set Now Uj_ spans the orthogonal complement of the column space of UQ.5.5). Vol II: Eigensystems .SEC.4. RAYLEIGH-RlTZ APPROXIMATION 285 Convergence of the Ritz value Our approach to the convergence of Ritz values is through a backward perturbation theorem. Proof. Theorem 4. Let BQ be the Rayleigh quotient (4. Chapter 1) to give the following corollary.1. Hence ||2:||2 = sin 9 and hence or it follows from (4. 4. Corollary 4.Q ofBa such that where m is the order of Be.6) that Now define Then clearly We can now apply Eisner's theorem (Theorem 3.

U0p$) -* 0 along with 0. if sep(/£0.8. Then by the continuity of sep (Theorem 2. so does the sep(A. so that j_ Then For the proof of the theorem see the notes and references.7.5) for which IJLQ —*• A. under a certain separation condition. Then the corollary assures us that we only have to wait until 0 is small enough for a Ritz value to emerge that is near A. NO) in (4. CHAPTER 4.6.7) by \\E\\? (see Theorem 3. which relates Ritz vectors to the angle 0. we) be a primitive Ritz pair and let the matrix (we We) be unitary.8). To apply the theorem. suppose we have chosen a Ritz pair (whose existence is guaranteed by Corollary 4. If there is a constant a > 0 such that then asymptotically .7) suggests that we may have to wait a long time. let (fig. which uses a sequence of Krylov spaces in which m can become rather large.11). The factor H-E^H™ in the bound (4.286 Two comments. we have sin Z(x. EIGENSPACES • Assume the number m of columns of U& is bounded. NO) remains bounded below. This has important implications for the Lanczos algorithm. Corollary 4. In the above notation. Convergence of Ritz vectors We now turn to the convergence of Ritz vectors. But as we shall see later. we may replace the right-hand side of (4. at least in the limit. Thus we have the following corollary. the convergence will be 0(0).Q -» A. Chapter 1). we have Consequently. Let (//<?. • If A is Hermitian. Since \\hg\\2 < \\A\\2. Theorem 4. The key is the following theorem. In this case we do not have to assume m is bounded. U&WQ) be a Ritz pair for which U.

XQ] be the Ritz approximation to (A.:r) and \cr\ = sin/(£#. By direct computation of the Rayleigh quotient x^Axg we find that Thus the Ritz value converges at least as fast as the eigenvector approximation of x in K(U8). -f <jy. Ng) is uniformly bounded below. 4. Let (fig. we can say more. it contains the convergence hypothesis that JJLQ approaches A and the separation hypothesis that the quantity sep(//0. x and ||t/||2 = 1. in the second part of Example 4.#). RAYLEIGH-RITZ APPROXIMATION 287 This corollary justifies the statement (4.2 the separation of the eigenvalues of B is about 1. Let the eigenvalues of A be ordered so that and let the Ritz values—the eigenvalues of the Hermitian matrix U AU—be ordered so that Vol II: Eigensystems .7 we have a\ = sin Z(z0. then (7! = cosZ(#0. by Corollary 4. What is unusual for results of this kind is that it consists entirely of computed quantities and can be monitored. which accounts for the failure of the Ritz vector to reproduce the eigenvector ei. If we write XQ — 70.9) the uniform separation condition. is the Rayleigh quotient formed from XQ and A.SEC.5 suggests that the convergence of the Ritz value can be quite slow. If the uniform separation is satisfied. where y 1. We call the condition (4. Hermitian matrices When A is Hermitian.3) to the effect that eigenvalue convergence plus separation equals eigenvector convergence.9-10"4. Convergence of Ritz values revisited The bound in Corollary 4. Then by construction. even though it is present in U to accuracy of order 10~4. The conclusion is that the sine of the angle between the eigenvector x and the Ritz vector Uepe approaches zero. For example. x). Specifically. We will now use the fact that the Ritz vector converges to derive a more satisfactory result. x) = O(9).

Corollary 4. x) be an approximation to (A. Thus. we can use the Rayleigh-Ritz method to approximate eigenspaces with the same assurance that we use it to approximate eigenvectors. Y is an orthonormal basis for the orthogonal complement of the space spanned by x. Similarly.5. and Corollary 4. x) and let Then Proof. For more see the notes and references. we must look to the residual as an indication of convergence.#) = Y^x. Let A have the spectral representation where \\x\\% — 1 and Y is orthonormal. Then ..6. Theorem 4. Consequently. if m converges to A.11) is a spectral decomposition. Chapter 1).then it must converge from below.8. with the exception of Corollary 4. Although these results give us insight into the behavior of the Rayleigh-Ritz method. The Ritz values converge more quickly in the Hermitian case. Consequently. Residuals and convergence The results of this subsection have been cast in terms of the sine of the angle 9 between an eigenvector x and its approximating subspace U.6. in the last analysis we cannot know 0 and hence cannot compute error bounds. and hence Convergence of Ritz blocks and bases Theorem 4. Because (4. if /^converges to A n _ m+ . For in (4. and.4.7. The following residual bound actually applies to any approximate eigenpair. Theorem 4.288 CHAPTER 4. EIGENSPACES Then by the interlacing theorem (Theorem 3.7 all have natural analogues for Ritz blocks and Ritz bases. as always. Thus sin/(#. the proofs are essentially the same.10) we have x^Av = 0. Let (^. Let T = Ax — f i x . it must converge from above.

A REFINED RITZ VECTOR is a solution of the problem We will worry about the computation of refined Ritz vectors later.2. Definition 4.Q be a Ritz value associated with U$.X\ (see Theorem 2. Specifically. First let us return to Example 4.2 shows. RAYLEIGH-RlTZ APPROXIMATION 289 It follows that from which the first inequality in (4. • Since A is assumed to be simple. Vol II: Eigensystems . this theorem says that: Unfortunately.fJ>0%9 approaches zero. We therefore turn to two alternatives to Ritz vectors. By Corollary 4.2) we get the vector which is an excellent approximation to the original eigenvector ei of A. x). 4. 4.12) follows. there is a Ritz value pe that is approaching A.3.\^i . as Example 4. REFINED RITZ VECTORS Definition By hypothesis. It follows that the residual Axe . The second inequality follows from the first and the fact that sep(/z. spurious Ritz values can cause the residual to fail to converge. which showed how the Rayleigh-Ritz method could fail to deliver an accurate approximation to an accurate eigenvector contained in a subspace.5.Ve&d has the smallest norm. We have argued informally that the convergence of the refined Ritz value u-o to the simple eigenvalue A is sufficient to insure the convergence of the refined Ritz vector. L) > sep(A. the space U$ contains increasingly accurate approximations XQ to the vector in the simple eigenpair (A. L) . Hence if we define xg to be the vector whose residual Axe .SEC. Convergence of refined Ritz vectors We now turn to the convergence of refined Ritz vectors. if we solve the problem using the smallest eigenvalue of B in (4.9. Let U.11). that residual must approach zero. This suggests the following definition. The following theorem gives quantitative substance to this assertion.

jji0x0\\2 will be optimal. . are guaranteed to converge.\fj. the residual norm \\Axg . Let A have the spectral representation CHAPTER 4. EIGENSPACES where ||z||2 = 1 and Y is orthonormal.6.5 we know that fig -» A. Moreover since y and z are orthogonal. • Although the Ritz value //# was used to define the refined Ritz vector XQ. We have Hence By the definition of a refined Ritz vector we have The result now follows from Theorem 4. Let U be an orthonormal basis for U and let a: = y+z. L) . refined Ritz vectors. Let be the normalized value of y.15) that sin Z(ar. • Here are two comments on this theorem. then Proof.290 Theorem 4. by Theorem 2.10. • By Corollary 4. XQ) —> 0. It follows from (4. Let fig be a Ritz value and XQ the corresponding refined Ritz vector. In other words.g — A| > 0. where y = UUHx. Moreover. If sep(A. the Rayleigh quotient fa = x jAxg is likely to be a more accurate approximation to A.8. \\y\\\ = 1 — sin2 9. unlike ordinary Ritz vectors. Then \\z\\2 = ^U^x^ = sin#.
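In outline, a refined Ritz vector is obtained from the right singular vector of (A − μ̃I)U belonging to the smallest singular value; the operation counts and the cross-product alternative are discussed below. A minimal sketch (NumPy assumed; the function name is ours, and the three-by-three example is only of the flavor of Example 4.2, whose basis is not reproduced here):

```python
import numpy as np

def refined_ritz_vector(A, U, mu):
    """Refined Ritz vector from an orthonormal basis U and a Ritz value mu.

    Returns x = U z of unit norm minimizing ||A x - mu x||_2 over R(U),
    together with its Rayleigh quotient, usually a better eigenvalue estimate.
    """
    W = A @ U - mu * U                        # (A - mu I) U
    _, _, Vh = np.linalg.svd(W, full_matrices=False)
    z = Vh[-1].conj()                         # right singular vector for sigma_min
    x = U @ z
    return x, x.conj() @ (A @ x)

# a basis of the flavor of Example 4.2 (e_1 plus a vector mixing the eigenvectors
# for +1 and -1), perturbed at the 1e-4 level; the ordinary Rayleigh quotient is
# nearly zero and gives no guidance, but the refined vector recovers e_1
rng = np.random.default_rng(4)
A = np.diag([0.0, 1.0, -1.0])
U0 = np.column_stack(([1.0, 0.0, 0.0], np.array([0.0, 1.0, -1.0]) / np.sqrt(2)))
U, _ = np.linalg.qr(U0 + 1e-4 * rng.standard_normal((3, 2)))
x, mu = refined_ritz_vector(A, U, 0.0)
print(np.round(x, 6), mu)   # x is e_1 up to sign and terms of order 1e-4
```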

The situation is different when several vectors. where \\z\\2 = 1. Thus we can compute refined Ritz vectors by computing the singular value decomposition of (A . where n ~^> p. RAYLEIGH-RlTZ APPROXIMATION 291 The computation of refined Ritz vectors The computation of a refined Ritz vector amounts to solving the optimization problem Let U be an orthonormal basis for U. the cross-product Vol II: Eigensystems . which requires O(np2) operations.I)U. Chapter 3. In this case.2. Specifically.SEC. say m. the computation of a single refined Ritz vector and a single ordinary Ritz vector require the same order of work.fjLl)U corresponding to its smallest singular value. the Rayleigh-Ritz method requires only that we compute the eigensystem of H. Thus we need to solve the problem By Theorem 3. This computation must also be done by the Rayleigh-Ritz method. Hence the computation of m refined Ritz vectors will take longer than the computation of m ordinary Ritz vectors by a factor of at least m. We will assume that U is nxp. A cure for this problem is to use the cross-product algorithm (Algorithm 3. albeit with a smaller order constant. Thus all Ritz vectors can be calculated with 0(np2) operations. It is natural to compare this process with the computation of an ordinary Ritz vector. it will often happen that the formation of AU takes less time than p individual matrix-vector multiplications. which is also an 0(np2) process.fj. As with the formation of block Krylov sequences. The next nontrivial step is the computation of the singular value decomposition of W. Chapter 3) to compute the singular value decompositions. On the other hand. This suggests the following algorithm. The computation is dominated by the formation of V and the computation of the singular value decomposition. we will have to compute m singular value decompositions to get the refined vectors. Thus. The formation of V requires p multiplications of a vector by A. the solution of this problem is the right singular vector of (A . On the other hand. while the computation of each refined Ritz vector requires O(np2) operations. 4. are to be computed. Then any vector of 2-norm one in U can be written in the form Uz. the Rayleigh-Ritz method must calculate H = UEV.1.

Chapter 3. and.4. as we shall see in §1.11. Unfortunately. such an impersonation is the cause of the failure of the Rayleigh-Ritz method in the example. we expect ap to be small. Our analysis of the cross-product algorithm applies here. Uw) is a HARMONIC RITZ PAIR WITH SHIFT K if . the quality of the refined Ritz vector depends on the accuracy of ^. with i = p and j = p—ly we see that the sine of the angle between the computed vector and the true vector is of order (<J p _i/cri) 2 e M . Then by (3. we can form at negligible cost. can provide a complete set of approximating pairs while at the same time protecting pairs near a shift K against impersonators. on the other hand. Thus the cost of computing an additional refined Ritz vectors is the same as computing a spectral decomposition of a matrix of order p. The use of refined Ritz vectors solves the problem by insuring that the residual Ax — fix is small — something that is not true of the spurious Ritz vector.2 lies between the eigenvalues 1 and —1 and for this reason is called an interior eigenvalue of A. There is no really good way of introducing them other than to begin with a definition and then go on to their properties. Harmonic Ritz vectors. the computation of refined Ritz values can be considerably abbreviated. Specifically.292 matrix corresponding to A U — ull is CHAPTER 4. where u is not near an eigenvector of A. LetU be a subspace and U be an orthonormal basis for U. we note that if U is a Krylov subspace. If // is an accurate approximation to an eigenvalue A. Definition 4. as we have seen. Chapter 5. Interior eigenvalues are particularly susceptible to being impersonated by Rayleigh quotients of the form uHAu.If &p-i is not too small — something we can check during the calculation — then the refined Ritz vector computed by the cross-product algorithm will be satisfactory. In fact. EIGENSPACES If in the notation of (4. Then (K + £.20). Finally.4.17) we precompute the matrices then for any fj. HARMONIC RITZ VECTORS The eigenvalue 0 of A in Example 4. denote the singular vectors of AU — ^1 by a\ > • • • > ap. each refined Ritz vector must be calculated independently from its own distinct value of//. 4.

SEC. 4. RAYLEIGH-RITZ APPROXIMATION

293

We will refer to the process of approximating eigenpairs by harmonic Ritz pairs as the harmonic Rayleigh-Ritz method. This method shares with the ordinary RayleighRitz method the property that it retrieves eigenvectors that are exactly present in U. Specifically, we have the following result. Theorem 4.12. Let (A, x) be an eigenpair of A with x = Uw. Then (A, Uw) is a harmonic Ritz nair. Proof. Calculating the left- and right-hand sides of (4.18), we find

Hence Uw is an eigenvector of (4.18) with eigenvalue b — A — K; i.e., (A, Uw) is a harmonic Ritz pair. • This theorem leads us to expect that if x is approximately represented in U, then the harmonic Rayleigh-Ritz, like its unharmonious counterpart, will produce an approximation to x. Unfortunately, as of this writing the method has not been analyzed. But it is easy to see informally that it protects eigenpairs near K, from imposters. Specifically, on multiplying (4.18) by W H , we gel or

Consequently, if any harmonic Ritz value is within 5 of K the pair must have a residual norm (with respect to K) bounded by \S\. For example, if AC —> A and 6 —> 0, then it eventually becomes impossible for the harmonic Rayleigh-Ritz method to produce impersonators. It should be stressed that A does not have to be very near K for this screening to occur. If we return to Example 4.2 and compute the harmonic Ritz vector associated with the eigenvalue zero setting K = 0.3, we obtain the following excellent approximation to ei:

Computation of harmonic Ritz pairs When U contains a good approximation to an eigenvector whose eigenvalue lies near AC, (A - Kl)U in (4.18) will have a small singular value which will be squared when we form U^(A - K,I^(A - «/)£/, which can result in subsequent inaccuracies (see §3.2, Chapter 3). To circumvent this problem, we can compute the QR factorization

Vol II: Eigensystems

294

CHAPTER 4. EIGENSPACES

Then (4.18) assumes the form

Hence

This eigenproblem can be solved by the QZ algorithm (§4.3, Chapter 2). An alternative is to work with the ordinary eigenproblem

In this case a small singular value in R will cause R l and hence R 1Q U to be large. Fortunately, we are interested in harmonic Ritz values that are near K, i.e., values for which 6 is small. Thus we want eigenpairs corresponding to large eigenvalues in (4.23). Although the eigenvalue itself will not necessarily be accurately computed (the loss of accuracy occurs in forming R, not in foming R~1Q^U), if the eigenvalue is sufficiently well separated from it neighbors the eigenvector p will be well determined. The harmonic Ritz values are useful primarily in locating the harmonic Ritz vectors we need. When it comes to approximating the associated eigenvalue, we should use the ordinary Rayleigh quotient WH UAUw, which minimizes the residual norm. In computing this Rayleigh quotient we can avoid a multiplication by A by using quantities alreadv calculated. SDecificallv, from (4.20)

Hence

where x is the harmonic Ritz vector. When A is Hermitian, both the eigenproblems (4.22) and (4.23), which are essentially nonsymmetric, can give complex harmonic Ritz values in the presence of rounding error. However, we can use the same technique that we used for the S/PD problem (§4, Chapter 3). Specifically, we can transform (4.23) into the Hermitian eigenproblem

for the vector Rw. This approach has the slight disadvantage that having computed a solution g we must solve the system Rw = g to retrieve the primitive harmonic Ritz vector. However, the usual Ravleidi Quotient has the simpler form

SEC. 4. RAYLEIGH-RITZ APPROXIMATION

295

4.5. NOTES AND REFERENCES Historical In 1899, Lord Rayleigh [215] showed how to compute the fundamental frequency of a vibrating system. His approach was to approximate the fundamental mode by a linear combination of functions. He chose the linear combination x for which the quantity A = XT Ax/x^x was minimized and took A for the fundamental frequency. In 1909 Ritz [217] proposed a general method for solving continuous problems that could be phrased as the minimization of a functional. His idea was to choose a suitable subspace (e.g., of low-order polynomials) and determine coefficients to minimize the functional. His example of an eigenproblem—a second-order differential equation—does not, strictly speaking, fit into his general scheme. Although the mode corresponding to the smallest eigenvalue minimizes a functional, the other eigenvalues are found from the stationary points. The Rayleigh-Ritz method is sometimes called a Galerkin method or a GalerkinPetrov method. These methods solve a linear system Ax — b in the operator A by choosing two subspaces U and V and choosing x to satisfy the conditions

If U = V the method is called a Galerkin method; if not it is called a Galerkin-Petrov method. Now let V be the subspace spanned by V in (4.1). Then any Ritz pair (ju, x) satisfies the conditions

The similarity of (4.25) and (4.26) explains the nomenclature. The subject is not well served by this proliferation of unnecessary terminology, and here we subsume it all under the terms Rayleigh-Ritz method, Ritz pair, etc. The Rayleigh-Ritz method is also called a projection method because a Ritz pair (/x, x) is an eigenpair of the matrix F^APy, where P\> and PU are the orthogonal projections ontoZY and V. Convergence For earlier work on the convergence of the Rayleigh-Ritz method see the books by Parlett [206] for Hermitian matrices and by Saad [233] for both Hermitian and nonHermitian matrices. The convergence theory developed here is essentially that of Jia and Stewart for Ritz vectors [133] and for Ritz bases [134]. Theorem 4.6 is a generalization by Stewart of a theorem of Saad [231] for Hermitian matrices (also see [148]) and can in turn be extended to Ritz bases. For a proof see [268]. Refined Ritz vectors Refined Ritz vectors whose residual norms are not necessarily minimal were introduced by Ruhe [220] in connection with Krylov sequence methods for the generalVol II: Eigensystems

296

CHAPTER 4. EIGENSPACES

ized eigenvalue problem. Refined Ritz vectors in the sense used here are due to Jia [131], who showed that they converge when the Ritz vectors fail to converge. The proof given here may be found in [134]. The use of the cross-product algorithm to compute refined Ritz vectors was suggested in [132]. Harmonic Ritz vectors For a symmetric matrix, the (unshifted) harmonic Ritz values are zeros of certain polynomials associated with the MINRES method for solving linear systems. They are called "harmonic" because they are harmonic averages of the Ritz values of A~l. For a survey and further references see the paper [201] by Paige, Parlett, and van der Vorst, who introduced the appellation harmonic. Morgan [179] introduced the harmonic Ritz vectors as good approximations to interior eigenvectors. His argument (still for symmetric matrices) was that since eigenvalues near K become exterior eigenvalues of (A — Ac/)"1, to compute their eigenvectors we should apply the Rayleigh-Ritz method with basis U to (A — K I)"1. However, we can eliminate the need to work with this costly matrix by instead working with the basis (A — K,I)U. This leads directly to equation (4.18). But the resulting Ritz vector (A - K,I)Up will be deficient in components in eigenvalues lying near K. Hence we should return the vectors Up, which are not Ritz vectors, at least in the usual sense for symmetric matrices. Morgan also recommended replacing the harmonic Ritz value with the ordinary Rayleigh quotient. Since Morgan was working with Hermitian matrices, he was able to establish some suggestive results about the process. Harmonic Ritz values for non-Hermitian matrices were described by Freund [78], again in connection with iterative methods for solving linear systems. Harmonic Ritz vectors were introduced by Sleijpen and van der Vorst [240], who characterized them (in our notation) by the Galerkin-Petrov condition

The use of the inequality (4.19) to justify the use of harmonic Ritz vectors is new. The explicit reduction of the harmonic eigenproblem to the forms (4.22), (4.23), and the symmetric form (4.24) is new, although the symmetric form is implicit in the implementation of the Jacobi-Davidson algorithm in [243]. Another way of looking at these reductions is that when Tl(U} contains an eigevector of A corresponding to eigenvalue r, the pencil in the original definition (4.18) is not regular, while the transformed pencil (4.22) is. For the generalized eigenvalue problem Ax — \Bx, a natural generalization of a harmonic Ritz pair is a solution of

which was first introduced in [74]. An argument similar to that leading to (4.19) shows that this method also discourages imposters.

5
KRYLOV SEQUENCE METHODS

In this chapter and the next we will treat algorithms for large eigenproblems. There are two ways of presenting this subject. The first is by the kind of problem—the symmetric eigenproblem, the generalized eigenproblem, etc. This approach has obvious advantages for the person who is interested in only one kind of problem. It has the disadvantage that, at the very least, a thumbnail sketch of each algorithm must accompany each problem type. The alternative is to partition the subject by algorithms, which is what we will do here. In the present chapter we will be concerned with methods based on Krylov sequences— specifically, Arnoldi's method and the Lanczos method. In the first subsection we will introduce certain decompositions associated with Krylov subspaces. The decompositions form the basis for the the Arnoldi and Lanczos algorithms to be treated in §2 and §3.

1. KRYLOV DECOMPOSITIONS
For a fixed matrix A, not every subspace of dimension fc is a Krylov subspace of A. This means that any basis of a Krylov subspace must satisfy certain constraining relations, which we will call Krylov decompositions. It turns out that a knowledge of these decompositions enables us to compute and manipulate Krylov subspaces. Consequently, this section is devoted to describing the relations that must hold for a basis to span a Krylov subspace. The most widely used Krylov decompositions are those that result from orthogonalizing a Krylov sequence. They are called Arnoldi decompositions for non-Hermitian matrices and Lanczos decompositions for Hermitian matrices. We will treat these special decompositions in the first two subsections. Unfortunately, these decompositions are not closed under certain natural operations, and in §1.3 we introduce more general decompositions that are closed — decompositions that will be used in our computational algorithms.

297

298

CHAPTER 5. KRYLOV SEQUENCE METHODS

1.1. ARNOLDI DECOMPOSITIONS
An explicit Krylov basis of the form

is not suitable for numerical work. The reason is that as k increases, the columns of Kk become increasingly dependent (since they tend increasingly to lie in the space spanned by the dominant eigenvectors). For example, the following table gives the condition number (the ratio of the largest to the smallest singular values) for the Krylov basis Kk( A, e), where A is the matrix of Example 3.1, Chapter 4. k Condition 5 5.8e+02 10 3.4e+06 15 2.7e+10 20 3.1e+14 A large condition number means that the space spanned by the basis is extremely sensitive to small perturbations in the vectors that form the basis. It is therefore natural to replace a Krylov basis with a better-conditioned set of vectors, say an orthonormal set. Unfortunately, a direct orthogonalization of the columns of Kk will not solve the problem; once the columns have become nearly dependent, information about the entire Krylov subspace has been lost, and orthogonalization will not restore it. We therefore proceed indirectly through what we will call an Arnoldi decomposition. The Arnoldi decomposition Let us suppose that the columns of Kk+i are linearly independent and let

be the QR factorization of Kk+i (see §6, Chapter 2.3). Then the columns of Uk+i are the results of successively orthogonalizing the columns of Kk+i • It turns out that we can eliminate Kk+i from equation (1.1). The resulting Arnoldi decomposition characterizes the orthonormal bases that are obtained from a Krylov sequence. Theorem 1.1. Letu\ have 2-norm one and let the columns of Kk+i(A,ui) belinearly independent. Let Uk+i — (u\ • • • Uk+i) be the Q-factor ofKk+i- Then there is a (&+l)xAr unreduced upper Hessenberg matrix Hk such that

Conversely, ifUk+i is orthonormal and satisfies (1.2), where Hk is a (k+l)xk unreduced Hessenberg matrix, then Uk+i is the Q-f actor of Kk+i(A, ui).

SEC. 1. KRYLOV DECOMPOSITIONS
Proof. If we partition (1.1) in the form

299

we see that Kk = UkRk is the QR factorization of AV Now let Sk — Rk l. Then

where

It is easily verified that Hk is a (k+l)xk Hessenberg matrix whose subdiagonal elements are hi+iti = ri+ij+isa = n+i,z+i/ r n- Thus by the nonsingularity of Rk, Hk is unreduced. Conversely, suppose that the orthonormal matrix Uk+i satisfies the relation (1.2), where Hk i s a ( f c + l ) x f c unreduced Hessenberg matrix. Now if k = 1, then Au\ — huui 4- h,2iU2. Since /i2i ^ 0, the vector u2 is a linear combination of HI and Au\ that is orthogonal to u\. Hence (u\ u^} is the Q-factor of AY Assume inductively that Uk is the Q-factor of Kk- If we partition

then from (1.3)

Thus Uk+i is a linear combination of Auk and the columns of Uk and lies in JCk+iHence Uk+i is the Q-factor of Kk+i- • This theorem justifies the following definition. Definition 1.2. Let Uk+i G Cnx^k+l^ be orthonormal and let Uk consist of the first k columns ofUk+i • If there is a (&+1) x k unreduced upper Hessenberg matrix Hk such that

then we call (1.3) an ARNOLDI DECOMPOSITION OF ORDER k. If H is reduced, we say the Amoldi decomposition is REDUCED. Vol II: Eigensystems

300

CHAPTER 5. KRYLOV SEQUENCE METHODS

In this nomenclature, Theorem 1.1 says that every nondegenerate Krylov sequence is associated with an Araoldi decomposition and vice versa. An Amoldi decomposition can be written in a different form that will be useful in the sequel. Specifically, partition

and set

Then the Arnoldi decomposition (1.2) is equivalent to the relation

In what follows it will reduce clutter if we dispense with iteration subscripts and write (1.4) in the form

When we do, the compact form (1.3) will be written in the form

Thus the hats over the matrices U and H represent an augmentation by a column and a row respectively. Uniqueness The language we have used above—an Arnoldi decomposition—suggests the possibility of having more than one Amoldi decomposition associated with a Krylov subspace. In fact, this can happen only when the sequence terminates, as the following theorem shows. Theorem 1.3. Suppose theKrylov sequence Kk+\(A, u\) does not terminate at k+1. Then up to scaling of the columns of Uk+i, the Amoldi decomposition of Kk+i is unique. Proof. Let

be the Amoldi decomposition from Theorem 1.1. Then by this theorem Hk is unreduced and fa ^ 0. Let

SEC. 1. KRYLOV DECOMPOSITIONS

301

where Uk+i is an orthonormal basis for /Cjt+i(A, wi). We claim that U^Uk+\ = 0. For otherwise there is a Uj such that Uj = auk+i + Uka, where a ^ 0. Hence Aitj contains a component along Ak+l u and does not lie in /Cjt+i, a contradiction. It follows that H(Uk) = ^(Uk) and hence that Uk = UkQ, for some unitary matrix Q. Hence

or

On premultiplying (1.5) and (1.7) by Uk we obtain

Similarly, premultiplying by u^+l, we obtain

Now by the independence of the columns of Kk+i, both (3k and (3k are nonzero. It then follows from (1.8) that the last row of Q is a>£e£, where \u>k\ = 1. Since the norm of the last column of Q is one, the last column of Q is u^e^. Since Hk is unreduced, it follows from the implicit Q theorem (Theorem 3.3, Chapter 2) that Q = diag(u>i,... ,tjfc), where \MJ = 1 (j = 1,... , fc). Thus up to column scaling Uk = UkQ is the same as Uk- Subtracting (1.7) from (1.5) we find that

so that up to scaling Uk+i and tik+i are the same. •

This theorem allows us to speak of the Arnoldi decomposition associated with a nonterminating Krylov subspace. Perhaps more important, it implies that there is a one-one correspondence between Krylov subspaces and their starting vectors. A nonterminating Krylov subspace uniquely determines its starting vector. The nontermination hypothesis is necessary. If, for example, /Cfc+i = /C^+2, then AUk+i = Uk+iHk+i> and for any Q such that Q^Hk+iQ is upper Hessenberg, the decomposition A(Uk+\Q) = (Uk+iQ}(Q^Hk+iQ] is another Arnoldi decomposition. The Rayleigh quotient In the proof of Theorem 1.3 we used the fact that Hk = U^AUk- But this matrix is just the Rayleigh quotient with respect to the basis Uk (see Definition 2.7, Chapter 4). Thus the Arnoldi decomposition provides all that is needed to compute Ritz pairs. Vol II: Eigensystems

First suppose that Hk is reduced. say that hj+ij — 0. It is natural to ask under what conditions an Arnoldi-like relation can be satisfied with a reduced Hessenberg matrix. KRYLOV SEQUENCE METHODS Reduced Arnold! decompositions In Definition 1. The following theorem answers that question. . Then Hk is reduced if and only ifR. Write the fcth column of (1. Let the orthonormal matrix Uk+i satisfy where Hk is Hessenberg.302 CHAPTER 5. a contradiction.2 we have required that the Hessenberg part of an Arnoldi decomposition AUk = Uk+iHk be unreduced. . Then lZ(Uk) contains an eigenvector of A. along with Theorem 3. the result is a reduced Arnoldi decomposition. Chapter 4.4.4) in the form Then from the orthonormality of Uk+i we have . Uk. shows that the only way for an Arnoldi decomposition to be reduced is for the Krylov sequence to terminate prematurely. the sequence can be continued by setting wj+i equal to any normalized vector in the orthogonal complement of Uj. • This theorem. Let A be the corresponding eigenvalue and let Then Because H\ is unreduced. It follows that w = 0. suppose that A has an eigenspace that is a subset ofR. which we can write in the form x = UkW for some w. . if hj+ij = 0. Theorem 1. . Conversely. However.(Uk) and suppose that H k is unreduced. the matrix Uk+iH\ is of full column rank. If this procedure is repeated each time the sequence terminates. Computing Arnoldi decompositions The Arnoldi decomposition suggests a method for computing the vector Uk+i from HI .5. Proof. so that U is an eigenbasis of A. Let H be the leading principal submatrix of Hk of order k and let U be the matrix consisting of the first j columns of Uk+i • Then it is easily verified that AU — UH.(Uk) contains an eigenspace of A.

Since the vector v in statement 3 need not be orthogonal to the columns of Uk. We call this procedure the Arnoldiprocess.9). The cure is a process called reorthogonalization.9) from the Arnoldi decomposition. This algorithm has been treated extensively in §1:4. accumulating any changes in h^. It is worthwhile to verify that this process is the same as the reduction to Hessenberg form described in §3. VolII: Eigensystems . where it is shown that in the presence of inexact arithmetic cancellation in statement 3 can cause it to fail to produce orthogonal vectors.2. the computation of wjt+i is actually a form of the well-known Gram-Schmidt algorithm. 1. KRYLOV DECOMPOSITIONS Since we must have 303 and These considerations lead to the following algorithm for computing an expanding set of Arnoldi decompositions. Reorthogonalization Although we derived (1.SEC. the extended algorithm therefore orthogonalizes it again. Chapter 2. It is illustrated in the following extension of (1.1.4.

. The algorithm uses an auxiliary routine gsreorthog to perform the orthogonalization (Aleorithml:4. & — 1 . this algorithm is quite satisfactory—usually no more than the one reorthogonalization performed in statements 4-6 is needed. one must apply the algorithm for orders i = 1. Chapter 2.1. e. For more see Example 1. The cost of the orthogonalization will be 2nkr flam.304 Given an Arnoldi decompositioi CHAPTER 5. KRYLOV SEQUENCE METHODS of order k..1 —uses a function gsreorthog (Algorithm 1:4. we can often dispense with the reorthogonalization entirely. there are some tricky special cases. Hence: To compute an Arnoldi decomposition of order k from scratch.1: The Arnoldi step In general. Practical considerations The work in computing an Arnoldi step is divided between the computation of the vector-matrix product Au and the orthogonalization. and its cost will vary.1 repeatedly. In general r will be one or two. ExpandArn extends it to an Arnoldi decomposition of order k + 1. .13). what to do when v in statement 3 is zero. . our final algorithm—Algorithm 1. However.1. By applying Algorithm 1. Algorithm 1. one can compute an Arnoldi decomposition of any order.3) that handles exceptional cases and tests whether to reorthogonalize.g. The user must provide a method for computing the former. . Thus: . where r is the number of orthogonalization steps performed by gsreorthog.2. More important. Hence.

we have taken A{ to be diagonal. In this example we have simulated a shift-and-invert transformation by giving A{ a large eigenvalue of 10*. Hence if there is an accurate approximation to any vector in /Cjt. although more slowly. If (A — ft/)""1 is used to form a Krylov sequence. The bound (3.2. then v. this means that the first component of Uk will rapidly approach zero as good approximations to ei appear in £*. Of course we can move the columns of U to a backing store—say a disk. But each Arnoldi step requires all the previous vectors. The convergence of sin 9 toward zero proceeds typically up to a point and then levels off. where 0 = ^[G2. but what seems to be happening is the following. the closer the shift is to the eigenvalue the better the performance of the method. The larger the value ofi. For the inverse power method.SEC. at least in part. where the concern is with one eigenpair. 1. For our example. Thus shift-and-invert enhancement can be used to steer a Krylov sequence to a region whose eigenvalues we are interested in. is in the Arnoldi procedure. For i > 1. If n is very large.fck(Ai. Let be of order 30 with the last 29 eigenvalues decaying geometrically. No formal analysis of this phenomenon has appeared in the literature. Chapter 2) that replacing A by (A — K!)~I will make eigenvalues near K large. so that the effects of rounding error in the formation of the matrix-vector product AiUk are minimal. e) can be expected to quickly develop very good approximations to GI.12). The problem. then as k grows we will quickly run out of memory. In fact. the Krylov subspaces /C&(A t -.k+i must be almost orthogonal to it.1 plots sin 0. Moreover. Example 1.5. restricts our ability to compute Arnoldi decompositions. In any Krylov sequence the Arnoldi vector u^+i is orthogonal to /Cjfe. as the following example shows. as good approximations to e2 appear. Thus the effects observed in the figure are due to the combination of the large eigenvalue and the orthogonalization step. Similarly. Now in the presence of rounding error. Chapter 4 suggests that it will also produce approximations to 62. KRYLOV DECOMPOSITIONS 305 It requires nk floating-point words to store the matrix U. the sequence will tend to produce good approximations to those large eigenvalues. With Krylov sequences. Figure 1. the second component of Uk will approach zero. more than operations counts. Orthogonalization and shift-and-invert enhancement We have seen in connection with the inverse power method (§1. the sooner the convergence stagnates. This fact. and transfers between backing store and main memory will overwhelm the calculation. an excessively accurate approximation to one eigenvalue can cause the convergence to the other eigenpairs to stagnate. e)] versus k for several values ofi. the first component of Uk will not become exactly zero but will be reduced to Vol Hi Eigensystems . where the concern is often with several eigenpairs.

let A be Hermitian and let be an Arnoldi decomposition. Thus we have effectively changed our rounding unit from 10~16 to 10l~16. this spurious first component will introduce errors of order 10*~16 in the other components. Specifically. 1.306 CHAPTER 5. But since Tk = U^AUk. The moral of this example is that if you wish to use shift-and-invert enhancement to find eigenpairs in a region. But this nonzero first component is magnified by 10* when Auk is formed. and we can expect no greater accuracy in the approximations to the other eigenvectors. Then the Rayleigh quotient Tk is upper triangular. LANCZOS DECOMPOSITIONS Lanczos decompositions When A is Hermitian the Amoldi decomposition and the associated orthogonalization process simplify. When Auk is reorthogonalized. do not let the shift fall too near any particular eigenvalue in the region.2.1: Stagnation in a Krylov sequence the order of the rounding unit—say 10~16. It follows that Tk is tridiagonal and can be . Tk is Hermitian. KRYLOV SEQUENCE METHODS Figure 1.

we do not have to conjugate it in statement 5. note that the first column of the Lanczos decomposition (1. • The body of the loop requires one multiplication by A.1. two inner products. and two subtractions of a constant times a vector. w^. Hence: Vol II: Eigensystems . from the orthonormality of HI and w 2 it follows that «i = v^Aui and we may take0i = \\Au\ — ^ui\\2. see the notes and references. • A frequently occuring alternative is to calculate This is the equivalent of the modified Gram-Schmidt algorithm (see §1:4. which is implemented in Algonthm 1. Both work well.4). provided that the matrix Tk is unreduced. 1.2...SEC.12) is or Moreover. . To see this.12) we get the relation where This is the Lanczos three-term recurrence. More generally. the Lanczos decomposition is essentially unique. in contrast to the Arnoldi procedure where Uk depends on u\. from the jth column of (1. Here are three comments. Like the Arnoldi.. • Since /2j_i is taken to be real. The Lanczos recurrence The fact that Tk is tridiagonal implies that the orthogonalization Uk+i depends only on Uk and t/fc-i. For more on variants of the basic algorithm.12) a Lanczos decomposition. KRYLOV DECOMPOSITIONS 307 We will call the decomposition (1. This will always be true if the sequence does not terminate.

A more serious problem is loss of orthogonality among the vectors Uj.3.308 CHAPTER 5. we can discard the Uj as they are generated and compute Ritz values from the Rayleigh quotient T^. 1. However. we can do little more with it. KRYLOV DECOMPOSITIONS An Arnoldi (or Lanczos) decomposition can be regarded as a vehicle for combining Krylov sequences with Rayleigh-Ritz refinement. because of its uniqueness. However. It exhibits an orthonormal basis for a Krylov subspace and its Rayleigh quotient—the wherewithal for computing Ritz pairs. As we shall see. if we reorthogonalize Uj+\ in the Lanczos against its predecessors. one must retain all the vectors u3. where we will treat the Lanczos algorithm in detail.13). We will return to this problem in §3. This algorithm generates the Lanczos decomposition where I\ has the form (1. For . KRYLOV SEQUENCE METHODS Let HI be given. Although the simplicity and economy of the Lanczos recurrence is enticing. there are formidable obstacles to turning it into a working method. In the Arnoldi process we were able to overcome this problem by reorthogonalization. If only eigenvalues are desired. and for large problems it will be necessary to transfer them to a backing store. it also permits the efficient computation of the residual norms of the Ritz pairs. The above is less by a factor of about k than the corresponding count (1.11) for the Arnoldi process. Algorithm 1. we lose all the advantages of the Lanczos recurrence.. But if one wants to compute eigenvectors.2 is k matrix-vector multiplications and 4nk flam.2: The Lanczos recurrence The operation count for Algorithm 1.

we can write IfUk+i is orthonormal. there is no unreduced Arnold! form whose Rayleigh quotient is triangular. It is uniquely determined by the basis Uk+i> For if (V v) is any left inverse for Uk+i. Bk is a Rayleigh quotient of A. Thus a Krylov decomposition has the general form as an Arnoldi decomposition. • • . Here is a nontrivial example of a Krylov decomposition.. Definition A Krylov decomposition. We call the H(Uk+i) the SPACE SPANNED BY THE DECOMPOSITION and Uk+i the basis for the decomposition. Let u\. then it follows from (1. 1^2. but the matrix B^ is not assumed to be of any particular form.SEC. like an Arnoldi decomposition.16) that IT In particular. Example 1. Definition 1. KRYLOV DECOMPOSITIONS 309 example. we say that the Krylov decomposition is ORTHONORMAL. In this subsection we will show how to relax the definition of an Arnoldi decomposition to obtain a decomposition that is closed under similarity transformations of its Rayleigh quotient.6. Uk+i be linearly independent and let A KRYLOV DECOMPOSITION OF ORDER k is a relation of the form Equivalently. 1. specifies a relation among a set of linearly independent vectors that guarantees that they span a Krylov subspace. we make the following definition.7. Two Krylov decompositions spanning the same space are said to be EQUIVALENT. Specifically. Let Vol II: Eigensystems .

we are free to reduce the Rayleigh quotient B to any convenient form. Then on postmultiplying by Q we obtain the equivalent Krylov decomposition We will say that the two Krylov decompositions are similar. Translation The similarity transformation (1.310 CHAPTER 5. the transforming matrix Q necessarily satisfied \Q\ = I so that the two similar decompositions were essentially the same. however. KRYLOV SEQUENCE METHODS be of order k. There is also a Krylov-Schur decomposition. We have already met with a similarity transformation in the proof of the uniqueness of the Arnoldi decomposition [see (1. They are called similarity transformations and translations. However. Every Arnold! decomposition is a Krylov decomposition.6)]. If the Arnoldi decomposition is unreduced. we could reduce B to Jordan form. With a general Krylov decomposition. and the subspace of the original Krylov decomposition must be a nondegenerate Krylov subspace. as the above example shows. In the Krylov decomposition let . then is a Krylov decomposition whose basis is the Krylov basis. in which case we will say that the decomposition is a KrylovJordan decomposition.2). In this terminology an Arnoldi decomposition is an orthonormal Krylov-Hessenberg decomposition and a Lanczos decomposition is an orthonormal Krylov-tridiagonal decomposition.18) leaves the vector Uk+i unaltered. HI) are independent. If the columns ofKk+i(A. In that case. For example. however. then it is essentially unique. To establish these facts we will introduce two classes of transformations that transform a Krylov decomposition into an equivalent Krylov decomposition. Similarity transformations Let be a Krylov decomposition and let Q be nonsingular. which will play an important role in our treatment of the Arnoldi method (see §2. but not every Krylov decomposition is an Arnoldi decomposition. every Krylov decomposition is equivalent to a (possibly unreduced) Arnoldi decomposition. We can transform this vector as follows.

then by Theorem 1.4 71 (U) contains no invariant subspaces of A. these algorithms will be reasonably stable. There are two points to be made about this theorem. Note that translation cannot produce an arbitrary vector Uk+i. Given the original decomposition.Instead. each step in the proof has an algorithmic realization. Equivalence to an Arnold! decomposition We will now use the transformations introduced above to prove the following theorem. Next let be a vector of norm one such that UHu = 0. and by Theorem 1. Then the equivalent decomposition is a possibly reduced Arnoldi decomposition. KRYLOV DECOMPOSITIONS 311 where by the independence of the vectors Uj we must have 7 ^ 0 . 1. Proof. Let U = UR be the OR factorization of U. • The proof of Theorem 1. We begin with the Krvlov decomposition where for convenience we have dropped the subscripts in k and k+1. • If the resulting Amoldi decomposition is unreduced. let Q be a unitary matrix such that 6HQ = ||&||2eJ and QHBQ is upper Hessenberg. Then is an equivalent decomposition. Then it follows that Since K[(Uk Wfc+i)] = TZ[(Uk v>k+i)]> this Krylov decomposition is equivalent to the original. Uk+i is constrained to have components along the original vector Uk+i. Theorem 1. To summarize: Vol II: Eigensystems . If the original basis Uk+i is reasonably well conditioned.3 it is essentially unique.8.8 is constructive. so that the two decompositions span the same space. Finally. We say that it was obtained from the original decomposition by translation. Every Krylov decomposition is equivalent to a (possibly reduced) Arnoldi decomposition. in which the matrix U is orthonormal.SEC. Then the translated decomposition is an equivalent orthonormal Krylov decomposition.

so that (1. Let H\ be a Householder transformation such that 6HH\ = /tefc. This can be done by Householder transformations as follows. The course of the reduction is illustrated in Figure 1.2.Then the Krylov decomposition has a Hessenberg Rayleigh quotient.312 CHAPTER 5. . We now reduce H\BHi to Hessenberg form by a variant of the reduction described in §2.Hk-i.19) is an Arnoldi decomposition. It uses a variant ofhousegen that zeros the initial components of a vector. in which the matrix is reduced rowwise beginning with the last row. H^-\ are ei.3. Algorithm 1. This algorithm illustrates an important fact about algorithms that transform Krylov decompositions.. KRYLOV SEQUENCE METHODS Figure 1. transformations on the Rayleigh quotient are negligible . Chapter 2. Reduction to Arnoldi form As we shall see. .3 implements this scheme. Now let Q = HiH2 • • . Let the Krylov decomposition in question be where B is of order k.2: Rowwise reduction to Hessenberg form Each Krylov sequence that does not terminate at k corresponds to the class of Krylov decompositions that are equivalent to its Arnoldi decomposition. If n is large. Because the last rows of HI.. it is sometimes necessary to pass from an orthonormal Krylov sequence to an Arnoldi sequence.

20). */). This algorithm reduces it to an Arnoldi decomposition. 1. which takes an ra-vector x and returns a Householder transformation H — I — uvP such that Hx — ven [see (2. Algorithm 1. w. Chapter 2]. KRYLOV DECOMPOSITIONS 313 Let be an orthogonal Krylov decomposition of order k.SEC.3: Krylov decomposition to Amoldi decomposition Vol II: Eigensystems . The algorithm uses the routine housegen2(x.

both are somewhat expensive to compute.4.314 CHAPTER 5.iiI)U whose singular value is smallest [see (4. 1. which would give the same operation count. Thus the bulk of the work in Algorithm 1.11. the pair (K + <!>. Thus the computation of a refined Ritz vector can be reduced to computing the singular value decomposition of the low-order matrix B^. we will assume that is a orthonormal Krylov decomposition. COMPUTATION OF REFINED AND HARMONIC RITZ VECTORS In §§4. However. Chapter 4.4. But the possibilities for optimizing cache utilization are better if we calculate U*Q as above (see §1:2. Chapter 4].3 and 4. From (1. the right singular vectors of (A — ^iI)U are the same as the right singular vectors of B^. In what follows. Chapter 4. Refined Ritz vectors If \i is a Ritz value then the refined Ritz vector associated with n is the right singular vector of (A . KRYLOV SEQUENCE METHODS compared to the manipulation of U. The purpose of this subsection is to give the details. Uw) is a harmonic Ritz pair if .20) we have If we set we have Since (U u)is orthonormal. we introduced refined and harmonic Ritz vectors. their computation simplifies.16). We could have accumulated the Householder transformations directly into U. This process requires nk2 flam. if the defining subspace for these vectors is a Krylov subspace. where U is overwritten by U*Q.3 is found in the very last statement. Harmonic Ritz vectors By Definition 4.3). In general.

we can multiply by (B — K/)~ H to get the ordinary eigenvalue problem 1. of which (1. As might be expected. Lanczos also gave a biorthogonalization algorithm for non-Hermitian matrices involving right and left Krylov sequence. the Lanczos recurrence preceded the Arnoldi procedure..4). . noting that the Rayleigh quotient is Hessenberg.SEC. The two described here emerge with a clean bill of health." Vol II: Eigensystems . It is due to to Cornelius Lanczos (pronounced Lantzosh). and indeed the relation (1. NOTES AND REFERENCES Lanczos and Arnold! Contrary to our order of presentation. we have opted for the slightly vaguer word "decomposition. which can be done by the QZ algorithm (see §4.12) of the Lanczos decomposition is due to Paige [195]. Consequently. KRYLOV DECOMPOSITIONS and 315 It follows tha Thus we have reduced the problem of computing w to that of solving a small generalized eigenvalue problem. 279).1951] seems to have been to generalize the onesided Lanczos process to non-Hermitian matrices. Lanczos did not compute approximations to the eigenvalues from the matrix Tk (which he never explicitly formed).However. the factorization is concealed by the second form (1. If we are willing to assume that (B — K/)H is well conditioned. Paige [197] has given a rounding-error analysis of these variants. but instead used the recurrence to generate the characteristic polynomial of Tk. from which he computed eigenvalues. he was aware that accurate approximations to certain eigenvalues could emerge early in the procedure. Chapter 2). but the others may not perform well. What we have been calling an Arnoldi decomposition is often called an Arnoldi factorization. whose algorithm [157. He called his method the method of minimized iterations because orthogonalizing Au against u\. 1. The matrix form (1. As indicated above (p. whose ground-breaking thesis helped turn the Lanczos algorithm from a computational toy with promise into a viable algorithm for large symmetric eigenproblems. Uk minimizes the norm of the result.2) is a factorization of AUk. Arnoldi's chief contribution [5..3.1950] has had a profound influence on the solution of large eigenproblems. there are several variants of the basic three-term recursion. which is by far the most widely used..5.15) is an example.

is the way the Arnoldi method is used in practice. However. some of these ideas are implicit in some algorithms for large eigenproblems and large linear systems. some of these Ritz pairs will (we hope) approach eigenpairs of A. and deflation. convergence. for large problems our ability to expand the Arnoldi decomposition is limited by the amount of memory we can devote to the storage of Uk. Uwo requires almost 10 megabytes of memory to store in IEEE double precision. A possible cure is to move the vectors Uk to a backing store—say a disk—as they are generated.For example. in fact. then. One possible reason is that one does not generally have very accurate shifts.This. However. proceeds via an orthonormal Krylov decomposition. as Example 1. the shift has to be extremely accurate for the degradation to be significant. We then turn to practical considerations: stability. Krylov decompositions The idea of relaxing the definition of an Arnoldi decomposition is due to Stewart [267]. An alternative is to restart the Arnoldi process once k becomes so large that we cannot store Uk. we can use Algorithm 1. Unfortunately. Since disk transfers are slow. In this chapter we will introduce two restarting methods: the implicit restarting method of Sorensen and a new method based on a Krylov-Schur decomposition. Moreover. We have already noted that the Ritz pairs associated with Uk can be computed from the Rayleigh quotient Hk. to be treated in the next section.As k increases. Sorensen's implicitly restarted Arnoldi algorithm [246]. which is then truncated to yield an Arnoldi decomposition.2. In principle. 2. if A is of order 10. . THE RESTARTED ARNOLDI METHOD Let be an Arnoldi decomposition. in order to orthogonalize Auk against the previous vectors we will have to bring them back into main memory. For more. These methods are the subjects of §§2. Most notably.5 shows. see §2.316 CHAPTER 5.000.5.1 and 2. the algorithm will spend most of its time moving numbers back and forth between the disk and main memory.1 to keep expanding the Arnoldi decomposition until the Ritz pairs we want have converged. KRYLOV SEQUENCE METHODS Shift-and-invert enhancement and orthogonalization The fact that a shift that closely approximates an eigenvalue can degrade convergence to other eigenpairs does not seem to be noted in the literature. as are the systematic transformations of the resulting Krylov decompositions.

.. and suppose that /ifc+i.SEC. jLtm correspond to the part of the spectrum we are not interested in.. THE RESTARTED ARNOLDI METHOD 2. we can take p to be a Chebyshev polynomial on [a... it turns out. but two suggest themselves immediately. 6] and the eigenvalues AI. k). If the eigenvalues \k+i. Then takep(i) = (t .) (i = 1.. The emphasis is on filtering out what we do not want. n) are small compared to the values p(A.. 6] in the first of the above alternatives will not be much enhanced.pk+i) • • • ( * . A natural choice would be a linear combination of Ritz vectors that we are interested in. . 6].. Let wi be a fixed vector.7. Chapter 4). . .. desired eigenvalues near the interval [a. this polynomial has values on [a.. and suppose that we are interested in calculating the first k of these eigenpairs. By the properties of Chebyshev polynomials (see Theorem 3. For example. while what we want is left to fend for itself. Xj). There are many possible choices for a filter polynomial. This. VolII: Eigensystems . Filter polynomials For the sake of argument let us assume that A has a complete system of eigenpairs (A. .. then u will be rich in the components of the Xi that we want and deficient in the ones that we do not want. . we have If we can choose p so that the values p(\i) (i = fc-f 1.. 6]. Then u\ can be expanded in the form If p is any polynomial.. 6] that are small compared to values sufficiently outside [a..1. Such a polynomial is called a filter polynomial.. is a special case of a general approach in which we apply a filter polynomial to the original starting vector. THE IMPLICITLY RESTARTED ARNOLDI METHOD 317 The process of restarting the Arnold! method can be regarded as choosing a new starting vector for the underlying Krylov sequence. 2.. 1. 2./*m)« It should be observed that both choices accentuate the negative. Ajt lie without. A n lie in an interval [a.. /zm.. Suppose we have at hand a set of Ritz values fii.. . We will begin by motivating this process and then go on to show how the QR algorithm can be used to implement it.

However. Specifically. Before describing how implicit restarting is implemented. then p(A)ui requires ra—k matrixvector multiplications to compute. p is of degree m—k. There are five points to note about this procedure. The contraction phase works with a factored form of the filter polynomial. if HI is the starting vector of an Arnoldi decomposition of order m. KRYLOV SEQUENCE METHODS Whatever the (considerable) attractions of filter polynomials. we can use shifted QR steps to compute the Krylov decomposition of ICk[p(A)ui] without any matrix-vector multiplications. Postmultiplyingby Qi. This process of expanding and contracting can be repeated until the desired Ritz values have converged. we get Finally.1 to expand this decomposition to a decomposition of order m. At this point a filter polynomial p of degreeTO—fcis chosen. . is the matrix that results from one QR step with shift KI applied to Hm. If. let us look at how it will be used.318 Implicitly restarted Arnoldi CHAPTER 5. they are expensive to apply directly. we have where Note that H^. Starting with the shifted decomposition we write where Q\R\ is the QR factorization of Hm — «i/. for example. We begin with an Arnoldi decomposition We then use Algorithm 1. let We incorporate the factor t — Ki into the decomposition using an analogue of the basic QR step [see (2. and the implicit restarting process is used to reduce the decomposition to a decomposition of order & with starting vector p( A )ui. This technique is called implicit restarting. Chapter 2].1). The integer m is generally chosen so that the decomposition can fit into the available memory.

Thus the decomposition (2. We may now repeat this process with K^? • • • . only the last two components of b^m'+l are nonzero. which establishes the result. j)-plane (see Figure 2. 2. The matrix Hm is Hessenberg.. The matrix Q\ is upper Hessenberg. 4.e. This follows from the proof of the preservation of Hessenberg form by the QR algorithm.The result will be a Krylov decomposition with the following properties. and at most its last subdiagonal element is zero. 2. The vector b^f+l = e^Qi is the last row of Q and has the form i.SEC. Chapter 2). THE RESTARTED ARNOLDI METHOD 319 1. For on postmultiplying (2.2.K\I). r^' is nonzero. This follows from the fact that it can be written in the form where PJ-IJ is a rotation in the (j—1. 5. 3.3) has the Wilkinson diagram presented below for k = 3 and ra = 6: Vol II: Eigensystems . Km-Jt. we get Since Hm is unreduced. The matrix lib is orthonormal.1) by ei and remembering that RI is upper triangular. The first column of Um = UmQ\ is a multiple of (A .

we will only describe the implicit single step method. KRYLOV SEQUENCE METHODS where the q's are from the last row of Q. After we have applied the shifts «!. A precise operation count is not necessary. when A is real and the shift is complex. • The algorithm requires 0(m 2 ) work. A bit of terminology will help in our description of the algorithm..m consists of the first k columns of Um » HM cipal submatrix of order k of Hm~ »and g^m is from the matrb Hence if we set ls me leading prin- then is an Arnold! decomposition whose starting vector is proportional to (A—Ki 1} • • -(A— Ac m _jb/)Mi. However. since when n is large the applications ofFilterArn will account for an insignificant part . Implementation We have described the contraction algorithm for the implicitly restarted Arnoldi method in terms of the explicitly shifted QR algorithm. For simplicity. We have thus restarted the Arnold! process with a decomposition of order k whose starting vector has been filtered by p(A). Not only do we avoid any matrix-vector multiplications in forming the new starting vector. For convenience we separate the final truncation from the implicit filtering. it is natural to use the implicit double shift procedure.. The elegance and economy of this procedure is hard to overstate. we can truncate the decomposition as above to yield a decomposition of order m—j. • The algorithm is a straightforward implementation of the single implicit QR step. Algorithm 2. KJ. The only new feature is that a double shift adds an additional two subdiagonals to the matrix Q and hence moves the truncation index back by two.320 CHAPTER 5. Here are some comments.. From this diagram it is easy to see that the first k columns of the decomposition can be written in the form where Uj. For single shifts there is no reason not to use this procedure.1 is an implementation of one step of filtering. . We will call the integer m—j the truncation index for the current problem. but we also get its Arnoldi decomposition of order k for free. For large n the major cost will be in computing UQ.

SEC. this algorithm reduces the truncation index by one. The transformations generated by the algorithm are accumulated in the raxm matrix Q.1: Filtering an Arnold! decomposition Vol II: Eigensystems . THE RESTARTED ARNOLDI METHOD 321 Let be an Arnoldi decomposition of order m. Algorithm 2. Given a shift AC. 2.

If we implement the product U*Q [:. When n is large. Algorithm 2. If this is taken into . The expansion phase requires m—k matrix-vector multiplications. It is much cheaper to accumulate them in the smaller matrix Q and apply Q to U at the end of the contraction process.322 Given an Arnoldi decomposition CHAPTER 5. This is done in Algorithm 2. 1: k] naively. this algorithm expands the decomposition to one of order m and then contracts it back to one of order k. so that the expansion phase requires between n(m2-k2) flam and 2n(ra 2 -fc 2 ) flam. However. KRYLOV SEQUENCE METHODS and an integer m > k. the operation count is nm k flam. the elements of Q below the m-k subdiagonal are zero. In fact. we have ignored the opportunity to reduce the count by taking advantage of the special structure of Q. The restarted Arnold! cycle We are now in a position to combine the expansion phase with the contraction phase.2. From (1.10) we know that the Arnoldi step requires between Inj flam and 4nj flam. • Transformations are not accumulated in U. the bulk of the work in this algorithm is concentrated in the expansion phase and in the computation of U*Q[l:k].2: The restarted Arnoldi cycle of the work in the overall Arnold! algorithm.

.3). tan Z(ei.7e-05 8.0e-07 The columns contain the following information: the number of matrix-vector multiplications. The main conclusion to be drawn from this table is that it does not much matter how U*Q [:. For purposes of exposition we have assumed that m and k are fixed. There is no general analysis that specifies an optimal relation. 4. The results are summarized in the following table.2e-01 5.f fc2). so that in spite of the operation counts it may be faster (see §1:2. the naive scheme permits more blocking.9e-03 8.952.8e-02 4. and starting vector e. A 5. The implicitly restarted Arnoldi method is almost as good as the full Krylov sequence.1. tan Z (ei. The implicitly restarted Arnoldi method can be quite effective. These counts are not easy to interpret unless we know the relative sizes of m and k. . 1 :k] is formed.. 2. 2. But experience has shown it to be the best general method for non-Hermitian matrices when storage is limited. . The implicitly restarted Arnoldi method was run with k = 5.SEC.95 3 . . In fact. VolII: Eigensy stems . the restarted Arnoldi subspace). Thus if we take m = 2k.4e-03 8. It would be rash to take one offhand example as characteristic of behavior of the implicitly restarted Arnoldi method. the Ritz vector from the restarted subspace). THE RESTARTED ARNOLDI METHOD 323 consideration the count can be reduced to n(km . the full Krylov subspace). Example 2. However. but the conventional wisdom is to take m to be Ik or greater. . whereas the former requires only 1.. Axu 10 15 20 25 1. as the following example shows.9e-05 4. tan Z(ei. Let A = diag(l. m = 10. as long as m is not so large that storage limits are exceeded. 3.8e-05 8.0e-07 RV 1. we end up with the following table (in which RO stands for reorthogonalization).500 words of floating-point storage. . they can be varied from cycle to cycle.000.0e-03 1.7e-02 2.95.95") (the matrix of Example 3. Yet the latter requires 2.1. Chapter 4).8e-08 IRA 9.

Let {im. If the implicitly restarted QR step is performed with shifts /u m . then the resulting matrix H has the form where HU is unreduced. where HU is unreduced.RQ -f p.e.. • By repeated application of Theorem 2.3. Chapter 2. fik+i be eigenvalues of Hk. .2.3) has the form where T^m *) is an upper triangular matrix with Ritz values pk+i.8) is the Rayleigh quotient for the implicitly restarted Arnoldi algorithm. and let pbean eigenvalue ofH. Because Hm~ is block triangular. Proof. at the end of the expansion phase the Ritz values—i.k+i > then the matrix Hm~ ' in (2. from which it is evident that all but the last of the diagonal elements of R are nonzero. it follows from the representation of the formation of this product in Figure 2. Corollary 2. • • • .f i l ) to triangular form. the eigenvalues of Hm — are calculated and divided into two classes: .3.. that all but the last of the subdiagonal elements of RQ are nonzero. Hence H .. Moreover...2.. The matrix Hk in (2. If one QR step with shifty is performed on H. Specifically. KRYLOV SEQUENCE METHODS We have already mentioned the possibility of choosing Ritz values—the eigenvalues of H — as shifts for the implicitly restarted Arnoldi method. This fact is a consequence of the following theorem. This reduction is illustrated in Figure 2. Hence the last diagonal element of R must be zero. But H — fil is not of full rank. It follows that the last row of RQ is zero. Theorem 2. Pm on its diagonal.6). Chapter 2. Hk must contain the Ritz values that were not used in the restarting process. . The first part of the QR step consists of the reduction of (H . When this is done the Rayleigh quotient assumes a special form in which the discarded Ritz values appear on the diagonal of the discarded half of the Rayleigh quotient.2 we get the following result.324 Exact shifts CHAPTER 5. This means that we can choose to keep certain Ritz values and discard others. fJ.1 has the form (2. Let H bean unreduced Hessenberg matrix..

however. That is why ARPACK.. the matrix that one computes with rounding error can fail to deflate — i. H[k+l. as we shall see later (Theorem 2. its (TO. We must continue to work with the entire matrix H as in Algorithm 2. In cannot generate instability in the Arnoldi process itself. k] in statement 11 of Algorithm 2. if a particular step is forward unstable. For example. has been and is used with great success to solve large eigenproblems.5). when n is large.k] can be far from zero. In this subsection we describe a more reliable method for purging Ritz values. forward instability in the QR step can compromise the progress of the algorithm (though not its stability). the work saved by deflation is negligible. does not mean that we cannot use Algorithm 2.2. the Arnoldi relation AU = UH + flu will continue to hold to working accuracy.e. there is a danger of an unwanted Ritz value becoming permanently locked into the decomposition.1. Forward instability Theorem 2.3 is that in exact arithmetic the quantity H [k+l. we might retain the Ritz values having the largest real part. As we have seen. 2. so that the computation of fl and u can be simplified. In the presence of rounding error. In stability computations. if we are using shift-and-invert enhancement. We now turn to a discussion of this phenomenon. and it is natural to retain the largest Ritz values. 2. THE RESTARTED ARNOLDI METHOD 325 those we wish to retain and those we wish to discard. Fortunately. where an eigenvalue with a positive real part indicates instability.2 is a classic example of the limitations of mathematics in the face of rounding error.2 is zero. This phenomenon.2. In fact. it is easier to move eigenvalues around in a triangular matrix than in a Hessenberg matrix. TO—l)-element can be far from zero. At worst it can only retard its progress. First. In this way the restarting process can be used to steer the Arnoldi algorithm to regions of interest. Given an exact shift. we cannot guarantee that the undesirable Ritz values have been purged from the decomposition. However. which is called forward instability of the QR step. A consequence of Corollary 2. The latter are used as shifts in the restarting algorithm. It is based on two observations. Although forward instability is a real phenomenon.SEC. KRYLOV-SCHUR RESTARTING The implicitly restarted Arnoldi algorithm with exact shifts is a technique for moving Ritz values out of an Arnoldi decomposition. it should be kept in perspective. if a Krylov decomposition can be partitioned in the form Vol II: Eigensystems . we will be interested in the largest eigenvalues. the definitive implementation of the implicitly restarted Arnoldi method. In fact. Second. An important algorithmic implication of forward instability is that we cannot deflate the decomposition after the application of each exact shift.

However. and throw away the rest of the decomposition. If we have an algorithm to perform such exchanges. KRYLOV SEQUENCE METHODS then AUi = U\B\\ + ub^ is also a Krylov decomposition. Exchanging eigenvalues and eigenblocks Since the Krylov-Schur method requires us to be able to move eigenvalues around in a triangular matrix. To derive our algorithms we will consider the real case. More precisely. it is not enough to be able to exchange just eigenvalues.(i = 1.. . we will generally work with real Schur forms.326 CHAPTER 5. we will begin with a sketch of how this is done. let a triangular matrix be partitioned in the form where If we can find a unitary transformation Q such that then the eigenvalues su and s22 m the matrix will have traded places. If the eigenproblem in question is real. There are four cases to consider. move the desired eigenvalues to the beginning. In other words a Krylov decomposition splits at any point where its Rayleigh quotient is block triangular. then we can move any eigenvalue on the diagonal of a triangular matrix to another position on the diagonal by a sequence of exchanges. Thus we must not only be able to exchange eigenvalues but also the 2x2 blocks containing complex eigenvalues. Let where Su is of order n. Our restarting algorithm then is to compute the Schur decomposition of the Rayleigh quotient.2). We will call the process Krylov-Schur restarting. The first thing to note is that to move an eigenvalue from one place to another it is sufficient to be able to exchange it with either one of its neighboring eigenvalues.

We will now sketch algorithms for each case. Let the matrix in question be

    S = [ S11 S12 ; 0 S22 ],

where S11 is of order one or two, and similarly for S22.

The first two cases can be treated by what amounts to the first step in a reduction to Schur form (see the proof of Theorem 1.12, Chapter 1). Let x be a normalized eigenvector corresponding to S22 and let Q = (x Y) be orthogonal. Then

    Q^H S Q = [ S22 s12'^H ; 0 S11' ].

Note that although S11' need not be equal to S11, the two matrices must have the same eigenvalues (in fact, they are similar).

The third case can be treated by the variant of Schur reduction introduced in connection with the QR algorithm (Chapter 2). Let y be a normalized left eigenvector corresponding to S11 and let Q = (X y) be orthogonal. Then it is easily verified that

    Q^H S Q = [ S22' S12' ; 0 S11 ].

The last case, n1 = n2 = 2, is more complicated. Let (S22, X) be an orthonormal eigenpair, and let Q = (X Y) be orthogonal. Then by Theorem 1.6, Chapter 4,

    Q^H S Q = [ S22' S12' ; 0 S11' ].

Since S22' is the Rayleigh quotient X^H S X, it must have the same eigenvalues as S22, and similarly for S11'.

The problem, then, is to compute the eigenbasis X. The procedure is to compute a nonorthonormal eigenbasis and then orthogonalize it. Specifically, let the eigenbasis be (P^T I)^T, where P is to be determined. Then

    [ S11 S12 ; 0 S22 ] [ P ; I ] = [ P ; I ] S22.

From the first row of this equation, we get the Sylvester equation

    S11 P - P S22 = -S12,    (2.9)

which can be solved for P. In this case we do not use the algorithm of Chapter 1 for Sylvester's equation to compute P but instead note that (2.9) is a set of four linear equations in four unknowns, which can be solved by any stable method. It is possible for this system to be singular, but replacing zero pivot elements with a suitably small number will allow the computation to continue (see Chapter 2).

Once we have computed P, we can compute the QR decomposition

    [ P ; I ] = Q R.

In particular, Q = (X Y) can be represented as the product of two Householder transformations. After the transformation the matrix Q^H S Q will have the form

    [ S22' S12' ; E S11' ].

To complete the algorithm, the matrix E must be set to zero. Specifically, if E is too large, say ||E||inf > 10 ||S||inf ε_M, then the algorithm is deemed to have failed.

The algorithms for the first three cases have stable implementations. For the first case, the eigenvector can be computed by inspection. Once the eigenvector has been computed, the matrix Q can be taken to be a Householder transformation (or a plane rotation when n1 = n2 = 1). For the other two, the eigenvectors can be computed stably by solving a 2x2 system of equations. These methods are stable in the usual sense, and the transformation can be applied to the matrix in which S is embedded.

The algorithm for the fourth case is not provably stable, although it works well in practice and its failure can be detected. For more details on these algorithms see the LAPACK subroutine xLAEXC.
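For the fourth case, the computation just described is easy to sketch in a few lines. The following is not the LAPACK routine xLAEXC or the 4x4 solver mentioned above; it simply calls SciPy's general Sylvester solver and NumPy's QR factorization to illustrate the idea. The function name and interface are made up, the two blocks are assumed to have disjoint spectra (otherwise the Sylvester equation is singular), and the test on the discarded block E is omitted.

    import numpy as np
    from scipy.linalg import solve_sylvester

    def exchange_blocks(S, n1, n2):
        """Exchange the leading diagonal blocks S11 (order n1) and S22
        (order n2) of the block upper triangular matrix S by an orthogonal
        similarity.  Returns (Q, Snew) with Snew = Q.T @ S @ Q."""
        S11 = S[:n1, :n1]
        S12 = S[:n1, n1:n1+n2]
        S22 = S[n1:n1+n2, n1:n1+n2]
        # Solve the Sylvester equation S11 P - P S22 = -S12, so that the
        # columns of (P^T I)^T span the eigenspace of S belonging to the
        # eigenvalues of S22.
        P = solve_sylvester(S11, -S22, -S12)
        X = np.vstack([P, np.eye(n2)])
        # Orthogonalize the eigenbasis and complete it to a square Q.
        Q, _ = np.linalg.qr(X, mode='complete')
        Snew = Q.T @ S @ Q
        Snew[n2:, :n2] = 0.0     # here a real code would first test this block
        return Q, Snew

In the Krylov-Schur setting S would be the two-block submatrix embedded in the full Rayleigh quotient, and Q would be applied to the corresponding rows and columns of that matrix and accumulated for later application to the basis.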

The Krylov-Schur cycle

We will now describe the basic cycle of the Krylov-Schur method. To keep the exposition simple, we will assume that the matrix in question is complex, so that we do not have to be concerned with real Schur forms.

The Krylov-Schur cycle begins with a Krylov-Schur decomposition

    A U_k = U_k S_k + u_{k+1} b_{k+1}^H,

where U_{k+1} is orthonormal and S_k is upper triangular. To get such a decomposition initially, one chooses a starting vector, expands it to an Arnoldi decomposition, and transforms it by a unitary similarity to triangular form.

The expansion phase is exactly like the expansion phase for the Arnoldi method, and one can use ExpandArn (Algorithm 1.1) to implement it. The expanded Rayleigh quotient now has the form illustrated, for k = 4 and m = 8, by a matrix whose leading k columns consist of the upper triangular S_k bordered below by the full row b_{k+1}^H and whose remaining columns are upper Hessenberg. Writing the corresponding Krylov decomposition in the form

    A U_m = U_m T_m + u_{m+1} t_{m+1}^H,

we now determine a unitary matrix Q such that S_m = Q^H T_m Q is upper triangular and transform the decomposition to the form

    A (U_m Q) = (U_m Q) S_m + u_{m+1} (t_{m+1}^H Q).

Finally, we use the exchange algorithms to transform the eigenvalues we want to keep to the beginning of S_m and truncate the decomposition. Algorithm 2.3 implements this scheme. Here are two comments.

• The work involved in the expansion phase is the same as for the Arnoldi process. For large n the work in the contraction phase is concentrated in the formation of the product U Q[:, 1:k]. As (2.5) shows, this is about the same as for implicit restarting. Thus the implicitly restarted Arnoldi cycle and the Krylov-Schur cycle are essentially equivalent in the amount of work they perform.

• In the computation of the Schur decomposition in statement 7, minor savings can be had in the initial reduction of S to Hessenberg form, since by (2.10) S is already partially reduced. However, when n is large, the savings are not worth the trouble.

Given a Krylov-Schur decomposition of order k and an integer m > k, this algorithm expands the decomposition to one of order m and then contracts it back to one of order k.

Algorithm 2.3: The Krylov-Schur cycle
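The cycle is easy to sketch in code if one is willing to let library routines stand in for the pieces described above. The following Python sketch (not the book's Algorithm 2.3) uses complex arithmetic, performs the expansion by ordinary Arnoldi steps with reorthogonalization, and, instead of the exchange algorithms, lets SciPy's sorted Schur factorization move the wanted Ritz values to the leading positions. The interface, the helper name, and the convention that keep() receives a complex eigenvalue are assumptions of this sketch.

    import numpy as np
    from scipy.linalg import schur

    def krylov_schur_cycle(A, U, S, b, u, m, keep):
        """One Krylov-Schur cycle.  On entry A U = U S + u b^H is an
        orthonormal Krylov-Schur decomposition of order k; on exit the same
        relation holds for the Ritz values selected by keep()."""
        n, k = U.shape
        V = np.zeros((n, m+1), dtype=complex)
        V[:, :k] = U
        V[:, k] = u
        T = np.zeros((m, m), dtype=complex)
        T[:k, :k] = S
        T[k, :k] = b.conj()                  # the row b^H below the triangle
        # Expansion phase: ordinary Arnoldi steps.
        for j in range(k, m):
            w = A @ V[:, j]
            h = V[:, :j+1].conj().T @ w
            w = w - V[:, :j+1] @ h
            h2 = V[:, :j+1].conj().T @ w     # reorthogonalize once
            w = w - V[:, :j+1] @ h2
            T[:j+1, j] = h + h2
            beta = np.linalg.norm(w)
            V[:, j+1] = w / beta
            if j + 1 < m:
                T[j+1, j] = beta
        # Contraction phase: triangularize the Rayleigh quotient, move the
        # wanted Ritz values to the leading positions, and truncate.
        Tm, Q, nkeep = schur(T, output='complex', sort=keep)
        U_new = V[:, :m] @ Q[:, :nkeep]
        S_new = Tm[:nkeep, :nkeep]
        b_new = beta * Q[m-1, :nkeep].conj()  # from the last row of Q
        u_new = V[:, m]
        return U_new, S_new, b_new, u_new

Here keep() plays the role of the selection criterion discussed earlier; in practice one would arrange for it to accept exactly k Ritz values.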

2.SEC. since ll(U) = 11>(V) and in both cases we are orthogonalizingthe the vectors Au.!:*].4. A2u. ^m+i are the same up to multiples of modulus one. Theorem 2. although it requires some proving. We must show that the subspaces associated with the final results are the same. it is natural to conclude that if we start from equivalent decompositions and remove the same Ritz values then we will end up with equivalent decompositions. Let P be the unitary transformation applied to the Rayleigh quotient in P and let Q be the one applied to the Rayleigh quotient of Q. • • • . THE RESTARTED ARNOLDI METHOD The equivalence of Arnold! and Krylov-Schur 331 Given that the implicitly restarted Arnoldi method with exact shifts and the KrylovSchur method both proceed by moving Ritz values out of their respective decompositions. Let be an unreduced Arnoldi decomposition and let be an equivalent Krylov-Schur form. the vectors Uk+2 • • • ^m+i and Vk+2. set Pk = P[:. l:k] are the same (see statement 14 in Algorithm 2. Suppose that an implicitly restarted Arnoldi cycle is performed on P and a Krylov-Schur cycle is performed on Q. Thus it makes sense to say that both methods reject the same Ritz values. we have Thus H and 5 are similar and have the same Ritz values.3). Proof. First note that the expansion phase results in equivalent decompositions. At this point denote the first decomposition by and the second by Note that for both methods the final truncation leaves the vector u unaltered. Then we must show that the subspaces spanned by UP[:. l:k] and Qib = <?[:. Now assume that both algorithms have gone through the expansion phase and have moved the unwanted Ritz values to the end of the decomposition. Since V = UW for some unitary W. l:k] and VQ[:. In fact. That is indeed the case. If the same Ritz values are discarded in both and those Ritz values are distinct from the other Ritz values. Vol II: Eigensvstems .2 and statement 11 in Algorithm 2. For brevity. then the resulting decompositions are equivalent.

We will first treat the behavior of the methods in the presence of rounding error. We will then discuss tests for the convergence of approximate eigenpairs. We will see in the next subsection that a similar procedure can be used to deflate converged Ritz pairs. This fact is not at all obvious from the algorithmic description of the Krylov-Schur method.3 used to return to an Arnoldi decomposition. the offending eigenvalue swapped out. Hence there is a unitary matrix R such that Qk = W^PkR. Although this seems like a lot of work. For example. On the other hand. the algorithms of this section are stable in the usual sense. we will show how to deflate converged eigenpairs from a Krylov decomposition. if the latter produces a persistent unwanted Ritz value. Since the Arnoldi and Krylov-Schur decompositions are both orthonormal Krylov decompositions.3.332 CHAPTER 5. STABILITY. if n is large and we delay multiplying transformations into the decomposition until the end of the process. Likewise. The Krylov-Schur method can be used as an adjunct to the implicitly restarted Arnoldi method. AND DEFLATION In this subsection we will consider some of the practicalities that arise in the implementation of the Arnoldi method. Comparison of the methods The implicitly restarted Arnoldi method and the Krylov-Schur method each have their place. • This theorem says that the implicitly restarted Arnoldi method with exact shifts and the Krylov-Schur method with the same shifts are doing the same thing. the matrix W^Pk spans Q. KRYLOV SEQUENCE METHODS By construction Tl(Pk) spans the eigenspace P of Schur vectors of H corresponding to the retained Ritz values. CONVERGENCE. something the latter cannot do. T^(Qk) spans the eigenspace Q of Schur vectors of S corresponding to the retained Ritz values. 2. when it comes to exact shifts the Krylov-Schur method is to be preferred because exchanging eigenvalues in a Schur form is a more reliable process than using implicit QR steps to deflate. Since W^W = H. In particular. we will present our results in the more general notation of the latter. . and Algorithm 1. it effectively costs no more than a contraction step. the decomposition can be reduced to Krylov-Schur form. We then have It follows that VQk and UPk span the same subspace. the Krylov-Schur method is restarting with a vector filtered by a polynomial whose roots are the rejected Ritz values. Stability Provided we take due care to reorthogonalize. By hypothesis each of these eigenspaces are simple and hence are unique (page 246). Finally. The former can restart with an arbitrary filter polynomial.

13) — that depends on the size of the problem and the number of cycles performed by the algorithm. the eigenpairs (//. Let 333 be an orthogonal Krylov decomposition computed by the implicitly restarted Arnoldi method or the Krylov-Schur method. • The vectors Uk^i are near the Ritz vectors UkwkIt follows that unless the Ritz pairs are very sensitive to perturbations in A. 2. • Since B = UHAU is a Rayleigh quotient of A. • The generic constant 7 grows only slowly as the algorithm in question progresses. Then there is an orthonormal matrix Uk+i satisfying and a matrix E satisfying such that Here 7 is a generic constant—in general different in (2. see the notes and references. This theorem has three implications. THE RESTARTED ARNOLDI METHOD Theorem 2. Thus we can take the output of our algorithms at face value. w. Alternatively.11) and (2. they will be accurately computed.SEC.-) of B are primitive Ritz pairs of A + E. there is a matrix F satisfying such that For a sketch of the the proof.5. Thus the algorithm can be continued for many iterations without fear of instability.. Vol II: Eigensystems .

can further magnify the error. The treatment of each of these methods is worth a volume in itself. instability in a direct solver or a lax convergence criterion in an iterative method can result in a comparatively large backward error in solving the system AKv = u: 111 conditioning of A K .14) to A"1. and the system must be solved by a direct sparse method or by an iterative method. Unfortunately. To relate the backward error in (2. due to the shift being near an eigenvalue of A. then the condition number \\AK\\ \\A~l \\ will be large and will magnify the multiplier (7||AK||||A~1||) in (2. and we must take into account the errors generated in solving this system. First.12). we can say that the computed solution will satisfy where Depending on the stability properties of the direct solver or the convergence criterion used by the iterative method. as we have already pointed out.12). is combined with the Arnoldi process. Convergence criteria We have already stressed in connection with the power method (p. the above analysis is quite crude. 61) that an appropriate criterion for accepting a purported eigenpair (/^. However. which replaces A in (2. From the first-order expansion we see that the backward error in AK l is approximately bounded by This bound compares unfavorably with (2. if K is very near an eigenvalue of A. something we have already noted in Example 1.334 CHAPTERS. To put things less formally.12) in two ways. z] is that the residual r = Az — JJLZ . the vector v = (A — K!)~IU will be computed by solving the system where AK = A — K/. it is possible for the constant 7 to be large. In most applications AK will be large and sparse.14) propagates to A~l.15). the constant 7 can be larger than its counterpart in (2. Second. KRYLOV SEQUENCE METHODS Stability and shift-and-invert enhancement When shift-and-invert enhancement with shift K. All this suggests that one must be careful in using shifts very near an eigenvalue of A. there is little in the literature to guide us. However.5. we must show how F in (2. and in practice accurate shifts may be less damaging than it suggests.

Thus if for some tolerance tol. the residual is cheap to compute. More generally. W) is a primitive Ritz pair. At first glance. Z) is an approximate orthonormal eigenpair with a small residual then by Theorem 2. If the decomposition is an Arnoldi decomposition. Then Proof. there is a matrix E satisfying such that (M. the residual (2. • If (M.SEC. But for the most part. and the expression (2.6. 2. then BW — WM = 0. THE RESTARTED ARNOLDI METHOD 335 be sufficiently small. Z) is an exact eigenpair of A + E. Thus. It is easily verified that Since There are three comments to be made about this theorem. Chapter 4.19) reduces to ||6HW||2. For if Z has p columns. we will be concerned with ordinary eigenvalues and their eigenvectors. However. the computation requires p matrix-vector multiplications. Moreover: Vol II: Eigensystems . Z) = (M. Z) is an exact eigenpair of A.18) provides a convergence criterion for determining if eigenpairs produced by an algorithm have converged. where U is from an orthonormal Krylov decomposition. if Z = UW. (2. so that 6H = /tejj-. then (M.16) would appear to be expensive to compute. if (M.8. as the following theorem shows. Let be an orthonormal Krylov decomposition and let (M. where the (normwise) relative error in A is tol. Theorem 2. then ||^||F is just the norm of the last row of W. UW) be an orthonormal vair. • We have chosen to work with eigenspaces because that is what is produced by complex eigenvalues in a real Schur form. not to mention the arithmetic operations involved in computing ZM.

where ω_k is the last component of w.

• If (M, W) is not a Ritz pair, then the formula (2.19) cannot be simplified. However, it will be minimized when M is the Rayleigh quotient W^H B W.

Thus we might say that a Ritz pair (M, UW) has converged if

    ||b^H W||_F <= tol ||A||_F.

In practice, we may not be able to compute ||A||_F, especially if we are using shift-and-invert enhancement. However, since we are generally concerned with exterior eigenvalues, the norm of the Ritz block M is often a ballpark approximation to ||A||_F, and the criterion becomes

    ||b^H W||_F <= tol ||M||_F.    (2.21)

When the concern is with eigenvectors rather than eigenspaces, one should use the Rayleigh quotient, not the refined or harmonic Ritz value, when checking the convergence of refined or harmonic Ritz vectors. Specifically, by Theorem 2.6, Chapter 4, the residual Az - μz is minimized when μ is the Rayleigh quotient z^H A z.

Convergence with shift-and-invert enhancement

As in the case of stability, when shift-and-invert enhancement is used (say with shift κ) the residual R in (2.17) relates to the shifted matrix (A - κI)^{-1}, and we must relate it to the original matrix A. In particular, if we are concerned with the approximate eigenpair (μ, z) of (A - κI)^{-1}, then the approximate eigenvalue of A is κ + 1/μ, and we have the residual

    Az - (κ + 1/μ) z = -(1/μ)(A - κI) r,    where r = (A - κI)^{-1} z - μ z.

It follows that for the relative backward error in A not to be greater than tol, we must have

    ||r||_2 ||A - κI||_F / |μ| <= tol ||A||_F.

Now it is reasonable to assume that ||A - κI||_F and ||A||_F are of the same order of magnitude. Hence our convergence test becomes

    ||r||_2 <= tol |μ|.

This is exactly the same as the criterion (2.21) for Ritz pairs.
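In code the test is inexpensive because it never touches A. The following sketch (not from the text) computes the residual norms |b^H w| for the primitive Ritz pairs of an orthonormal Krylov decomposition A U = U B + u b^H and applies the criterion above; for shift-and-invert enhancement it tests against tol*|mu| as just described. The function name and interface are assumptions, and the norm of B is used as the ballpark estimate.

    import numpy as np

    def converged_ritz_pairs(B, b, tol, shift_invert=False):
        """Return the Ritz values of the Rayleigh quotient B together with a
        flag saying whether each Ritz pair passes the residual test."""
        mu, W = np.linalg.eig(B)            # primitive Ritz pairs (mu_i, W[:, i])
        norm_est = np.linalg.norm(B, 2)     # ballpark substitute for ||A||
        flags = np.empty(len(mu), dtype=bool)
        for i in range(len(mu)):
            w = W[:, i] / np.linalg.norm(W[:, i])
            resid = abs(np.vdot(b, w))      # ||A(Uw) - mu(Uw)|| = |b^H w|
            if shift_invert:
                # B is the Rayleigh quotient of (A - kappa I)^{-1}; a small
                # backward error in A leads (roughly) to this test.
                flags[i] = resid <= tol * abs(mu[i])
            else:
                flags[i] = resid <= tol * norm_est
        return mu, flags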

but as its size increases during the course of the algorithm. Theorem 2. and let (Af. 2.7. The following theorem relates the residual norm of an approximate subspace to the quantities we must set to zero to deflate. the savings will be marginal. In particular. The second advantage of the deflated decomposition is that we can save operations in the contraction phase of an Arnoldi or Krylov-Schur cycle. the savings become significant. First by freezing it at the beginning of the Krylov decomposition we insure that the remaining space of the decomposition remains orthogonal to it. we will never have an exact eigenspace in our decompositions. and set and Then Vol El: Eigensystems . There are two advantages to deflating a converged eigenspace. THE RESTARTED ARNOLDI METHOD Deflation 337 We say a Krylov decomposition has been deflated if it can be partitioned in the form When this happens. If the order of BU is small.SEC. Z) — (M. The expansion phase does not change. we can apply all subsequent transformations exclusively to the eastern part the Rayleigh quotient and to (C/2 Us). Let be an orthonormal Krylov decomposition. Let (W W_\_) be unitary. UW) be an orthonormalpair. and we end up with a decomposition of the form Now since B\\ is uncoupled from the rest of the Rayleigh quotient. Of course. we have AU\ = UiB\\. this gives algorithms the opportunity to compute more than one independent eigenvector corresponding to a multiple eigenvalue. so that U\\ spans an eigenspace of A.

If we restore the quantities that were zeroed in forming (2.26) is backward stable. suppose that AZ — ZM is small. KRYLOV SEQUENCE METHODS From (2. we transform the Krylov decomposition AU — UB = ub** to the form tHEN BY (2. we get the following relation: . Let CHAPTER 5. • To see the consequences of this theorem.ZM\\p is sufficiently small. Stability of deflation The deflation procedure that leads to (2.20).23) with equality if and only if M is the Rayleigh quotient W^BW. Now if the residual norm \\AZ . we have The theorem now follows on taking norms. using (W W±).26). and.338 Proof. we may set #21 and 61 to zero to get the approximate decomposition Thus we have deflated our problem by making small changes in the Arnoldi decomposition.

Since the deflation procedure only requires an orthonormal basis for the approximate eigenspace in question. write the deflated decomposition (2. Z{) (i = 1. the residual for Z becomes VolII: Eigensystems .. normalized Ritz pairs (/^-.26) in the form Then there is an E satisfying such that Equality holds in (2.8.. we can compute a QR factorization of Z and use the orthonormal basis Z in the deflation procedure. and the vectors in these pairs cannot be guaranteed to be orthogonal.then(A + E)U = UB + utP.7. Theorem 2. .. p) of one kind or another. THE RESTARTED ARNOLDI METHOD If we write this decomposition in the form 339 then IfwesetE = -RUH. . Unfortunately. If we arrange the vectors in a matrix Z and set M = diag(//i.21) if and only if M is the Rayleigh quotient ZHAZ = WR B W.p).. Instead we will be confronted with converged. (J. Nonorthogonal bases In practice we will seldom encounter a converging subspace unless it is a two-dimensional subspace corresponding to an imaginary eigenvalue in a real Schur decomposition. 2.SEC. the residual R — AZ — ZM must be small because the individual residuals are small.. Deflating an approximate eigenspace with residual norm p corresponds to introducing an error of no more than p'm A. We may summarize these results in the following theorem. Under the hypotheses of Theorem 2.. This theorem says that the deflation technique is backward stable.

where M' = S M S^{-1}. If the columns of Z are nearly dependent, ||S^{-1}||_2 will be large, and the residual may be magnified, perhaps to the point where the deflation cannot be performed safely.

It may seem paradoxical that we could have, say, two vectors each of which we can deflate, but which taken together cannot be deflated. The resolution of this paradox is to remember that we are not deflating two vectors but the subspace spanned by them. If the vectors are nearly dependent, they must be very accurate to determine their common subspace accurately.

Deflation in the Krylov-Schur method

The deflation procedure is not confined to eigenpairs calculated by a Rayleigh-Ritz procedure. For example, it can be used to deflate harmonic Ritz vectors or refined Ritz vectors. However, if Ritz vectors are the concern, there is an easy way to deflate them in the Krylov-Schur method. Specifically, let the current decomposition have the form

    A (U1 U2) = (U1 U2) [ S11 S12 ; 0 S22 ] + u (0 b2^H).

Here U1 represents a subspace that has already been deflated, and S22 is the Schur form that remains after the contraction phase. To deflate the Ritz pair corresponding to the (1,1)-element of S22 we must set the first component of b2 to zero. Consequently, all we have to do to deflate is to verify that that component satisfies our deflation criterion. If some other diagonal element of S22 is a candidate for deflation, we can exchange it into the (1,1)-position of S22 (as its position changes, its component in b follows it) and test as above. This procedure can be repeated as often as desired. Algorithm 2.4 implements this scheme.

Deflation in an Arnoldi decomposition

Deflating an Arnoldi decomposition proceeds by reducing it to a Krylov-Schur decomposition. After a cycle of the algorithm, at the end of the contraction phase the Arnoldi decomposition will have the form

    A (U1 U2) = (U1 U2) [ S11 H12 ; 0 H22 ] + u (0 b2^H),

where H22 is upper Hessenberg. The procedure is to reduce H22 to Schur form, deflate as in the Krylov-Schur form, and use Algorithm 1.3 to bring the decomposition back to an Arnoldi form. Although the procedure is roundabout, it is not expensive. As usual, if n is large, we accumulate the transformations in an auxiliary matrix Q and apply them to U only at the end of the deflation process. Consequently, this is an inexpensive algorithm.
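The deflation test itself amounts to inspecting one component of b. The sketch below (not the book's Algorithm 2.4) shows the test for the Ritz value currently sitting in the leading position of S22; in a full implementation one would first use the exchange algorithms to bring each candidate into that position, and the quantity tol*norm_est would be chosen as discussed in the next subsection. The function name and arguments are illustrative.

    import numpy as np

    def try_deflate(b, ndefl, tol, norm_est):
        """Attempt to deflate the Ritz value in position ndefl of a
        Krylov-Schur decomposition A U = U S + u b^H whose first ndefl
        positions are already deflated (b[:ndefl] == 0).  Setting b[ndefl]
        to zero perturbs A by no more than |b[ndefl]|, so this is
        essentially a backward error test.  Returns the new count of
        deflated positions."""
        if abs(b[ndefl]) <= tol * norm_est:
            b[ndefl] = 0.0        # U[:, :ndefl+1] now spans an (approximate)
            return ndefl + 1      # eigenspace frozen in the decomposition
        return ndefl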

allow both a user-specified option and a default opVol II: Eigensystems . 2. In that case. The advantage of this choice is that one may save calculations by not demanding too much. Algorithm 2. It is therefore necessary to say how such a criterion should be chosen.SEC.18) suggests stopping when the residual norm of the eigenpair in question is less than tol-a. There is no point in choosing tol smaller. There are two possibilities for choosing tol. the eigenpair comes from a perturbed matrix with relative error tol. The difficulty is that the choice is highly problem dependent and must be left to the user. since the operations in the algorithms already impose a backward error of the same order (see Theorem 2. For some i > j. such as ARPACK. where a is some convenient estimate of the size of A. The reason is that it is comparatively inexpensive to do so. Choose tol to be a modest multiple of the rounding unit. usually allow the user to furnish a tolerance to decide convergence. 2. This algorithm determines if the value can be deflated and if so produces a deflated decomposition. The reason is that the computations are potentially so voluminous that it is unreasonable to continue them beyond the accuracy desired by the user. Choose tol to be something that gives a backward error that one can live with.5). Instead they iterate until the limits set by finite precision arithmetic are reached. Algorithms for large eigenproblems. their implementations seldom use an explicit parameter to determine convergence. let the Ritz value sa be a candidate for deflation.4: Deflating in a Krylov-Schur decomposition Choice of tolerance Although algorithms for dense eigenproblems are necessarily iterative. Good packages. on the other hand. THE RESTARTED ARNOLDI METHOD Let the Krylov-Schur AU = US + ubH have the fora 341 where Su is of order j. 1. The criterion (2.

option based on the rounding unit.

The tolerance for deflation should be of the same order of magnitude as the tolerance for convergence. Since deflation is backward stable (see Theorem 2.8), it should be made somewhat larger to allow multiple vectors to be deflated. See the discussion on page 339.

RATIONAL KRYLOV TRANSFORMATIONS

The principal use of shift-and-invert transformations in Arnoldi's method is to focus the algorithm on the eigenvalues near the shift κ. However, when the desired part of the spectrum is spread out, it may be necessary to use more than one shift. One way of doing this is to restart with a new shift and a new vector. However, when the new shift is not far removed from the old, the Arnoldi decomposition may still contain useful information about the eigenvectors near the new shift, information that should be kept around. It is a surprising fact that we can change a Krylov decomposition from one in (A - κ1 I)^{-1} to one in (A - κ2 I)^{-1} without having to perform an inversion. This subsection is devoted to describing how this is done.

Krylov sequences, inverses, and shifts

Although the computational algorithm is easy to derive, it is interesting to see what lies behind it. Suppose we have a Krylov sequence

    u, (A - κ1 I)^{-1} u, ..., (A - κ1 I)^{-(k-1)} u.

If we set v = (A - κ1 I)^{-(k-1)} u, then the sequence with its terms in reverse order is

    v, (A - κ1 I) v, ..., (A - κ1 I)^{k-1} v,

so that K_k[(A - κ1 I)^{-1}, u] = K_k[A - κ1 I, v]. Moreover, by the shift invariance of a Krylov sequence (Chapter 4), K_k[A - κ1 I, v] = K_k[A - κ2 I, v]. It follows that

    K_k[(A - κ1 I)^{-1}, u] = K_k[(A - κ2 I)^{-1}, w],    where w = (A - κ2 I)^{k-1} v.    (2.29)

Equation (2.29) says that the Krylov subspace in (A - κ1 I)^{-1} with starting vector u is exactly the same as the Krylov subspace in (A - κ2 I)^{-1} with a different starting vector w. Thus the problem posed above amounts to restarting the Krylov sequence with the new vector w. We will now describe an algorithm for doing this.
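The algorithm is derived next; as a preview, here is what it amounts to in NumPy terms. The sketch below is not the book's Algorithm 2.5: it assumes the decomposition is given as (A - kappa1*I)^{-1} U[:, :k] = U H with U having k+1 orthonormal columns and H of size (k+1) x k, and it returns the corresponding quantities for the new shift. The interface is made up, and R is assumed to be well conditioned.

    import numpy as np

    def change_shift(U, H, kappa1, kappa2):
        """Transform a Krylov decomposition in (A - kappa1*I)^{-1} into one
        in (A - kappa2*I)^{-1} without applying either inverse."""
        kp1, k = H.shape
        Ihat = np.vstack([np.eye(k), np.zeros((1, k))])   # identity with a zero row
        M = Ihat + (kappa1 - kappa2) * H                  # (k+1) x k
        Q, R = np.linalg.qr(M, mode='complete')           # M = Q R, last row of R is zero
        # From (A - kappa2 I)^{-1} U M = U H it follows that
        #   (A - kappa2 I)^{-1} (U Q)[:, :k] = (U Q) (Q^H H R[:k, :k]^{-1}).
        V = U @ Q
        B = Q.conj().T @ H @ np.linalg.inv(R[:k, :k])
        return V, B                                        # a Krylov decomposition

The result is an orthonormal Krylov decomposition in the new operator, which can then be returned to Arnoldi form; if R is ill conditioned the last step can magnify the residual error, a point taken up in the comments on the algorithm below.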

As one gets better approximations to an eigenvalue Vol II: Eigensystems . the transformations to U can be saved and applied at the end of the algorithm. Algorithm 2. In particular. let be a QR decomposition of / + (KI . • Since /+ («i — KZ )H is upper Hessenberg. At first sight it might appear that the rational Krylov transformation is a good way of getting something for nothing. this structure makes it easy to preserve deflated vectors in the initial part of the decomposition. the application of R~l can magnify the residual error in the decompositions. Then This is a Krylov decomposition. As usual.K2H). which can be reduced to an Arnoldi decomposition by Algorithm 1. 2. If R is ill conditioned. • The algorithm is quite economical involving no manipulations of A. Here are some comments. • The algorithm is not unconditionally stable.5 implements this procedure.3. the Q in the factorization in statement 1 can be computed as the product of k plane rotations. THE RESTARTED ARNOLDI METHOD The rational Krylov method Let us begin with an orthonormal Arnoldi decompositioi 343 Then Henc where To reduce this relation to a Krylov decomposition.SEC.

the closer the new shift is to an eigenvalue. Equivalent block methods for several eigenvalues exist. provided the number of eigenvalues is less than the block size. the argument goes. Nor does the method save the trouble of computing a factorization. restarting. . NOTES AND REFERENCES Arnoldi's method. Consequently.K2/)-1 to a vector. If one wants to continue the sequence. §FV. See [45] and [93]. one can use them as shifts and get a sequence of Krylov decompositions with increasingly effective shift-and-invert enhancers.5. the accuracy of eigenvector approximations is not affected. 2. is was Saad [230. this algorithm computes an Arnoldi decomposition Algorithm 2. and filter polynomials Although the Arnoldi method is indeed due to Arnoldi [5]. one must be able to apply (A . To put it another way. 584-602] and [233. Once the eigenvector has been found it can be removed from the problem by various deflating techniques. the poorer the new starting vector. which have been surveyed in [300.344 Given an Arnold! decomposition CHAPTER 5. It is important to distinguish between restarting the Arnoldi process to compute a single eigenpair and to compute several eigenpairs simultaneously. The former problem is relatively easy: just restart with the best available approximation to the desired eigenvector. pp.2].5: The rational Krvlov transformation of A. 1980] who first presented it as an effective algorithm for computing eigenpairs of large nonHermitian matrices. KRYLOV SEQUENCE METHODS of order k and a shift ^2. The flaw in this argument is that the rational Krylov transformation does not change the Krylov subspace.

An alternative — called reverse communication—is for the general-purpose routine to return to the calling program with a flag specifying the desired operation.) ARPACK New algorithms for large eigenproblems are usually implemented in MATLAB for testing but seldom end up as quality mathematical software. Lehoucq.) He also gives convergence results for the case of a fixed filter polynomial and the case of exact shifts for a symmetric polynomial. 180] he pointed out that a good combination is hard to find. it must get data from the original program. Saad [230] proposed restarting with a linear combination of approximate eigenvectors from the Krylov subspace. The breakthrough came with Sorensen's 1992 paper [246] entitled "Implicit Application of Polynomial Filters in a fc-Step Arnoldi Method.1996] showed why. THE RESTARTED ARNOLDI METHOD 345 The problem is far different when it is desired to restart the Arnoldi process while several eigenvectors are converging. but in 1990 [233. including exact and inexact shifts. In 1982. One solution is for the caller to specify a subroutine to perform the operation in the calling sequence. Vol II: Eigensystems . The problem with this procedure is that once again one must choose a vector to purify. p." In it he described the implicitly restarted Arnoldi algorithm as presented here. (For other convergence results see [160]. however. performs the operation and returns the results via another call to the general-purpose routine. the "implicit" in implicit restarting does not refer to its use of implicit QR steps but to the fact that the starting vector is chosen implicitly by the shifts in contraction procedure.g. A happy exception is the implicitly restarted Arnoldi algorithm.1984]. The problem for a general-purpose routine is how to get the caller to perform the operations in question.SEC. Sorensen. have produced ARPACK [162]. However. a more complicated operation may be required — e. a well-coded and well-documented package that bids fair to become the standard for large non-Hermitian eigenproblems. for shift-and-invert enhancement. which can only be done with messy global variable structures.. In an elegant paper Morgan [180. The idea of following a sequence of Arnoldi steps with the application of a filter polynomial to purify the restart vector is due to Saad [232. which knows its own data. To restart with a single approximate eigenvector would destroy the information accumulated about the others. The calling program. once this subroutine is called. (By the way. building on a design of Phuong Vu. The simplest example is the computation of Ax. and Yang. Exchanging eigenvalues Stewart [256] produced the first program for exchanging eigenvalues in a real Schur form. Reverse communication All routines for solving large eigenvalue problems require that some operation be performed involving the matrix in question and a vector. 2. The method is to iterate QR steps with an exact shift until the process converges.

But it is easily proved. due to Stewart [267]. Schur vectors can also be used to restart the latter. From the properties of the Gram-Schmidt algorithm with reorthogonalization [51. Also see [181]. which is then truncated to become an Arnoldi decomposition. an unwanted Ritz pair can fail to be purged from the decomposition. Lehoucq and Sorensen [158] propose a procedure for modifying the Krylov subspace so that it remains orthogonal to the left eigenvector of a converged. both resulting from its commitment to maintaining a strict Arnoldi decomposition. precursors of various aspects can be found in the literature. Forward instability The forward instability of the QR step has been investigated by Parlett and Le [207] for symmetric tridiagonal matrices and by Watkins [289] for Hessenberg matrices. Stathopoulos. Since the method is centered around a Schur decomposition of the Rayleigh quotient. Ch. Regarding the use of Schur vectors. for symmetric matrices Wu and Simon [306] exhibit what might be called a Lanczos-spectral decomposition. solves both these problems by expanding the class of transformations that can be performed on the underlying decomposition. Lehoucq [personal communication] has written code that uses Schur vectors in deflation. The algorithm described here is due to Bai and Demmel [11]. KRYLOV SEQUENCE METHODS The method is stable but can fail to converge when the separation of the eigenvalues in the blocks is poor. The introduction of general orthonormal Krylov decompositions. Fokkema. Although the method is new. a special case of our Arnoldi-Schur decomposition. Sleijpen. Second. In fact. What distinguishes the Krylov-Schur approach is the explicit introduction and systematic manipulation of Krylov decompositions to derive and analyze new algorithms.117] and the general rounding-error analysis of unitary transformations [300.5 has not appeared in the literature. it is called the Krylov-Schur method. Stability Theorem 2. It has been implemented in the LAPACK routine xTREXC. It too can fail. The Krylov-^Schur method The iteratively restarted Arnoldi procedure has two defects. and Wu [248] point out that since the unpreconditioned Jacobi-Davidson algorithm is equivalent to the Arnoldi algorithm. First. Saad.346 CHAPTER 5. but the failures do not seem to occur in practice.8. 3]. the contraction step of the iteratively restarted Arnoldi method produces an orthonormal Krylov decomposition. it is awkward to deflate converged Ritz pairs. Closer to home. and van der Vorst [74] explicitly use Schur vectors to restart the Jacobi-Davidson algorithm. unwanted Ritz pair. which shows that they can never yield a decomposition that is not based on a Krylov sequence. To deal with the problems it poses for the iteratively restarted Arnoldi method. This lead to attempts to formulate noniterative algorithms. it is easy to show . The manipulations are justified by Theorem 1.

that the residual AU - UB - ub^H of the computed decomposition is of the order of the rounding error, and that the departure of U from orthonormality is of the same order. From this one constructs an exactly orthonormal U', absorbs its error in the residual, and throws the error back on A as in Theorem 2.8.

Convergence

Convergence criteria for the Arnoldi processes are discussed in [161]. The analysis of convergence criteria for shift-and-invert enhancement is new.

Shift-and-invert enhancement and the Rayleigh quotient

Nour-Omid, Parlett, Ericsson, and Jensen [184] discuss the effect of accurate shifts on the Rayleigh quotient and conclude that the graded structure of the Rayleigh quotient makes it a nonproblem.

Deflation

Deflating converged Ritz values from an Arnoldi decomposition is a complicated affair [158], and it is not clear that the procedure can be extended to arbitrary vectors, such as refined and harmonic Ritz vectors (§3.2, Chapter 4). In fact, this is the problem that led me to orthonormal Krylov decompositions. The deflation technique for Arnoldi-Schur decompositions is closely related to Saad's method of implicit deflation [233, §VI]. The fact that it may be impossible to deflate a group of several individually deflatable eigenvectors appears to be known but has not been analyzed before.

The rational Krylov transformation

The rational Krylov method is due to Ruhe [221], who described it for the generalized eigenvalue problem.

3. THE LANCZOS ALGORITHM

We have seen that if A is symmetric and

    A U_k = U_k T_k + β_k u_{k+1} e_k^T

is an Arnoldi decomposition, then the Rayleigh quotient T_k is a tridiagonal matrix of the form

    T_k = [ α_1  β_1                      ]
          [ β_1  α_2  β_2                 ]
          [      β_2  α_3   .             ]
          [            .    .    β_{k-1}  ]
          [                β_{k-1}   α_k  ].
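Before turning to the practical difficulties treated in this section, it may help to have the resulting recurrence in front of us in executable form. The following sketch (not one of the numbered algorithms of the text) implements the three-term Lanczos recurrence in NumPy, with optional full reorthogonalization of each new vector against all its predecessors; the function name and interface are made up.

    import numpy as np

    def lanczos(A, u1, m, reorth=True):
        """A minimal sketch of the Lanczos recurrence for a symmetric A.
        Returns the Lanczos basis U (n x (m+1)) and the tridiagonal
        coefficients alpha and beta.  With reorth=True every new vector is
        fully reorthogonalized (the safest, most expensive variant); with
        reorth=False orthogonality will eventually be lost."""
        n = len(u1)
        U = np.zeros((n, m+1))
        alpha = np.zeros(m)
        beta = np.zeros(m)
        U[:, 0] = u1 / np.linalg.norm(u1)
        for k in range(m):
            v = A @ U[:, k]
            if k > 0:
                v -= beta[k-1] * U[:, k-1]
            alpha[k] = U[:, k] @ v
            v -= alpha[k] * U[:, k]
            if reorth:
                for _ in range(2):      # classical Gram-Schmidt, done twice
                    v -= U[:, :k+1] @ (U[:, :k+1].T @ v)
            beta[k] = np.linalg.norm(v)
            if beta[k] == 0.0:          # invariant subspace found
                return U[:, :k+1], alpha[:k+1], beta[:k]
            U[:, k+1] = v / beta[k]
        return U, alpha, beta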

The alternative is to save the Lanczos vectors in a backing store and move them into main memory to reorthogonalize the new Lanczos vectors—presumably only when absolutely necessary. they can lose orthogonality.348 CHAPTER 5. since the reorthogonalization causes the Rayleigh quotient to become upper Hessenberg instead of tridiagonal. In practice. There are two problems associated with .e. in which the orthogonality of the Lanczos basis is allowed to deteriorate up to a point. strictly speaking. we reserve the right to insert a factor of \\A\\ in our bounds whenever it seems appropriate. This fact is one of the main complications in dealing with the Lanczos algorithm in inexact arithmetic. The purpose of this section is to describe the practical implementations of this method. and the chief problem in implementing a Lanczos method is to maintain a satisfactory degree of orthogonality among them. As in the Arnoldi algorithm. 3. the Lanczos vectors Uj computed by the Lanczos algorithm must be orthogonal. after which orthogonalization is performed. we can reorthogonalize a straying Lanczos vector Uk against its predecessors. we will assume throughout this sftrtinn that: However. Mathematically.1. REORTHOGONALIZATION AND THE LANCZOS ALGORITHM Lanczos with reorthogonalization In finite precision arithmetic the vectors Uk produced by the Lanczos process will deviate from orthogonality. A key to deriving Lanczos-based algorithms is knowing when quantities can be neglected.3 we will consider semiorthogonal algorithms. however. To avoid cluttering our expressions with factors of \\A\\. we get the Lanczos method. a Lanczos decomposition.. KRYLOV SEQUENCE METHODS This implies that the vectors Uk can be generated by a three-term recurrence of the form where When we combine this recursion with the Rayleigh-Ritz method. There are two solutions. We will begin by describing a general form of the algorithm that encompasses most of the variants that have appeared in the literature. when they are of the order of the rounding unit compared with A. The first is to reorthogonalize the vectors at each step and restart when it becomes impossible to store the Lanczos vectors in main memory. We will then treat the restarted Lanczos algorithm in which orthogonalization is performed at each step. i. In both cases the result of the orthogonalization is something that is not. In §3.

To avoid this problem. • Although we have written a nonterminating loop. the time taken to transfer Lanczos vectors from the backing store will dominate the calculation.g. THE LANCZOS ALGORITHM 349 Given a starting vector HI this algorithm computes a sequence of suitably orthogonalized Lanczos vectors. But if a reorthogonalization is performed at each Lanczos step. Second. No reorthogonalization: v is accepted as is at each iteration. The actual storing of these vectors is not a problem. Algorithm 3. it obviously will have to stop at some point—e. the preceding vectors must be stored and then brought back into the fast memory of the machine. since we have to save them anyway to compute Ritz vectors. 2. VolII: Eigensystems . 3. Full reorthogonalization: v is always reorthogonalized against all its predecessors. we can agree to tolerate a prescribed degree of nonorthogonality so that we do not have to reorthogonalize at every step. to restart or to deflate converged Ritz vectors. First. 1.1: Lanczos procedure with orthogonalization this procedure. we can make two compromises.1 sketches a general procedure that encompasses most of the variants of the Lanczos algorithm.. Algorithm 3. First. this code encompasses five common variants of the Lanczos method. They are a useful guide to what follows.SEC. we can reorthogonalize Uk only against those vectors to which it is actually nonorthogonal. The following comments should be read carefully. • Depending on how the decision is made to reorthogonalize and the on details of the reorthogonalization process.

Selective reorthogonalization: v is reorthogonalized against a subspace of the span of its predecessors determined by the current Ritz vectors. so that instead of a Lanczos decomposition we have the relation where Hk is upper Hessenberg.2). In the next two subsections we will treat full and periodic reorthogonalization. This is because in spite of the fact that reorthogonalization restores orthogonality.It turns out that the vector v in statement 6 can fail to be orthogonal to Uk and Uk-i to working accuracy. This is not even an Arnoldi decomposition. To distinguish . 5. Chapter 4. it does not affect the coefficients significantly: the loss of orthogonality corresponds to errors in ak and /^-i that are below the rounding threshold. 4. The cure is to reorthogonalize. Periodic reorthogonalization: v is reorthogonalized against all its predecessors. KRYLOV SEQUENCE METHODS 3. Specifically. Note that the coefficients o^ and (3k-i are not updated during this process. For the other procedures see the notes and references. The second problem is how to reorthogonalize. The first problem is to determine when global reorthogonalization is needed.350 CHAPTER 5. • Statements 9-11 attempt to enforce global orthogonality—that is. but only when it fails a test for orthogonality. Nonetheless. Partial reorthogonalization: v is reorthogonalized against a subset of its predecessors when it fails a test for orthogonality. It will be useful to fix our nomenclature and notation at this point. Statements 7-8 attempt to enforce local orthogonality of v against Uk and u^-i. there are effective work-arounds for this problem. If eigenpairs outside this eigenspace are desired. since with periodic. See §3. In what follows we will tacitly assume that this is done whenever necessary. the Rayleigh quotient becomes upper Hessenberg.Thus we use the term "reorthogonalize" in statement 10 in a general sense—to be specified in the context of the variant in question. In that case Uk effectively spans an eigenspace of A. partial. • It may happen that /3k becomes so small that it can be neglected. we simply have to to reorthogonalize Uk against Uk-i • Other variants require that we also reorthogonalize Uk-i against Uk-2. and selective reorthogonalization the matrix Uk+i is not required to be orthogonal to working accuracy. • The loss of orthogonality in the Lanczos procedure is of two types: local and global. The matrix Hk is a Rayleigh quotient in the general sense of Definition 2. the orthogonality of v against Uk-i • There are three problems here. The third problem is that reorthogonalization contaminates the Lanczos decomposition. which is done in statements 7-8.3. the Krylov sequence must be restarted. We do this by keeping a running estimate of the inner products U^Uk (see Algorithm 3.7. If we are performing full reorthogonalization.

since if ((V v) T ) is a left inverse of (Uk Wfc+i).2. . Implicit restarting In its larger details implicitly restarting a Lanczos decomposition is the same as restarting an Arnoldi decomposition (see §2.7. the elements above the first superdiagonal must be small. The reader is assumed to be familiar with the contents of §2. then Hk = VTAUk. Since the elements of Hk below the first subdiagonal are zero. THE LANCZOS ALGORITHM 351 it from the ordinary Rayleigh quotient Tjt we will call it the adjusted Rayleigh quotient. Thus the main difference between the Lanczos algorithm with full orthogonalization and the Arnoldi algorithm is that the Rayleigh quotient for the former is symmetric (to working accuracy). We will denote this deviation by SL.1 will be of paramount importance. If we set then the restarted decomposition is VolII: Eigensystems . FULL REORTHOGONALIZATION AND RESTARTING When the Lanczos algorithm is run in the presence of rounding error with full reorthogonalization the results are stable in the usual sense. 3.2). . In what follows the deviation of Hk from the tridiagonal matrix (3.SEC.13) in Theorem 2. In this subsection we will briefly sketch the resulting simplifications. Chapter 4. Hence Hk is a small perturbation of a symmetric matrix. Ac m _A:. To see this. postmultiply the relation (2.1). . Note that it is truly a Rayleigh quotient in the sense of Definition 2.to give the transformed decomposition where Q is the matrix of the accumulated QR transformations.1) generated by Algorithm 3. This implies that the elements above the superdiagonal in the Rayleigh quotient are negligible and can be ignored. Given the Lanczos decomposition and m—k shifts KI . . so that 3. one applies m—k implicit QR steps with shifts «t.5 by Uk to get where Bk is now the matrix Hk in (3.

(i = 1. The reduction to diagonal form is done by the symmetric tridiagonal QR algorithm. .3.3. m). If this pair is moved to the beginning of the Rayleigh quotient. Consequently the residual norm for the ith Ritz vector is the magnitude of the ith component /% of 6. one can be more sophisticated about choosing filter polynomials than with the Arnoldi method.e. Since the Schur decomposition of a symmetric matrix is its spectral decomposition. Consequently. Since the Rayleigh quotient is diagonal.. of course. For more see the notes and references. The first half of this subsection is devoted to the definition and consequences of semiorthogonality. Both general and exact shifts can be used in restarting the Lanczos algorithm. if this component satisfies the criterion for deflation. then the primitive Ritz vectors are the unit vectors e. . Krylov-spectral restarting The comments on the forward instability of the QR algorithm for Hessenberg matrices (p.. The second shows how to maintain semiorthogonality by timely reorthogonalizations. 325) apply to the tridiagonal Rayleigh quotient of the Lanczos algorithm. The chief difference between the Krylov-Schur and the Krylov-spectral methods is in the treatment of the Rayleigh quotient. KRYLOV SEQUENCE METHODS The major difference between implicit restarts for the Lanczos and the Arnoldi decompositions is that implicit symmetric tridiagonal QR steps are used in the former (see Algorithm 1. This number can be used as in §2. Convergence and deflation If is the current Krylov-spectral decomposition. allow orthogonality to be completely lost. it will pay to use the analogue of the Krylov-Schur method. when exact shifts are to be used in restarting. if we suitably restrict the loss we can compute accurate Ritz values and partially accurate Ritz vectors. we will call this analogue the Krylov-spectral method. there is no need for special routines to exchange eigenvalues or eigenblocks. SEMIORTHOGONAL METHODS We turn now to methods that allow the orthogonality of the Lanczos method to deteriorate. eigenvalues can be moved around by simple permutations—i. A method that performs reorthogonalization only when there is danger of the loss becoming too great is called a semiorthogonal method. The fact that one Ritz pair has been deflated does not affect the later deflation of other Ritz pairs.352 CHAPTER 5. Contrast this with the discussion on page 339. For large n this makes no difference in efficiency. its component in 6 follows it... We cannot. However. A particularly satisfying aspect of the Krylov-spectral decomposition is that the deflation of Ritz pairs is uncoupled. 3.3 to determine the convergence of the ith Ritz pair. it can be set to zero and the decomposition deflated. But because the spectrum of A is real. Consequently. Chapter 3).
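Because the Rayleigh quotient of a Krylov-spectral decomposition is diagonal, the convergence and deflation tests described above reduce to inspecting the components of b. The following sketch (not from the text; names are illustrative) marks the deflatable Ritz pairs, permutes them to the front (for a diagonal Rayleigh quotient an exchange is just a permutation), and zeros the corresponding components of b; the same permutation must of course be applied to the columns of the basis U.

    import numpy as np

    def deflate_krylov_spectral(theta, b, tol, norm_est):
        """In a Krylov-spectral decomposition A U = U diag(theta) + u b^T
        the residual norm of the i-th Ritz pair is |b[i]|.  Returns the
        permuted Ritz values and b, the permutation, and the number of
        deflated pairs."""
        deflatable = np.abs(b) <= tol * norm_est
        order = np.argsort(~deflatable)       # deflatable pairs first
        theta, b = theta[order], b[order].copy()
        ndefl = int(deflatable.sum())
        b[:ndefl] = 0.0                       # deflations are uncoupled
        return theta, b, order, ndefl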

1. we have If 11/ .1. there is nothing to keep us from using a larger value of Tmax — saY»to reduce the number of reorthogonalizations. will limit the attainable accuracy of our results. where W is strictly upper triangular. we would say that U is orthogonal to working accuracy. • Since the diagonal elements of C are one. from the elementary properties of the 2-norm. Since this small deviation from normality makes no difference in what follows. and let C = UTU. 353 Definition 3. Thus semiorthogonality requires orthogonality to only half working accuracy.C\\2 were of the order of the rounding unit. the members of these bases will not have norm exactly one. then Vol II: Eigensystems . • Our definition of semiorthogonality is meant to be applied to the bases generated by Algorithm 3. not the rounding unit.SEC. However. we will tacitly assume that all our bases are normalized. The QR factorization of a semiorthogonal basis Let U be semiorthogonal and let be the QR factorization U. Let U eRnxh have columns of 2-norm one. THE LANCZOS ALGORITHM Semiorthogonal bases We begin with a definition. The price we pay is that the errors that were formerly at the level of the rounding unit are now of the order 7max» and this number. 3. Then If we write R = I+ W. the definition says that U is semiorthogonal if the elements of / — C are within roughly the square root of the rounding unit. Owing to rounding error. • Choosing 7max to be about the square root of the rounding unit has the convenience that certain quantities in the Lanczos process will have negligible errors on the order of the rounding unit. they will be normal to working accuracy. However. Specifically. because they are normalized explicitly (statements 12 and 13). Then U is SEMIORTHOGONAL if Three comments on this definition.

2)] that reorthogonalization in the Lanczos algorithm introduces elements in the upper triangular part of the Rayleigh quotient. these elements are insignificant and can be ignored. we have To get some notion of what this approximation means. . Consequently. Then the vector d of reorthogonalization coefficients is Since \\QTq\\2 = 1 and RT £ /. The size of the reorthogonalization coefficients We have observed [see (3.19)] we can expect it to be of the same order of magnitude. Now 7 must be larger than 7max> smce a loss of semiorthogonality is just what triggers a reorthogonalization. these elements may be too large to ignore. Let where q£lZ(Q) and q_i_£'R(Q)'L have norm one. destroying its tridiagonal structure. Let Uk be semiorthogonal and let Uk = QRbe its QR factorization.354 CHAPTER 5. We will now show that if our basis is semiorthogonal. KRYLOV SEQUENCE METHODS where the approximation on the right is to working accuracy. note that 7 is the norm or the projection of Uk+i onto the column space of Uk and hence measures how w^+i deviates from orthogonality to 7£(£/&). Since the elements of the difference Sk of the adjusted Rayleigh quotient Hk and Tk depend on the components of d [see (3. It follows that up to terms of 0(6 M ) the matrix W is the strictly upper triangular part of C and hence that In particular We will use these facts in the sequel. Let Pk^k+i — Auk — akUk — fik-\uk-\ be the relation obtained after the Lanczos step but before the reorthogonalization. For full reorthogonalization. In the Lanczos process it is rare to encounter an extremely small /?£. Let us suppose that we must reorthogonalize at step k. we can expect the elements of d to be on the order of ^/CM .
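The relation just derived, that R = I + W with W essentially the strictly upper triangular part of C = U^T U, is easy to check numerically. The snippet below (not from the text) manufactures a semiorthogonal basis by perturbing an orthonormal one with errors of order sqrt(ε_M); the agreement between the strict upper triangles of C and of R - I is then at the level of the rounding unit, not of its square root.

    import numpy as np

    # Build a synthetic semiorthogonal basis U = Q + O(sqrt(eps)) noise.
    rng = np.random.default_rng(0)
    n, k = 200, 10
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    U = Q + np.sqrt(np.finfo(float).eps) * rng.standard_normal((n, k))
    U /= np.linalg.norm(U, axis=0)              # normalize the columns

    C = U.T @ U
    R = np.linalg.qr(U, mode='r')               # triangular factor of U = QR
    R *= np.sign(np.diag(R))[:, None]           # fix signs so diag(R) > 0
    W = R - np.eye(k)
    # The difference is O(eps), even though the entries themselves are O(sqrt(eps)).
    print(np.max(np.abs(np.triu(C, 1) - np.triu(W, 1))))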

so that most of its elements are much less than one. Although w has 2-norm one. it can be used to compute accurate Ritz values. Thus the product of (R — I)w may be considerably less than y7^. Its proof is beyond the scope of this volume.Thus we can expect the Ritz values from Tk to be as accurate as those we would obtain by full reorthogonalization. Perhaps more important is the effect of semiorthogonality on the residual norms. Let be the decomposition computed by Algorithm 3. Let Uk — QkRk be the QR factorization ofUk. since it says that the errors in the Ritz vector can be as large as y^.5) we have At first glance this result appears to be discouraging. since it is these numbers that we use to decide on convergence. The effects on the Ritz vectors Although the effects of semiorthogonality on the Rayleigh quotient and its Ritz values are benign.1.1 and assume that Uk is semiorthogonal.7) that Vol II: Eigensystems . they have the potential to alter the corresponding Ritz vectors. Instead we must compute the vector UkW. we see from (3.Then The theorem says that up to rounding error.SEC. 3. THE LANCZOS ALGORITHM The effects on the Rayleigh quotient 355 We have observed [see (3. Moreover. Since by (3. Now if (//. its elements decrease as the vector converges. The problem is that although a primitive Ritz vector w will be accurately determined by Tk (provided it is well conditioned) we cannot compute the corresponding Ritz vector QkW.2. Theorem 3. w) is a primitive Ritz pair. as the following remarkable theorem shows. Nonetheless. there are mitigating factors.2)] that the Rayleigh quotient Hk = Tk + Sk is not the same as the tridiagonal matrix Tk computed by Algorithm 3. Tk is a Rayleigh quotient associated with the space spanned by Uk. only some of the elements ofR-I are of order y^. However.

However. the comments about eigenvectors apply here also. Instead one must carry along the matrix Sk and use it in computing residual norms. it is almost as expensive to compute the inner products ujuk+i as to perform a full reorthogonalization. Be that as it may. we can use the inverse power method on Hk = Tk + Sk to correct the vector. Estimating the loss of orthogonality Algorithms based on semiorthogonality allow the orthogonality to deteriorate until a vector Uf. measure the departure from orthogonality of i/. since Hk is tridiagonal with a few spikes above the diagonal in the columns corresponding to reorthogonalization steps. Unfortunately. Such a process is cheap. which are the inner products 7^ = ujuj. since it may be smaller than the right-hand side of (3. The residual does not have to stagnate once it reaches -)/€M. We will now show how to estimate the loss of orthogonality as the Lanczos algorithm progresses.+i violates one of the semiorthogonality conditions at which point Uk+i is reorthogonalized.1 and let Then the size of the off-diagonal elements of Ck. the above analysis shows clearly that we cannot rely on the usual estimate \ftk^k\ to decide on convergence. For the sake of continuity. Although it is not expensive to compute and store Sk (or equivalently Hk). we will leave this derivation to the end of the section and assume that Sk is available whenever we need it. Hence the residual norm is Thus the semilarge values in 5^ can inflate the residual norm. KRYLOV SEQUENCE METHODS Since Uk+i is semiorthogonal. Let Uk be the Lanczos basis computed by Algorithm 3.. Another advantage of saving this matrix is that if a particular Ritz pair stops converging.356 CHAPTER 5. It turns out that we can use the relation (3. Suppose that we have computed Ck and we want to compute Ck+i • Write the jth column of (3.8).7) to derive a recurrence for the 7^-. since it is the product SkW that determines the the inflation. it has a QR factorization QR with R = I.and HJ.7) in the form where hj is a quantity of order || A||eM that accounts for the effects of rounding error in the relation. the formulas for the columns of Sk are not easy to derive. Also write .

Given the output of Algorithm 3.1, this algorithm computes estimates of the inner products γ_{k+1,j} = u_{k+1}^T u_j. It requires an estimate a of ||A||_2.

Algorithm 3.2: Estimation of u_{k+1}^T u_j

Multiplying (3.9) by u_k^T and (3.10) by u_j^T, subtracting, and simplifying, we obtain

    β_k γ_{k+1,j} = β_j γ_{k,j+1} + (α_j - α_k) γ_{k,j} + β_{j-1} γ_{k,j-1} - β_{k-1} γ_{k-1,j}
                    + correction terms involving q_j^T h_k, q_k^T h_j, u_k^T U_j s_j, and u_j^T U_k s_k.    (3.11)

Letting j = 2, ..., k-2, we can compute all but the first and last elements of row k+1 of C_k. The first element can be computed by leaving out the term β_{j-1} γ_{k,j-1} in (3.11). Because we maintain local orthogonality, the quantity γ_{k+1,k} = u_{k+1}^T u_k is on the order of the rounding unit.

In exact arithmetic with accurate values of q_j^T h_k, q_k^T h_j, u_k^T U_j s_j, and u_j^T U_k s_k, the γ_{k+1,j} would be exact. Unfortunately, the quantities q_j^T h_k and q_k^T h_j are unknown. However, experience indicates that ||h_i||_2 <= ||A||_2 ε_M. Consequently, we can replace q_j^T h_k and q_k^T h_j with a ε_M, where a is a suitable estimate of ||A||_2. We adjust the sign of this replacement so that it always increases γ_{k+1,j}. The quantities u_k^T U_j s_j and u_j^T U_k s_k are in principle computable. But because we are maintaining semiorthogonality, they are also on the order of a ε_M. Consequently, we can absorb them into the corrections for q_j^T h_k and q_k^T h_j.

Algorithm 3.2 implements this scheme. Two comments.

• By taking absolute values in (3.11) we could obtain upper bounds on the loss of orthogonality. But these bounds grow too quickly to be useful: they will trigger reorthogonalizations before they are needed. What we need are estimates of the γ_{k+1,j} themselves.
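A sketch of how the recurrence (3.11), with the corrections just described, might be coded is given below. It is not the book's Algorithm 3.2; the names, the zero-based indexing, and the choice of the last entry are assumptions of the sketch. The row of estimates it returns is what the periodic reorthogonalization of the next subsection tests against γmax, roughly the square root of the rounding unit.

    import numpy as np

    def next_orthogonality_row(w_k, w_km1, alpha, beta, k, anorm):
        """Estimate the inner products u_{k+1}^T u_j, j = 0, ..., k-1, from
        the previous estimates w_k (for u_k) and w_km1 (for u_{k-1}) and the
        Lanczos coefficients alpha[0..k], beta[0..k], beta[k] being the most
        recent one."""
        eps = np.finfo(float).eps
        w_new = np.empty(k)
        for j in range(k - 1):
            num = (beta[j] * w_k[j+1] + (alpha[j] - alpha[k]) * w_k[j]
                   - beta[k-1] * w_km1[j])
            if j > 0:
                num += beta[j-1] * w_k[j-1]
            # Unknown O(||A|| eps) terms are replaced by a bound whose sign
            # is chosen so that it always increases the estimate.
            num += np.copysign(2.0 * eps * anorm, num)
            w_new[j] = num / beta[k]
        w_new[k-1] = eps        # local orthogonality is enforced directly
        return w_new

When any entry of the returned row exceeds the semiorthogonality threshold, the new Lanczos vector (and its predecessor) are reorthogonalized as described in the following subsection.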

the lack of orthogonality will propagate to tifc+2 and hence to the subsequent Lanczos vectors.358 CHAPTER 5. We may have to repeat the orthogonalization of v against Uk • We will treat these items in turn. Specifically. This means we must replace j3k in the algorithm by \\v\\2. the process is more complicated than a full reorthogonalization step.1. 1. experiments show that the 7^ calculated by this algorithm track the true values rather well. The process of reorthogonalizing v and Uk against t^-i may affect the orthogonality of these two vectors. where q£n(Uk). the row k+l of C depends only on rows kth and k—1. Because we have allowed our basis to deteriorate. There are three differences. the terms of order a£M has little effect on the recursion. we only need the row k+lto decide when to reorthogonalize uk+i. we must perform a reorthogonalization step. if for some j the estimate jk+i. • In the basic recurrence. The fact that we may have to repeat the reorthogonalization of v is a consequence of the fact that we are using a semiorthogonal basis to perform the reorthogonalization. and \\u\\2 .j has a value greater than 7max. driven upward by the choice of sign of r. Hence the lost orthogonality must be restored. The fact that we have to reorthogonalize both Uk and v follows from the three-term recurrence If either Uk or Uk+i fails to be orthogonal to Uk-i. Periodic reorthogonalization The ability to estimate the deviation of the Lanczos vectors from orthogonality enables us to be precise about the test in statement 9 in Algorithm 3. To see what is going on.\\q\\2 = \\q±. • In Algorithm 3.11) for the orthogonality estimates. We must reorthogonalize both uk and v against Uk-i • 2. Hence the norm of the component of . The substitution of upper bounds on these quantities turns the 7^ into an estimate. the 7.j would be exact. This problem is also reflected by the presence of jk. Suppose we are trying to orthogonalize a vector of the form u = iq + rqt.1 we do not test the vector Uk+i but its unnormalized counterpart v. where to working accuracy W is the strictly upper triangular part of Ck = U^Uk [see (3. Consequently. let Uk = QR be the QR factorization of Uk • Recall that we can write R = I+W. Nonetheless. at any time we need only store two rows—the current row and the preceding row.\\2 = 1The orthogonalized vector is u = (I — UU^)u. KRYLOV SEQUENCE METHODS uJUkSk. We must reorthogonalize v against Uk3. Moreover.j-i in the recursion (3. The reason seems to be that as the loss of orthogonality grows.qL£ 1l(Uk}L.4)].

But we only perform a reorthogonalization if 7 is near y^ or larger. and. it may require another orthogonalization. the danger is that cancellation will magnify otherwise negligible rounding errors. It is worth noting that although we perform reorthogonalizations in the Arnoldi method.||2 to decide whether to reorthogonalize again. Owing to limitations of the page size. Algorithms 3. and since R is near the identity matrix. • Note that the algorithm does not renormalize Uk. after the process the elements 7^ and 7jt+i. it is possible for one reorthogonalization to fail to purge the components along H(Uk). we have been forced to separate the outer Lanczos loop from the reorthogonalization. Thus. Thus we can use x/||?. In the semiorthogonal Lanczos method. assumes that the previous Lanczos vectors are available. Now the vector v of reorthogonalization coefficients is x = U^v — RQTUk+i. THE LANCZOS ALGORITHM u lying in 7l(Uk) is the norm of 359 Now ||QTw||2 is the size of the component of it in TZ(Uk)i and (7! is the size of the corresponding component of u.4 pull things together.This is not a problem for Uk. In the Arnoldi method. • The reorthogonalization. the first reorthogonalization step may have failed to eliminate the component. Hence if ||QT^||2 > y^. But since v is not semiorthogonal. since in this case by semiorthogonality 7 is well below y^. our reasons for doing so are different. But they are essentially one algorithm. and the cure is to repeat the orthogonalization.3 and 3. 3. let [in the notation of VolII: Eigensystems . the basis vectors are not orthogonal. we have ||ar||2 = ||QTv||. When should we reorthogonalize? The size of the component of v in 1^(11k) is T ||<2 -y||2. To see this. which is done by the modified Gram-Schmidt algorithm. The details will depend on the size of the problem and the choice of backing store. as the above analysis shows. Since reorthogonalization leaves us with a Uk and Uk+i that are orthogonal to the columns of Uk-i to working accuracy. Here are some comments. where the vectors are orthogonal to working accuracy.SEC.j should be some multiple of the rounding unit— say -\/n€M. The reason is that the reorthogonalization does not significantly change its norm. they have a limited capacity to reduce unwanted components—even when there is no rounding error involved. It follows that the reorthogonalization step will tend to reduce the component by a factor of ^/e^. Reorthogonalization also solves the problem of how to restart the recursion for the jij.

Algorithm 3. KRYLOV SEQUENCE METHODS This algorithm performs Lanczos steps with periodic reorthogonalization.4.3: Periodic orthogonalization I: Outer loop o .360 CHAPTER 5. The algorithm also updates the Rayleigh quotient Hk = Tk + Sk [see (3. The reorthogonalization is done in Algorithm 3. It is assumed the last two rows of the matrix Ck estimating U^+1 Uk+i are available.7)1.

3 Algorithm 3. 3.4: Periodic reorthogonalization II: Orthogonalization step o Vol II: Eigensystems . THELANCZOS ALGORITHM 361 This algorithm replaces statements 12-14 in Algorithm 3.SEC.

and /?. Before the reorthogonalization we have the relation which holds to working accuracy. The results of the reorthogonalization replace Uk with and v with Solving (3. Since \\Uk\\2 = 1. we can obtain Sk by subtracting the a.362 CHAPTER 5. we have 72 + r2 = 1. 72 < €M. • We have chosen to compute the Rayleigh quotient Hk instead of Sk. Hence r = 1 to working accuracy. Instead of allocating storage to retain the zero elements.12)] Uk = 7#-f T<7j_. It is not at all clear that this is worthwhile.Since by definition Sk = Hk — Tk. since the storage is no more than that required for a set of primitive Ritz vectors.13)..17) and (3. Updating Hk We must now derive the formulas for updating Hk after a reorthogonalization. KRYLOV SEQUENCE METHODS (3. most columns of Hk will be zero above the first superdiagonal. which must be computed to test for convergence. Since the reorthogonalized vector is rg_L. By semiorthogonality.18) into (3.15) for Uk and v and substituting the results in (3.and subdiagonals of Hk• Since reorthogonalization is an exceptional event.16) and define Hk by . it has norm one to working accuracy. we get Also If we substitute (3. one can store only the nonzero elements in a packed data structure.from the diagonal and the super.14) and (3.

suppose that (/z. Specifically.We make no assumptions about how this pair was obtained—in most cases (ju. but the following analysis does not require it. Consequently. we can avoid accumulating H^. These adjustments have been implemented in statements 21-23 of Algorithm 3. Ukw) is an approximate eigenpair in the space spanned by Uk.4. But for smaller residuals. Thus a knowledge of the adjusted Rayleigh quotient enables us to compute residual norms without having to multiply by A.SEC. w) will be an eigenpair of Tk. THE LANCZOS ALGORITHM then it is straightforward to verify that 363 which is a Krylov decomposition involving the new vectors iik and v. which by semiorthogonality are bounded by 0(^l/€^). If we do not require very accurate eigenvectors. Hence the difference between the adjusted Rayleigh quotient Hk and theLanczos matrix T* is also 0( y^). we should use Hk. Specifically. until the residual falls below y7^.19) consist of coefficients from the reorthogonalization step. 3. the corrections in (3. Vol II: Eigensystems . Computing residual norms A knowledge of Hk makes the computation of residual norms easy. it does not matter if we use Tk or Hk to estimate its norm. From the relation we have thai where It follows that the residual norm p = \\AUkW — ^UkW\\2 satisfies where bv semiorthoeonalitv Thus estimates p to at least half the number of significant figures carried in the calculation.

17.5e-07 1. Ukw^ ').9e-02 2. whereas the former continues to decrease. Various statistics are summarized in the following table.9e-19 c4 1.2.38.3e-15 6. In this example A is a diagonal matrix of order n = 500 whose first eigenvalue \\ = 1 and whose other eigenvalues are given by the recurrence The periodically reorthogonalized Lanczos algorithm was run on this matrix with a random starting vector and 7max = -^/W~l6/n. KRYLOV SEQUENCE METHODS We conclude this subsection with an example that illustrates the behavior of periodic reorthogonalization in the Lanczos algorithm.9e-15 3. But the two values diverge only after the true residual norm falls below y^.9e-02 2.QTAQ\\2.9e-15 3.9e-02 2. The fifth column shows that the approximate residual norm (3. Example 33.9e-15 c3 1. in which we are concerned with the approximate eigenpairs (/4 .8e-16 8. The third and fourth columns show that the classic estimate for the residual norm deviates from the true residual norm.4 produce a Krylov decomposition whose residual is of the order of the rounding unit.20).22. Orthogonalizations took place for k = 11. c4: The true residual norm.9e-15 5. The second column shows that the corrections in Algorithm 3. cl: ||Tjb . where the pair (^5 . The numbers in the first column illustrate the results of Theorem 3.364 An example CHAPTER 5. .0e-15 c2 3.20) is a reliable estimate of the residual norm. The tridiagonalLanczos matrix Tk is to working accuracy the Rayleigh quotient formed from the Q-factor ofUk.5e-07 1.4e-12 The columns of the table contain the following quantities. This means that in spite of the contamination ofTk by the reorthogonalization process. as noted above. its eigenvalues converge to eigenvalues of A. In particular the latter stagnates at about 10~~12.33. c5: The residual norm estimated by (3.9e-15 3.5e-07 2.4e-12 1.28.5e-15 7. k 10 20 30 40 cl 3. Wg ') is the fifth eigenpairofthe Lanczos matrix Tk.4e-12 1.4e-12 c5 1. where Q is the Q-factor of Uk. c2: The global residual norm c3: The classical estimate \(3ke£w$ '\ for the residual norm.

was the first to come to grips with the need for reorthogonalization. there is no cure except to compute Ritz vectors from the adjusted Rayleigh quotient Hk. who in his thesis [196. Lanczos knew that his method was in some sense iterative. THELANCZOS ALGORITHM 365 Perhaps the most important point of this example is that Ritz vectors computed from the matrix T^ can fail to converge. When this happens. incidentally. periodic reorthogonalization to Grcar [97]. In the Lanczos method.4.Fortunately.1959]. For example. which unifies these methods. 198. Chapter 4. see [195].4. and a good deal of research has been devoted to finding a way around this problem. see [6] and [7]. For more on this topic. inexact shifts are a real possibility. Selective reorthogonalization is due to Parlett and Scott [209].1971] and other papers [195. 1966] and especially Paige. and partial reorthogonalization to Simon [238]. NOTES AND REFERENCES The Lanczos algorithm As was noted in §3. Krylov-spectral restarting follows from our general treatment of Krylov decompositions. Another possibility is to use Leja points or their approximations. 3. For an error analysis of full reorthogonalization. which. Filter polynomials For Arnoldi's method exact shifts are generally used in restarting because the alternatives are cumbersome. 199] laid the foundation for much of the work that was to follow. where the eigenvalues in question lie on the real line. For example. But for some time the method was chiefly regarded as an exact method for tridiagonalizing a symmetric matrix.SEC. the simple form of Hk makes it easy to apply a step of the inverse power method to the eigenvectors of Tk to obtain the primitive Ritz vectors of Hk. it is so viewed in an early paper of Wilkinson [296. 197. Wu and Simon [306] introduce it impromptu under the name of thick restarting. 3. However. Special mention must be made of an elegant paper by Simon [237]. The acceptance of the Lanczos method as an iterative method is due to Kaniel [145. since his vectors could be combined to produce good approximations to eigenvectors before the algorithm ran to completion. Only when Sorensen showed how to restart the Arnoldi process (see §2) did the fully reorthogonalized method become a reasonable option. Vol II: Eigensystems . Full reorthogonalization The need for full reorthogonalization has been generally cited as a defect of the Lanczos method. Semiorthogonal methods The semiorthogonal variants of Lanczos's algorithm all appeared between 1979 and 1984. one could use the zeros of a Chebyshev polynomial to suppress the spectrum on a given interval.

and the reader is referred to the treatment of Simon [238]. The formulas (3. KRYLOV SEQUENCE METHODS Periodic reorthogonalization is the most straightforward of the methods. who also gives empirical grounds for believing that the quantities calculated in Algorithm 3.2. A package LANCZOS is available from Netlib. . \(3k^Wj \ is the residual norm for the Ritz vector [remember that for Ritz vectors r in (3..Although the idea is simple. The fact that semiorthogonality can cause the convergence of Ritz vectors to stagnate does not appear to have been noticed before (although in the context of linear systems the possible untoward effects are hinted at in [237] and [205]). its explicit formulation and computational use are due to Simon [237. Theorem 3. A package LANZ implementing partial reorthogonalization is available from Netlib. but as the analysis in §3. Cullum and Willoughby [46. who shows that it applies to any basis generated by a semiorthogonal method.2 actually track the loss of orthogonality.20).2 is due to Simon [237]... Partial reorthogonalization is based on the observation that one only needs to reorthogonalize v against those vectors U{ for which the bounds computed by Algorithm 3.366 CHAPTER 5. shows it may fail to .48] have shown how to detect the spurious Ritz values. Selective reorthogonalization is based on a celebrated result of Paige [196]. the devil is in the details.. No reorthogonalization It is possible to use the Lanczos algorithm to good effect with no reorthogonalization. Chapter 3.2 are above 7max. and the reader should look for his presence in the above references. The singular value decomposition The singular values and vectors (0-. 238].20) is zero]. selective reorthogonalization consists of saving converged and nearly converged Ritz vectors and orthogonalizing the Lanczos vectors against them. But as is usual with Lanczos methods. k) be the Ritz vectors computed at step k of the Lanczos process and let Wj be the corresponding primitive Ritz vectors. A package LANSO implementing periodic reorthogonalization is available from Netlib. This works well for the dominant singular values. Then there are constants 7^ of order of the rounding unit such that Now by (3. however. u^ Vi) of a matrix X can be calculated by applying the Lanczos algorithm to the cross-product matrix X^X. For another recurrence see [143]. It follows that Uk+i can have large components in 11(1}k) along only Ritz vectors that have almost converged.19) are new. Broadly speaking. the implementation is complicated.11) is implicit in Paige's thesis [196]. The recurrence (3. Let zj (j = 1. What happens is that after a loss of orthogonality previously converged Ritz values may reappear and spurious Ritz values may be generated.

Golub and Underwood [92. we pointed out that a Krylov sequence cannot see more than one copy of a multiple eigenvalue and indicated that one way around the problem is to work with block Krylov sequences. that gets around the failure by allowing the Rayleigh quotient to deviate locally from tridiagonality. But it can fail in other. Software is available in SVDPACK at Netlib. proposed by Parlett. Lewis. If we denote bases for these sequences by Uk and Vk and require that they be biorthognal in the sense that V^Uk = /. 47. The resulting generalization of the Lanczos algorithm is called a block Lanczos algorithm. 275]. For more see [21.104] gives a full treatment of the theory of the bi-Lanczos method. and Liu [210].3. Taylor. An alternative is to apply the method to the matrix whose eigenpairs are However. this reduces to the usual Lanczos method. For more see [79. For the use of shift-and-invert enhancement in a block Lanczos algorithm see the paper of Grimes. and Simon [98]. strictly speaking. The method shares with the Lanzcos method the problem of loss of orthogonality (or. and Ruhe [219]. THE GENERALIZED EIGENVALUE PROBLEM 367 give sufficiently accurate results for the smaller singular values and their vectors. since they are interior eigenvalues. one in A and the other in A1. The basic idea is to transform the generalized eiVol II: Eigensystems . Chapter 4. 105. THE GENERALIZED EIGENVALUE PROBLEM In this section we will consider the use of Krylov methods to solve the generalized eigenvalue problem Ax = \Bx. which is particularly strong on implementation issues. The bi-Lanczos algorithm The original Lanczos algorithm [ 157] was actually a method for general non-Hermitian matrices.280]. The cure is a look-ahead Lanczos recurrence. then the Rayleigh quotient Tk = VJ^AVk is tridiagonal. 4. It turns out that if one orthogonalizes a block Krylov sequence one gets a block tridiagonal Rayleigh quotient. When A is symmetric and u\ — v\.SEC. 4. Gutknecht [103. convergence to the smaller eigenvalues can be expected to be slow. 89]. more serious ways. biorthogonality). The idea was to generate two Krylov sequences. Block Lanczos algorithms In §3. Such algorithms have been proposed by Cullum and Donath [45]. and it follows that the columns of Uk and 14 satisfy three-term recurrence relations.

1 we will discuss these issues. since the usual residual criterion concerns the transformed problem. . and write the problem in the form Then Hence if we set then Because the transformation from A to /x is the same as for the ordinary shift-and-invert method. but that is another story. if B is nonsingular we can take C = B~1A. In §4. 4. In §4. To solve the linear system. this presents no problems. all transformations require that a linear system be solved to compute the products Cu required by Krylov methods. But when factorizations are expensive. In many applications the matrix B is positive definite. (The alternative is to use an iterative method.2 we will show how we can replace orthogonality in the Arnoldi and Lanczos algorithms with B -orthogonality to obtain new algorithms that are specially tailored to this form of the generalized eigenproblem. if we define C properly. TRANSFORMATION AND CONVERGENCE Transformation to an ordinary eigenproblem The first step in applying Krylov sequence methods to a generalized eigenproblem Ax = \Bx is to find an equivalent eigenproblem Cx — fix with the same right eigenvectors. A second issue is how to determine convergence. The products = Cu can then be computed as follows. To see how this is done let K be a shift. Unfortunately. An important issue here is the nature of the transformation.) When the matrix in question is easy to factor. For example. we can transform to an ordinary eigenproblem and get the benefits of shift-and-invert enhancement for the cost of a single factorization. KRYLOV SEQUENCE METHODS genvalue problem into an ordinary eigenvalue problem and then apply some variant of the Arnoldi or Lanczos method. As it turns out. we need to make each one count.1. we need to factor a matrix—B in the above example. not the original. we will call C the generalized shift-and-invert transform.368 CHAPTER 5.

3. The backward error theorem is an analogue of Theorem 1. Residuals and backward error We turn now to the problem of convergence criteria. the product v = Cu can be efficiently computed by the following algorithm. Vol II: Eigensystems . stop when the backward error is sufficiently small—say some multiple of the rounding unit. 4. THE GENERALIZED EIGENVALUE PROBLEM 369 The eigenvalues of the pencil (A. we can use this theorem to determine convergence.1. For more on the use of backward error to determine convergence. Direct verification.XB and let Let || • II denote the Frobemus or 2-norm.x)bean approximate eigenpair of the pencil A .XBx. Since the residual is in general computable. We will then show how to use the residual Cx — Xx to bound the norm of r. see the comments following Theorem 1. If the matrix A — KB can be factored. Thus the generalized shift-and-invert transformation requires only a single factorization.SEC. Theorem 4. so that they will tend to be found first by Krylov methods. • The bounds are optimal in the sense that there are no matrices E and F satisfying for which where at least one of the inequalities is strict. and let where Then and Proof. Namely. Chapter 2. B) that are near K become large. We will proceed in two stages. First we will establish a backward perturbation theorem for the generalized eigenproblem Ax = \Bx in terms of residual r = Ax .3. Chapter 2. Let (X.

ar) converged. We will begin with a brief treatment of the B innerproduct.1 that (A.B)~1B. which is similar to C. Finally. where the normwise relative error in A and B is bounded bv Consequently. B). 4. which may not be available. However. we can declare the pair (A. compute a Cholesky factorization R^R of B and apply the Lanczos algorithm to the symmetric matrix R(A . we can restore symmetry of a sort by working with a different inner product generated by B. where fj. To get such a bound multiply s by A — KB to get Dividing by /z.B.K. Fortunately. POSITIVE DEFINITE B A difficulty with applying the generalized shift-and-invert transformation to the S/PD pencil (A. Unfortunately. What we really want is a bound on the norm of the residual r = Ax . We could.KB)~1R^. We will then introduce the Arnoldi and Lanczos algorithms with respect to the B inner product. if p is less than some tolerance. we may take as an approximate bound on the backward error.fix. we get Hence It follows from Theorem 4. the residuals we calculate are of the form s = Cx . the Arnold! method with a generalized shift-and-in vert transformation C = (A .B)~1B is no longer symmetric and it appears we are forced to use the Arnoldi method rather than the more efficient Lanczos method to solve it. Consequently. in practice we will be looking for eigenvalues near the shift*. This criterion has the drawback that we need to know the norms of A and B. this procedure requires two factorizations—one of B and the other of A — K. of course.XBx.370 Bounding the residual CHAPTER 5. we will indicate some of the things that can happen when B is singular or ill conditioned. x) is anexacteigenpairof pair(A. B) is that the matrix C = (A — K. we will have \\A\\ + \K\\\B\\ = \\A\\ + |A|||5||. Hence. KRYLOV SEQUENCE METHODS When we use. . and simplifying. = (X — «)-1.2. say.

definite.B)~1B is B-symmetric.to the right-hand side of (4. In particular. if A is symmetric. The alternate definition—(x.y)B — ^(#. The latter proof is essentially the same as for the 2-norm and proceeds via the R-f'aiir'l'iv inrniiialitv • There is a curious inversion of the order of x and y as one passes from the left.#)B.2).2/). the order matters. In the complex case. This makes no difference when we are working over the field of real numbers. Then the B INNER PRODUCT is defined hv The 5-NORM is A matrix C is ^-SYMMETRIC if for all x and y. however. THE GENERALIZED EIGENVALUE PROBLEM The B inner product We begin with some definitions. VolII: Eigensystems .SEC.B = n(xjy)Bynd(xjfj. • The reader should prove that the B inner product is actually an innerproduct—i. 4. then the generalized shift-and-invert transform (A K.e. y)s — fi(x. • It is easily verified that: The matrix C is B symmetric if and only ifBC is symmetric.. bilinear function—and that the -B-norm is actually a norm. 371 Definition 4.2. y]B-> which is not standard. Let B be a positive definite matrix. that it is a symmetric. Here are three comments on this definition. y)s = x^By — would give (//or. if we define then(/iz. which is how functional analysts expect their inner products to behave. Specifically.

. v = v — hj*Uj 6. KRYLOV SEQUENCE METHODS It is natural to say that two vectors u and v are B-orthogonal if (w. Unless we are careful. 1. forj = 1 to k 2. 5. consider the following B variant of the modified Gram-Schmidt algorithm.. If U is square it is Borthozonal.. However. B-Orthogonalization In what follows we will need to J9-orthogonalize a vector v against a set u\. we can perform unnecessary multiplications. for j = 1 to k Wj = B*UJ hjj — wj*v j v — v — hj*Uj end for j This program requires k multiplications by B. The following easily proved result will be important later. 1.372 Orthogonality CHAPTER 5. . 3. 4. for j — 1 to k hj = (ujJv)B v = v — hj*Uj end for j When we recast this algorithm in matrix terms we get the following. On the other hand. V)B = 0. Thus postmultiplication by an orthogonal matrix (but not a ^-orthogonal matrix) preserves 5-orthonormality. 1.v)B 3. hJ = (uj. To see this. 2. for j = 1 to k 5. notice that the the notation (z. This can be done by a variant of the Gram-Schmidt algorithm in which the B inner product replaces the ordinary inner product. end for j 4. end for j . We will say that U is B-orthonormal if U^BU — I. y)s conceals a multiplication of a vector by B. Uk of j5-orthonormal vectors. 2. 3. All of the usual properties and algorithms associated with orthogonality carry over to Borthogonality. consider the B variant of the classical Gram-Schmidt algorithm. 4.

THE GENERALIZED EIGENVALUE PROBLEM In matrix terms this algorithm becomes 373 This version requires only one multiplication by B.6) Vol Hi Eigensystems . The B-Arnoldi method The J?-Arnoldi method generates an Arnoldi decomposition of the form where C is the generalized shift-and-invert transform and Uk is #-orthonormal.5). Because reorthogonalization is always used in the Arnoldi process.Specifically. then from (4. It follows on premultiplication by U^B that EkM = isw.6) where u?k is the last component of w. we have from (4. we can use the classical Gram-Schmidt algorithm. where z = UkW. Hence it is desirable to use the classical Gram-Schmidt algorithm whenever possible. • Ritz pairs and convergence. There are two ways around this difficulty. if Hw = vw> then with z — Uw. to save multiplications by B. Here are some details. so that we can find the primitive Ritz vector v as an eigenvector of HkOn the other hand. • Orthogonalization. Ritz pairs may be computed from the Rayleigh quotient Hk = U^BCUk. Now it is well known that the modified Gram-Schmidt algorithm is numerically superior to the classical Gram-Schmidt algorithm. if Cz = vz. The procedure amounts to substituting f?-orthogonalizationfor ordinary orthogonalization. In large problems multiplications by B may dominate the computation.SEC. as suggested in (4. 4.

) Thus we are free to perform orthogonal similarities on the Rayleigh quotient H^. By (4. the J?-Arnoldi method for the generalized eigenproblem with shiftand-invert enhancement is essentially a minor modification of the standard Arnoldi method.7) contains || wjt+i ||2.#-Arnoldi method.374 CHAPTER 5.0ueT to while preserving the 5-orthonormality of UQ. u\) by setting #21 and 7^+1. we can transform the J?-Arnoldi decomposition CU = UH .1 to zero. The error introduced into the decomposition by this deflation is The problem is that we must estimate ||(£/2 Wjt+i)||2. It should be noted that when we are deflating Ritz values the vector #21 is zero.In the ordinary Arnoldi method. In the . so that all we need to calculate is ||Uk+i \\2> To summarize. (In our terminology the transformed decomposition is a #-orthonormal Krylov decomposition. we can determine convergence of Ritz pairs directly from the decomposition and the primitive Ritz vector without having to compute the residual vector. Deflation is somewhat more complicated than restarting. KRYLOV SEQUENCE METHODS Hence the residual 2-norm of the Ritz pair (*/. which is a rather large matrix. Note that (4. however.3). • Restarting. Let Q be orthogonal. the main difference being the need to compute the matrix-vector products of the form Bv. as usual. ||WAT+I||B is one. . this quantity is one. However. This means that both implicit and Krylov-Schur restarting can be applied to the B-Arnoldi method without change. To see what is involved. suppose we have transformed our decomposition into the form Then we can deflate the pair (*/. Ukw] is Thus. in the restarted Arnoldi method this matrix will be in main memory and we can use the bound to compute a cheap bound. and ||wjt+i||2 will generally be different from one. • Deflation.

it may be more efficient to process each block by the classical GramSchmidt algorithm as in (4. When it is necessary to reorthogonalize. We first observe that the null space of B and C are the same. 4. and since H is Hessenberg. as suggested in (4. Thus we cannot implement the classical Gram-Schmidt algorithm in the B -inner product. note that if z is a null vector of B then # is a null vector of C. y) whenever z G N(B) — and this fact presents complications for our algorithms. the best we can hope for is a block modified Gram-Schmidt algorithm. although they now estimate deviation from B -orthogonality. the columns of Uk must be brought in from backing store in blocks that fit in main memory. where x € T^(C) and y € J\f(B). we need a little geometry. so that x 6 N(B).nB)~lBx = 0. Thus the 5-Arnoldi method becomes a 5-Lanczos method. and the J9-norm a seminorm.K. In this case the B -inner product is said to be a semi-inner product.2 are unchanged. (This decomposition of z into x + y is not an orthogonal decomposition. Once again the B version of the method can be obtained by obvious modifications of the original method. to avoid forming matrix-vector products involving B. We will now make the simplifying assumption that so that any vector z can be uniquely written in the form z — x+y. (x + z.5). and rank(C) = n . Just as the restarted B-Arnoldi method is a slightly modified version of the usual restarted Arnoldi method. On the other hand if a: is a null vector of C.SEC. BC is symmetric. On the other hand if the block size is large or multiplications by B are cheap. the algorithms described above can be formally applied when B is positive semidefinite. For if there is an x such that Vol II: Eigensystems .) The assumption is always true when C is nondefective. To see this. then (A . i. we get Bx — 0. The B -Lanczos method with periodic reorthogonalization In §3. Multiplying by A . rank(C) + null(C) = n.4). To see what is going on. THE GENERALIZED EIGENVALUE PROBLEM The restarted 5-Lanczos method 375 We have already noted that if A is symmetric then C = (A—KB) ~ 1 B is B-symmetric. The result now follows from the fact that for any matrix C of order n. It follows that the Rayleigh quotient H = U^BCU is symmetric.B)~l B does not involve the inverse of B.B. Semidefinite B Since the generalized shift-and-invert transform C = (A — K.e.null(B). If the block size is small. The B semi-inner products is blind to vectors lying in the null space of B — that is.. The estimates of the deterioration of orthogonality in Algorithm 3.2). so the restarted J9-Lanczos method is a modification of the ordinary restarted Lanczos method (see §3. y) = (ar.3 we described the Lanczos method with periodic reorthogonalization. it must be tridiagonal. it makes sense to store the columns of BUk.

if we are going to use a Krylov method to calculate eigenvectors of B we must insure that the Krylov vectors lie in 7£(C). be. In practice.376 CHAPTER 5. There are two potential dangers here. « * lieinT^(C). .Thent^ £ 7£(C). . Specifically. . To analyze these possibilities. KRYLOV SEQUENCE METHODS Cx is nonzero and lies in J\f(B). and such magnifications are empirically observed to happen. so that the primitive Ritz vectors are calculated inaccurately. something that can happen only if C has a nontrivial Jordan form. other than the null vectors of B. 1.. Lett. Uk and Cuk. Consequently. Thus if we start with a vector in 7£(C). Since Uk+i is a linear combination of ^i.. . Since the B -inner product cannot see vectors in N(B\ it is possible for the Lanczos or Arnoldi process to magnify these small errors. Even if a primitive Ritz vector w is calculated accurately. at each stage rounding error will introduce small errors lying in N(B). let us suppose that we have the J?-Arnoldi decomposition where Fk is an error matrix of order of the rounding unit that accounts for rounding errors made during the Arnoldi process.. from (4. it must also lie in 7£(<7). The null-space error in Uk may somehow effect the Rayleigh quotient Hk. Mathematically. say. Now suppose that wi. then C2x — 0. this can be done as follows. Write where the columns of Uk are in T^(C) and the columns of Vk are in M(B). . Then it follows that and The contamination of the Rayleigh quotient is not a problem. lie in 7£((7).11) that Hk is essentially the Rayleigh quotient with respect to a set of vectors uncontaminated by contributions from M(B). 2. a random vector and let HI = Cv/\\Cv\\B.10) we have It then follows from (4. Now the eigenvectors of C. our Krylov subspaces lie in 7£(C). the null-space error may contaminate the Ritz vector UkW. .

Form > n/2. . . Hence.95. the null-space error will go to zero. Example 4.SEC. Now the left-hand side of this relation has no components in M(B).95k~l). To see this let (V. LetDk = diag(l. Hence the null-space error in UkW is small in proportion as the cmantitv is small. Uk is the last component of w. Vol II: Eigensystems . But |/3^^|||w||2 is the residual norm of the Ritz pair (i/.9) it follows that where. In it P denotes the projection ontoJ\f(B) corresponding to the decomposition R n = T^(C} + Af(B) [see (4. The following table summarizes the results. . From (4. Each multiplication by C was followed by the addition of a random perturbation on the order of the rounding unit. w) be a primitive Ritz pair. let and In this example we apply the B-Arnold! method to C with n — 500 and m = 400.. Ukw). as the Ritz pair converges.. The following example illustrates this phenomenon.3. as usual. The starting vector u\ is of the form Cr/\\Cr\\s.. 4.8) and the following discussion]. where r is a random vector. THE GENERALIZED EIGENVALUE PROBLEM 377 It turns out that the errors Vk do not contaminate the Ritz vectors either—at least asymptotically.

l:k—l] is also unreduced Hessenberg. it also holds for the #-Arnoldi process. Theorem 4. Define Uk — Uk+iQ. Let AUk = Uk+\ Hk be an Amoldi decomposition. even when B is semidefinite. Moreover. So far all is according to theory. Initially the component in the null space increases. The fourth column contains the product \ftk^k\> which along with \\Uk+i\\2 estimates the residual norm. LetAUk-i = UkHk be the decomposition resulting from one step of zero-shift implicit restarting. Other examples of rapid growth of null-space error may be found in the literature. The third column contains the norm of the component in N(B) of the Ritz vector corresponding to the fifth eigenvalue ofC.15) is the decomposition that results from one step of zero-shifted implicit restarting. If Hk — QR is the QR factorization of Hk. note that the null-space error (column 2) ultimately increases far above one. then (up to column scaling) Uk = UkQ.weget Since R is nonsingular and Q [1:fc. the first column of Uk is a multiple ofCui. The chief difference is that Uk+i is no longer unique.14) by g[l:Ar. H k-i = RQ[l:k. Thus (4. We will return to the remaining columns later. Because Hk is unreduced Hessenberg. from (4.15) is an unreduced Arnoldi decomposition.378 CHAPTER 5. Hence if we multiply equation (4. .4. Hence by the uniqueness of the unreduced Arnoldi decomposition. The following suggests a way of purging the unwanted components from the entire sequence. l:k—l]. It is based on the following characterization of the effects of a step of implicit restarting with zero shift.l:A?-l]. Instead. where Hk is unreduced. This puts a limit on how far we can extend an Arnoldi or Lanczos sequence. However. since in finite precision arithmetic the null space error will ultimately overwhelm any useful information in the Krylov vectors. Proof.14) the first column of Uk satisfies Since m ^ 0. which is seen to grow steadily as k increases. Qk is unreduced Hessenberg and Rk is nonsingular. we can add to it any matrix V in N(B) such thatVfh. • Although we have proved this theorem for the ordinary Arnoldi process. but as the residual norm decreases so does the component. (4. = 0. KRYLOV SEQUENCE METHODS The second column contains the norm of the component ofUk+i in N(B). Then By definition Uk-i — UkQ[l'k.1 :k—IJ is an unreduced Hessenberg matrix.

(He also says that the algorithm is not new. so is the right-hand side—i.ft"1 not be large. The chief problem is that the null-space error in the Arnoldi vectors can increase until it overwhelms the column-space information. at present we lack a reliable recursion for the null-space error. The reason is that the null-space error in Uk (column 2) has grown so large that it is destroying information about the component of Uk in 7l(C}. 319]. 4." The transform (A-KB)~1B appears implicitly in a book by Parlett [206. The cure is to perform periodic purging in the same style as periodic reorthogonalization. From (4. which. They called their transformation "the spectral transformation.. Unfortunately. thereafter it grows. Since in practice the Arnoldi method must be used with restarts. where B = R^R is the Cholesky factorization of B. He also gives a version of the Lanczos algorithm for B~l A. requires a factorization of B." Here we have preferred the more informative "shift-and-invert transformation. With the Lanczos algorithms. one step of zero-shifted implicit restarting purges the B-Arnoldi vectors of null-space error. The #-Arnoldi and Lanczos methods with semidefinite B are imperfectly understood at this time. This approach has the difficulty that it requires two factorizations. 4. For the first 50 iterations it remains satisfactorily small. which is needed to know when to purge. the authors assume that one will use several shifts. B].9) we have 379 Since the left-hand side of this equation is in 1Z(C). Ericsson and Ruhe [72] proposed applying the Lanczos algorithm to the matrix R(A — KB)~1R^. The fifth column is the null-space error in Uk+iQ.e.3. so that the the initial factorization of B can be prorated against the several factorizations of A — KB). The last two columns in (4. THE GENERALIZED EIGENVALUE PROBLEM The application of this theorem is simple. in which the Lanczos vectors are B -orthogonal.SEC. Scott [236] applied Parlett's algorithm to the pencil [B(A. However. citing Vol II: Eigensystems . since in the presence of rounding error it will magnify the residual error Fk in the decomposition. one of B and one of A — KB (however. who rejects it. although some experimentation may be required to determine the longest safe restarting interval. including a zero shift among the restart shifts may suffice to keep the errors under control. It helps that we can tell when the null-space errors are out of hand by the sudden growth of the Arnoldi vectors.13) illustrate the effectiveness of this method of purging null-space error. For this to happen it is essential that . however. NOTES AND REFERENCES The generalized shift-and-invert transformation and the Lanczos algorithm The idea of using a shift-and-invert strategy for large generalized eigenvalue problems arose in connection with the Lanczos process. p. showing that it required no factorization of B.KB)~1B. the problem of purification is analogous to that of reorthogonalization—it requires access to the entire set of Lanczos vectors.

380 CHAPTER 5. but for a different purpose. and Jensen [184].) When reduced to a standard eigenvalue problem by multiplying by B~l. He credits Parlett with recognizing that the algorithm can be regarded as the Lanczos algorithm in the B-irmer product. Finally.12) can be used to purge a Ritz vector of null-space error.e. Residuals and backward error Theorem 4. They introduce the elegant method of using zero-shift implicit restarting to purge the null-space error. /). His solution is a double purging of the Ritz vectors based on the formula (4. the semidefinite B-Arnoldi algorithm. Ericsson [71] considers these problems in detail for the Lanczos algorithm.12). They also point out that the formula (4. in which C is what we have been calling the generalized shift-and-invert transform. Ericsson.282]. Parlett. Semidefinite B The basic paper for the semidefinite .8) that 7l(C) and Af(B) have only a trivial intersection is violated in practical problems. the authors give a rounding-error analysis of their procedures. He also considers the case of semidefinite B. They point out that the null-space error in the Lanczos vectors can grow and give an example in which it increases 19 orders of magnitude in 40 iterations. . (The same formula was used earlier by Ericsson and Ruhe [72].#-Lanczos algorithm is by Nour-Omid. The hypothesis (4. KRYLOV SEQUENCE METHODS two technical reports [183. Their example of the need for purification increases the null-space error by 13 orders of magnitude in 11 iterations. which they combine with the purification of Nour-Omid et al..1 follows from generalizations of a theorem of Rigal and Caches [216] for the backward error of linear systems and is due to Fraysse and Toumazou [77] and Higham and Higham [112]. this pencil becomes (C. Meerbergen and Spence [176] treat the nonsymmetric case—i. to handle the problem of overlapping null and column spaces.) The fact that the purging becomes unnecessary as a Ritz vector converges appears to have been overlooked.

we get the Jacobi-Davidson algorithm. its implementation is comparatively simple.6 ALTERNATIVES Although the most widely used algorithms for large eigenproblems are based on Krylov sequences. then the first term in the expansion dominates. But it is very reliable. converges to x j. When we combine Newton's method with Rayleigh-Ritz refinement. But with some care they can be adapted to large sparse eigenprobems. these methods appear to be mostly suitable for dense problems. When | A21 is near | AI |.1. if we perform the power method on a second vector u'Q to get the expansion 381 . It is an older method and is potentially slow. In this chapter we will consider two alternatives: subspace iteration and methods based on Newton's method—particularly the Jacobi-Davidson method. and it is easy to use. Chapter 2) we assumed that A was diagonalizable and expanded in terms of the eigenvectors Xj of A. there are other approaches. combined with shift-and-invert enhancement or Chebyshev acceleration. with which this chapter and volume conclude. the converges of the power method will be slow. and Aku0j suitably scaled. We next turn to methods based on Newton's method for solving systems of nonlinear equations. 1. it sometimes wins the race. SUBSPACE ITERATION In motivating the power method with starting vector UQ (§ 1. However. If and 71 / 0. Since Newton's method requires the solution of a linear system. Moreover. Subspace iteration is a block generalization of the power method.

it is more natural to use a variant of the Rayleigh-Ritz method that computes the dominant Schur vectors of A. Chapter 4] where LI is of order I. of course. However. ALTERNATIVES we can eliminate the effect of A2 by forming the linear combination whose convergence depends of the ratio |As/Ai|. In §1. whose union is A(A). We wish to show that as k increases the space K(AkUo) contains increasingly accurate approximation to the dominant eigenspaces of A and also to determine the rates of convergence. the effect is to speed up the convergence. If this ratio is less than |A 2 /Ai |.1 we will develop the basic theory of the algorithm. Finally. Hence we can cure the problem of dependency by maintaining an orthonormal basis for 7£( AkUo).1. This cure leads to an interesting and useful connection between subspace iteration and the QR algorithm.382 CHAPTER 6. But in fact we are only interested in the space spanned by AkUo. and L$ is of order n—m. We could. in § 1. It turns out that power subspace 7£( A*'£/o) generally contains a more accurate approximation to x\ than could be obtained by the power method. First.3 we will treat the symmetric case and in particular show how to use Chebyshev polynomials to accelerate convergence. Subspace iteration is a generalization of this procedure. The second problem is how to extract approximations to eigenvectors from the space. However. use a Rayleigh-Ritz procedure. Beginning with an nxm matrix UQ. Moreover. the relation to the QR algorithm. In §1.7). THEORY As suggested in the introduction to this section. the theory of subspace iteration divides into three parts: convergences of the subpaces. since they all tend toward the dominant eigenspaces of A. the columns of AkUo will tend toward linear dependence. the power subspace contains approximations to other dominant eigenspaces. We will suppose that A has the spectral representation [see (1. one considers the powers AkUo. 1. L^ is of order m—l. there are two problems that must be addressed before we can derive an algorithm based on the above observations. Convergence Let UQ be an nxm matrix.2 we will show how to implement subspace interation. are assumed to be ordered so that . The eigenvalues of these matrices. We will treat each of these in turn. since we are working with orthonormal vectors. and the extraction of Schur vectors.

so that the column space of AkUo will approach the eigenspace TZ(Xi). (1. L^ is oforder m-1. Chapter 1. Let the matrix A of order n have the spectral representation where LI is of order I. Multiplying (1.3) by Ah and recalling that AXi = XiLi. 1. we have that for any e > 0 Thus we see that the column space of AkUo contains an approximation to the eigenspace IZ(Xi) that converges essentially as |A m +i/A/| fe . assume that C\2 = (Cj1 C^)H is nonsingular and let where D\ has t columns. Moreover. We summarize these results in the following theorem. and their eigenvalues are ordered so that Vol II: Eigensystems . To make this more precise.2) suggests that the first term in the expression will dominate.1). L% is of order n-m. then By construction K(V} ') C K(AkUQ). Theorem 1.9. SUBSPACE ITERATION 383 where d = Y^UQ. by Theorem 2. we get The bottommost expression in this equation is analogous to (1. Then If we partition Vk = (V^ V2(A:)).1. in which we recapitulate a bit.SEC. Moreover.

Moreover. where S is chosen to restore independence. we are not interested in the matrix AkUo—only the subspace spanned by it. the convergence of these approximations deteriorates as t increases. For example. Chapter 4]. if |Ai| > \\2\. the columns will all tend to lie along the dominant eigenvector of A. we can reduce the size of UQ to the point where the hypothesis is satisfied (at the very worst we might end up with m = €). which is in general smaller than | A 2 /A!|. Consequently.4). implies that we can take e = 0 in (1.Let Ok be the largest canonical angle between K(AkUo) and K(Xi). then for any e > 0. A natural choice is to make AkUoR orthogonal. • The theorem shows that subspace iteration is potentially much faster than the power method. • The theorem contains no hypothesis that A is nondefective. Fortunately. However. we can replace it with AkUoS. which Krylov sequence methods do not handle well [see the discussion following (3. This observation is important in applications. There are four comments to be made about this theorem. then Theorem 2. If the matrix (C^ C^)H is nonsingular.j-i/A^|. If. • We are free to position t at any point in the spectrum of A where there is a gap in the magnitudes of the eigenvalues. the eigenvalues A^ and A m+ i are nondefective. But if that condition fails. so that the theorem is valid for the original UQ. ALTERNATIVES where d = Y) H E/b. This suggests that subspace iteration can deal with defectiveness.9.1 then applies. the magnitudes of the first ra eigenvalues of A are distinct. Chapter 1. for example. Forif^ = 1. If. then the convergence depends essentially on the ratio |A m +i/Ai|. adding back the missing columns of UQ can only improve the accuracy of the approximation to 7£(XL). since the convergence ratio is essentially |Am.14). • The theorem would appear to depend on choosing m so that | Am | is strictly greater than | A m+ i |. the columns space of AkUo will contain approximations to each of the eigenspaces corresponding to the first £ eigenvalues of A. when the columns of AkUo begin to deteriorate.384 Let UQ G C n x m be given and write CHAPTER 6. since we will not know the eigenvalues of A when we choose m. The QR connection One difficulty with the naive version of subspace iteration is that the columns of Ak UQ tend to become increasingly dependent. however. Theorem 1. If .

1. When the hypothesis (1. QkRk is me QR factorization of AkUo.1.2. Now recall (Theorem 2. so that some eigenvalues have equal magnitudes. The importance of this theorem is that the matrices Uk generated by (1.2.SEC. The diagonal elements of R converge to the dominant eigenvalues of A. Chapter 2) that if Ak = QkRk is the QR decomposition of Ak then Qk and Rk are the accumulated Q. and let X\ = QR be the QR factorization ofXi. the convergence of the jth column of Qk is governed by the largest of the ratios \Xj/Xj_i \ and \Xj+i/Xj\ (but only the latter when j — 1). But we know that H(Qk) must contain an approximation to Xj that is converging as \Xj/Xm+i\k. Partition X = (Xi X^}. As in Theorem 2. can be adapted to establish the following result. Chapter 2. we get the following algorithm. Let U0eCnxm be orthonormal and let where Suppose that X 1U0 has anLU decomposition LT. Nonetheless. Vol II: Eigensystems .2.9). In fact. SUBSPACE ITERATION 385 we orthogonalize by computing a QR factorization after each multiplication by A. This suggests that subspace iteration is in some way connected with the QR algorithm. the space spanned by the corresponding columns of Qk converge to the corresponding dominant eigenspace. where there is a gap in the magnitudes of the eigenvalues. Chapter 1]. IfAkUo has the QR factorization UkRk. then there are diagonal matrices Dk with \Dk\ = I such that UkDk —» Q. the proof of Theorem 2. where U is unit lower trapezoidal and T is upper triangular.and R-factors from the unshifted QR algorithm. Theorem 1.6) is not satisfied. We now turn to the problem of how to extract an approximation that converges at this faster rate.5) converge to the Schur vectors spanning the dominant eigenspaces [see the comments at (1. Chapter 2. An easy induction shows that if Rk = RI • • • Rk that is. the corresponding columns of Qk will not settle down.

Consequently. Finally. But since we are only concerned with the subspace spanned by U. where the eigenvalues of B are arranged in descending order of magnitude Compute the QR factorization V — Qk+iRk+i The motivation for this procedure is the following. V = VW 5. The following procedure. so that the dominant eigenspace Xj is well define the approximations 1^[Qk('-. we can obtain the iterate following the SRR refinement by orthogonalizing VW. IMPLEMENTATION In this subsection we will sketch at an intermediate level what is needed to produce an implementation of subspace iteration. The reader should keep in mind that actual working code requires many ad hoc decisions. ALTERNATIVES Given the sequence Qk.386 Schur-Rayleigh-Ritz refinement CHAPTER 6. see the notes and references. accomplishes this.2. The power step must be performed at each iteration. B = Q*V 3. l:j)l converge to Xj essentially as |Am/Aj|*.5) to generate approximations to eigenvectors of A by the standard Rayleigh-Ritz method. The full analysis of this method is complicated and we will not give it here. V = AQk 2. However. But the columns of Wj are just the first j Schur vectors of B. The innermost operation is a power step in which we calculate AU (for notational simplicity we drop the iteration subscript). For further details. we do not need to orthogonalize until there . at the outer level is the performance of an SRR step. 1. we can use (1. which is called the Schur-Rayleigh-Ritz (SRR) method. which we do not treat here. The basic result is that if | Aj | > | \j+i \.Now if we wish to use the Rayleigh-Ritz method to compute an orthonormal approximation to the dominant eigenspace Xj. The next level is the orthogonalization of U. 4. What we really want to do is to rearrange the orthogonal matrix Qk so that its columns reflect the fast convergence of the dominant subspaces. the matrix Qk W contains Ritz bases for all the dominant eigenspaces^j. 1. Thus the SRR procedure gives properly converging approximations to all the dominant eigenspaces at once. one of the strengths of subspace iteration is that it gives meaningful results when the matrix A is defective or has nearly degenerate eigenvectors. we would compute an orthonormal basis Wj for the corresponding invariant subspace of B and for QkWj. Since AQkW = VW. The matrix B is the Rayleigh quotient B = Q^AQk. Compute the Schur decomposition B = WTWn. A general algorithm The theory of the last section shows that there are three nested operations that we must perform.

Perform an SRR step 9. Orthogonalize the columns of U 1.1: Subspace iteration is a danger of a nontrivial loss of orthogonality. the SRR step is needed only when we expect some of the columns of U to have converged. end while 8. where m > nv. Because it is so general. Thus we may be able to perform several power steps before we need to Orthogonalize U. the algorithm leaves several important questions unanswered: 1. end while Algorithm 1. so that where T is upper triangular with its eigenvalues arranged in descending order of magnitude. 3. How do we test for convergence of Schur vectors? How do we deflate converged Schur vectors? How can we tell when to perform an SRR step? How can we tell when to Orthogonalize? We will treat each of these problems in turn. U = A*U 5. 2. end while 6. suppose that U is composed of the dominant Schur vectors of A. These observations are summarized in Algorithm 1. The input to the program is an n x m orthonormal starting matrix U. To motivate the test. while (convergence is not expected) 3.SEC. while (the number of converged vectors is less than nsv) 2. As usual we will test a residual. while (the columns of U are sufficiently independent) 4.1. the jth column of U satisfies Vol II: Eigensystems . SUBSPACE ITERATION 387 This algorithm is an outline of a general subspace iteration for the first nsv dominant Schur vectors of a general matrix A. 1. 1. Test for convergence and deflate 10. Similarly. Convergence Convergence is tested immediately after an SRR step. 4. In particular.

eM) and 7 is a scaling factor. we do not multiply U\ by A. This criterion is backward stable. Let where T( consists of the first I columns of T. Deflation Because the columns of U converge in order. the first I columns of U are exact Schur vectors of A + E. Chapter 5]. the magnitude of the jth eigenvalue in the Schur decomposition [cf. Because of the triangularity of TH.. Specifically. the matrix W of Schur vectors of B has the form and the new U has the form . Chapter 4.. Our converge theory says that the Schur vectors corresponding to the larger eigenvalues will converge more swiftly than those corresponding to smaller eigenvalues.e.21). deflation simply amounts to freezing the converged columns of U. we calculate the Rayleigh quotient Here T\\ is the triangular part of the Schur decomposition that corresponds to U\. For this reason. Finally. where U\ has i columns.. ALTERNATIVES where tj is the jth column of T.8. However. Then when we multiply by A we form the matrix ({/i. there is a matrix E with \\E\\2 = \\R\\z such that (A + E)U = UTi.. if A has been obtained by shift-andinvert enhancement.e... and stop with the first one to fail the test. and a better choice might be 7 = \TJJ\ —i. AU%)—i. Suppose that I vectors have converged. and SRR steps. and partition U — (U\ #2). then such an estimate might be too large. Thus we declare that the jth column of U has converged if where tol is a convergence criterion (e. to perform an SRR step. Then by Theorem 2. (2. suppose the first i columns of U have converged. At each orthogonalization step we orthogonalize the current U% first against U\ and then against itself.g. A natural choice of 7 is an estimate of some norm of A. In other words. we should test convergence of the Schur vectors in the order j = 1. This freezing results in significant savings in the power.388 CHAPTER 6.2. orthogonalization.

Specifically. Precautions must be taken against it wildly overpredicting the value of /. where \\E\\% < \\X\\eM—i. Let us focus on a particular Schur vector. the columns of the matrices U. To see what that risk is. The cure is to orthogonalize the columns of A1U for some /. especially in the early stages when linear convergence has not set in.2. so that we can estimate a by We can now use cr to predict when the residual norm will converge. we wish to determine / so that Hence It follows from (1. it will pay to extrapolate the average of their residual norms. where E is comparable to the error in rounding X on a machine whose rounding unit is eM. 1. so that an unnecessary SRR step is performed just before convergence. By Theorem 2. On the other hand. then pk = vk~Jpj.. and suppose at step j it has a residual norm pj and at step k > j. When to orthogonalize As we have mentioned. SUBSPACE ITERATION 389 When to perform an SRR step The theory of the last section suggests that the residuals for the approximate Schur vectors will converge linearly to zero. then HQj^lh is the sine of the largest canonical angle 9 between H ( X ) and 1l(X + E).. Finally. if we take / too large we risk losing information about the subspace spanned by A1 U... then we perform unnecessary work.A2U. If we take / too small. if A has nearly equimodular eigenvalues. VolII: Eigensystems . and consider the effects on H(X) of the perturbation X + E.SEC. Chapter 4. if Q is an orthonormal basis for H(X + E) and Q j. become increasingly dependent. If the residual norms are converging linearly with constant cr.8) is a crude extrapolation and must be used with some care. For more see the notes and references.7) that The formula (1. so that one SRR step detects the convergence of the entire cluster.AU. One must also take care that the formula does not slightly underpredict /. is an orthonormal basis for 7£(X) 1 . We can use this fact to make a crude prediction of when we expect a Schur vector to converge. let X = A1U. is has a residual norm pk.e.

which by hypothesis is large. Hence if Q±_ is any orthonormal basis for 7£(X). Since £/is orthonormal. we should orthogonalize after at most / steps.10) will be zero. we will almost certainly have K. To apply this result to subspace iteration. note that once the process starts converging.(R). Hence. let CHAPTER 6. Thus if we decide that no deterioration greater than 10* is to be tolerated. Thus / in (1. ALTERNATIVES be the QR factorization of X. where Amax(T) and Xmin(T) denote the largest and smallest eigenvalues of T in magnitude. Then has orthonormal columns.(Tl) < K(T)1.390 To find an orthonormal basis for 1Z(X + E). Hence as convergence sets in. the more all the columns . then Thus Hence and the degradation in the subspace of X is proportional to K. The reason is that for any matrix T we have «(T) < IAmax(r)/Amin(T)|. the sensitivity of A1 U is proportional to K. The common sense of this is that the larger the value of |A!/A 2 |.(T) > | Ai/A2|. where / satisfies An implication of this analysis is that if |Ai| > |A2| —as it might be when we use shift-and-invert enhancement with a very accurate shift—then we will have to reorthogonalize at each step. we will have where T is the Schur form from the last SRR step.

It should be stressed that after deflation, we must continue to work with the full T, not the trailing principal submatrix of T associated with the remaining subspaces. The reason is that in the powering step, lambda_1 still makes its contribution to nonorthogonality, whether or not its eigenvector has been deflated.

Once again we must treat the crude extrapolation (1.10) with care, since we have used several norm inequalities in deriving it, and once it settles down it is likely to be pessimistic. However, in practice the technique has shown itself to be effective.

Real matrices

If A is real, we will want to work with the real Schur form of the Rayleigh quotient (Chapter 2). Most of the above discussion goes through mutatis mutandis. The only problem is the ordering of the blocks of the real Schur form, which has been treated earlier (Chapter 2).

Symmetric matrices

When the matrix A is symmetric, we can obtain some obvious economies. First, the Rayleigh quotient is symmetric, so that we can use the more efficient symmetric eigenvalue routines to perform the SRR step. More important, the Schur vectors are now eigenvectors, so that there is no need of a postprocessing step to compute the eigenvectors. Moreover, we can speed up convergence by replacing the power step with a Chebyshev iteration. There are two economies, however, that are less obvious.

Economization of storage

In economizing storage it is important to appreciate the tradeoffs. In statement 4 of Algorithm 1.1 we wrote U = AU, indicating that the product AU is to overwrite the contents of U. This can be done in two ways. First, we can use an auxiliary matrix V and form the product all at once, as in (1.11). Alternatively, we can introduce an auxiliary vector v of order n and overwrite U column by column, as in (1.12). The algorithm (1.12) obviously uses less storage than (1.11), significantly less if n is large. The tradeoff is that we must perform m separate multiplications by the matrix A.
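The two ways of overwriting U by AU might look as follows in NumPy (a sketch; (1.11) and (1.12) refer to the two variants described above, and the function names are ours).

```python
import numpy as np

def power_step_full(A, U):
    """Variant (1.11): use a full auxiliary n x m array V.
    A is accessed once for the whole product, at the cost of extra storage."""
    V = A @ U
    U[:] = V
    return U

def power_step_columnwise(A, U):
    """Variant (1.12): use a single auxiliary n-vector v.  Much cheaper in
    storage, but A (or its data structure) is accessed once per column."""
    n, m = U.shape
    v = np.empty(n, dtype=U.dtype)
    for j in range(m):
        v[:] = A @ U[:, j]
        U[:, j] = v
    return U
```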

If the matrix A is sparse, however, it will generally be represented by some data structure, and a price must be paid for accessing that structure. In the algorithm (1.12) that price must be paid m times. In the algorithm (1.11), we need only access A once to compute the entire product AU, but, of course, at the cost of twice the storage, which brings the tradeoffs full circle.

So far as computing the product AU is concerned, both algorithms have the option of economizing storage. But in calculating the Rayleigh quotient B = U^H*A*U, we have no choice when A is non-Hermitian: we must use an extra array V. Thus there is a true choice in the non-Hermitian case, and how that choice will be made must depend on the representation of A and the properties of the machine in question. But if A is Hermitian, then so is B, and we need only compute its upper half; this can be done without an auxiliary array.

Chebyshev acceleration

We have noted that if the eigenvalues of A are well separated in magnitude, then convergence will be fast, but we will perform many orthogonalization steps. Conversely, if the eigenvalues of A are clustered in magnitude, then convergence will be slow, but we will need few orthogonalization steps. In this case, we can use the power steps to accelerate convergence. Specifically, instead of computing A^k*U we will compute p_k(A)*U, where p_k is a polynomial of degree k, chosen to enhance the gap between |lambda_1| and |lambda_{m+1}|. A natural choice is the scaled Chebyshev polynomial

    p_k(lambda) = c_k(lambda/s),

where |lambda_{m+1}| <= s <= |lambda_m|. To see why, note that when we evaluate c_k(A/s), we have effectively scaled A by s. For the purposes of argument, assume that we have chosen s = |lambda_{m+1}|, so that we may assume that |lambda_{m+1}| = 1 and hence |c_k(lambda_{m+1})| = 1. From (3.8), Chapter 4, we have

    |c_k(lambda_1)| >= (1/2)*e^(k*cosh^{-1}|lambda_1|).

Thus p_k(A)*U will contain approximations to the dominant eigenvector that converge at least as fast as e^(-k*cosh^{-1}|lambda_1|) converges to zero. The analogous result is true for the other eigenvalues.
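A two-line computation makes the gain concrete. The sketch below (Python) compares the per-step ratio 1/|lambda_1| from plain powering with the Chebyshev ratio exp(-cosh^{-1}|lambda_1|), anticipating the numerical example on the next page (lambda_1 = 1.01 after scaling by s = |lambda_{m+1}| = 1).

```python
import math

lam1 = 1.01                               # dominant eigenvalue after scaling
power_ratio = 1.0 / lam1                  # about 0.990 per step by powering
cheb_ratio = math.exp(-math.acosh(lam1))  # about 0.868 per step with Chebyshev
print(power_ratio, cheb_ratio)
```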

When |lambda_1| is near one, the convergence ratio e^(-k*cosh^{-1}|lambda_1|) can be much smaller than the usual ratio |lambda_1|^(-k) obtained by powering. For example, suppose that lambda_1 = 1.01, so that ordinarily we could expect a convergence ratio of 1/1.01 = 0.99. On the other hand, if we use Chebyshev acceleration, we get a convergence ratio of 0.87, a great improvement.

In implementing Chebyshev acceleration there are three points to consider.

1. What value should we use for the scaling factor s?
2. How do we compute c_k(A/s)*U?
3. How does the acceleration affect our procedure for predicting when to perform orthogonalization steps?

Let nu_i (i = 1, ..., m) be the Ritz values computed in the SRR step arranged in descending order of magnitude. Since by the interlacing theorem (Chapter 1) we know that |lambda_{m+1}| <= |nu_m| <= |lambda_m|, it is natural to take s = |nu_m| for the scaling factor. The choice has the drawback that as the iteration converges, nu_m -> lambda_m, so that we do not get the best convergence ratio. But in subspace iteration we generally choose m somewhat greater than the number of eigenpairs we desire, so that this drawback is of minimal importance.

To evaluate c_k(A/s)*U, we can use the three-term recurrence (see item 1 in Theorem 3.6, Chapter 4)

    c_0(t) = 1,   c_1(t) = t,   c_{j+1}(t) = 2*t*c_j(t) - c_{j-1}(t).

The algorithm that implements this recurrence uses two extra n x m arrays, U and V. If we generate the final U column by column, we can, as above, make do with two extra n-vectors.
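A NumPy paraphrase of this recurrence is given below; it is a sketch with names of our own choosing, using the two-extra-array scheme just described.

```python
import numpy as np

def chebyshev_power_step(A, U0, s, k):
    """Return c_k(A/s) @ U0 via the three-term recurrence
       c_0 = 1,  c_1(t) = t,  c_{j+1}(t) = 2 t c_j(t) - c_{j-1}(t).
    Two extra n x m arrays are used; generating the result column by
    column would reduce this to two extra n-vectors."""
    U_prev = U0.copy()                 # c_0(A/s) U0
    U_curr = (A @ U0) / s              # c_1(A/s) U0
    if k == 0:
        return U_prev
    for _ in range(1, k):
        U_next = 2.0 * (A @ U_curr) / s - U_prev
        U_prev, U_curr = U_curr, U_next
    return U_curr
```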

Turning now to orthogonalization, if we use Chebyshev acceleration we must modify the formula for the interval between orthogonalizations in two ways. First, since A is symmetric, the Schur form T of the Rayleigh quotient is diagonal. Thus we can avoid the invocation of a condition estimator and compute kappa(T) as the ratio nu_1/nu_m of the Ritz values. Second, if l is greater than one, we will be performing Chebyshev iteration rather than simple powering, so that the relevant growth is governed by the bound (3.9), Chapter 4. Hence we must replace the formula (1.13) by one based on that bound.

NOTES AND REFERENCES

Subspace iteration

Subspace iteration originated in Bauer's Treppeniteration [16, 1957], which bears the same relation to Rutishauser's LR algorithm [224, 1955] as the orthogonal version presented here does to the QR algorithm. In The Algebraic Eigenvalue Problem Wilkinson [300, 1965] introduced the orthogonal variant. For symmetric matrices Jennings [130, 1967] introduced an approximate Rayleigh-Ritz step using first-order approximations to the primitive Ritz vectors. Clint and Jennings [41, 1970] compute Ritz vectors to get good approximations to eigenvectors from the U_k. The full Rayleigh-Ritz acceleration was introduced independently by Rutishauser [227, 1969] and Stewart [249, 1969].

For nonsymmetric matrices, Clint and Jennings [42, 1971] again compute Ritz vectors, and Stewart [257, 1976] introduced and analyzed the Schur-Rayleigh-Ritz method for approximating Schur vectors. The latter paper also introduces the technique of extrapolating residual norms to predict convergence. Although the hypothesis |lambda_m| > |lambda_{m+1}| is not necessary for the spaces R(U_k) to contain good approximations to the Schur vectors, it is necessary to prove that the SRR procedure retrieves them. A counterexample is given in [257], where it is argued that the phenomenon is unlikely to be of importance in practice.

The idea of predicting when to orthogonalize is due to Rutishauser [228] for the symmetric case and was generalized to the nonsymmetric case by Stewart [257].

Chebyshev acceleration

The use of Chebyshev polynomials to accelerate subspace iteration for symmetric matrices is due to Rutishauser [228]. When A is positive definite it is more natural to base the Chebyshev polynomials on the interval [0, s] rather than on [-s, s], as is done in the text. Chebyshev acceleration can also be performed on nonsymmetric matrices and can be effective when eigenpairs with the largest real parts are desired. For details and further references see [68].

Software

Rutishauser [228] published a program for the symmetric eigenproblem. The program LOPSI of Stewart and Jennings [270, 271] is designed for real nonsymmetric matrices and uses Rayleigh-Ritz refinement to compute eigenvectors. The program SRRIT of Bai and Stewart [12] uses SRR steps to compute Schur vectors. Duff and Scott [68] describe code in the Harwell Library that includes Chebyshev acceleration for the symmetric and nonsymmetric cases. All these references contain detailed discussions of implementation issues and repay close study.

2. NEWTON-BASED METHODS

One of the most widely used methods for improving an approximation to a root of the equation f(x) = 0 is the Newton-Raphson formula

    x_{k+1} = x_k - f(x_k)/f'(x_k).

It is a useful fact that specific instances of this formula can be derived without differentiation by using first-order perturbation expansions. For example, suppose we wish to compute sqrt(a) for some positive a. This amounts to solving the equation f(x) = x^2 - a = 0. The Newton correction is

    d = -f(x)/f'(x) = (a - x^2)/(2x).

On the other hand, if we write

    sqrt(a) = x + d,

where d is presumed small, expand the square, and drop terms in d^2, we get the approximate linear equation

    x^2 + 2xd - a = 0,

from which we find that

    d = (a - x^2)/(2x).                                        (2.1)
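The derivation translates directly into an iteration. The sketch below (Python; the stopping criterion is an illustrative choice) applies the correction (2.1) repeatedly.

```python
def newton_sqrt(a, x0, tol=1e-14, maxit=50):
    """Approximate sqrt(a) by repeatedly applying the perturbation-expansion
    correction d = (a - x**2)/(2*x), which coincides with the Newton-Raphson
    correction for f(x) = x**2 - a."""
    x = x0
    for _ in range(maxit):
        d = (a - x * x) / (2.0 * x)
        x += d
        if abs(d) <= tol * abs(x):
            break
    return x

print(newton_sqrt(2.0, 1.0))   # 1.4142135623...
```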

Thus the Newton correction d and the approximation (2.1) to the true correction d are identical.

In this section we are going to explore the consequences of applying this approach to the eigenvalue problem. It turns out that it yields some old friends like the Rayleigh quotient method and the QR algorithm. However, it also yields some new methods and suggests ways of implementing them for large sparse eigenproblems. When combined with the Rayleigh-Ritz method, it becomes the Jacobi-Davidson method.

In the next section we will introduce approximate Newton methods and a class of variants that we will call orthogonal correction methods. In §2.2 we will treat the Jacobi-Davidson algorithm.

2.1. APPROXIMATE NEWTON METHODS

This section concerns the derivation of Newton-based methods and the analysis of their orthogonal variants.

A general approximate Newton method

Suppose we are given an approximate eigenpair (mu, z) of A and wish to determine eta and v so that (mu + eta, z + v) is an improved approximation. For purposes of derivation, let (mu + eta, z + v) be an exact eigenpair of A. For reasons that will become clear later, we will assume that we have an approximation A~ to A, where E = A~ - A is small. Then

    A(z + v) = (mu + eta)(z + v).

If we discard products of small terms, we obtain the following approximate equation in eta and v:

    (A - mu*I)v - eta*z = -(A*z - mu*z).

We therefore define approximate corrections to be the solutions of the equation

    (A~ - mu*I)v - eta*z = -(A*z - mu*z).

Unfortunately, this equation is underdetermined: there are only n relations to determine the n components of v and the scalar eta. To determine the system completely, we add the condition

    w^H*v = 0,

where w is a suitably chosen vector. For example, we could choose w = e_k, in which case the kth component of z + v remains unchanged. Later we will consider the consequences of choosing w = z.

The correction equation There is an alternative to the correction formula (2. write the first equation in (2. that is of theoretical and practical use.5) the approximate Newton correction equation. Vol II: Eigensy stems .4).SEC. P^-z = 0 and P/r = r. NEWTON-BASED METHODS Now in matrix form our equations become 397 By multiplying the first row of this relation by WH( A — /z/) the second row. we will simply call it the Newton correction formula. It follows that nOW LET be the projector onto the orthogonal complement of w along u. It actually characterizes the correction vector v. To derive it.4) in the form The first order of business is to get rid of 77. Then follows that We will call (2. In particular. In the language of projectors. Let where r j_ is any vector that is orthogonal to r for which r^z ^ 0. 2.4) the approximate Newton correction formula. P^~ is an oblique projector onto the orthogonal complement of r j_ along z. as the following theorem shows. involving only v. we obtain the triangular system l and subtracting it from Hence We will call (2. If A — A.

5). Then Thus v satisfies (2. In other words. We first observe that it is natural to take w = z—that is. However. we need only compute matrix-vector products of the form A± y. . if we restrict p appropriately. then a simple picture shows that the minimum distance between z and span(£ + v) is approximately \\v\\z. and large projections can cause numerical cancellation in expressions that contain them. However. Proof.1. to require the correction v to be orthogonal to the approximate eigenvector z. We have already shown that if v satisfies (2. On the other hand. The correction equation says that the correction v to z satisfies a simple equatior involving the projected operator Although the equation is pretty. Then v satisfies (2. z and v is small compared with z.5).pi be nonsingular.398 CHAPTER 6. For if v J. and their implementation requires a factorization of the matrix A — pi. where y is an arbitrary vector.5). which exhibits the correction explicitly. v is close to the smallest possible correction.4) if and only ifv satisfies (2. which may be too expensive to compute for large sparse problems.5). Oblique projectors can be arbitrarily large. We will return to this point later. we can replace the projections P^. Suppose now that v satisfies (2. the correction formula is cast in terms of inverses. In the above notation. Orthogonal corrections The presence of oblique projectors in the correction equation is disturbing.4) then v satisfies (2. if we use an iterative method based on Krylov sequences to solve the correction equation. let A .and P^ by a single orthogonal projection. something that is easily done. is does not seem to offer much computational advantage over the correction formula. ALTERNATIVES Theorem 2.

Now by Theorem 1. 2. to be the Rayleigh quotient then r is indeed orthogonal to z. correction formula. Vol II: Eigensystems .4. then we may take 399 which is an orthogonal projection. we will usually drop the qualifier "orthogonal" and speak of the approximate Newton correction.7) the orthogonal correction formula and the orthogonal correction equation. Theorem 2. and correction equation. Chapter 2. and (2.to be the same. We should also like P/. Thus we have established the following result. dropping the approximate when A = A.6) and (2.SEC. z) be an approximate normalized eigenpair of A where and lei Let and assume that A is nonsingular. However.2 the orthogonal (approximate Newton) correction.2. Let (^. if we choose fj. Then the Newton-based correction v that is orthogonal to z is given by the correction formula Equivalently. v is the unique solution of the correction equation Three comments on this theorem. But since we will be concerned exclusively with orthogonal corrections in the sequel. NEWTON-BASED METHODS If we take w = z. in this case the correction equation becomes and this equation is consistent only if r J_ z. • We call the vector v of Theorem 2.

But equally important is that the operator A . Specifically.400 CHAPTER 6.2)]. The distinction. we have the following correspondences: in which p is to be determined. In the Z coordinate system. let Z = (z Z_\_) be a unitary matrix and partition We will also write where [see (2. The obvious sense is that we require v to be orthogonal to z. Analysis The fact that the operator in the projection equation (2. the iterated method is not strictly speaking Newton's method. even when A = A. The reason is that we use the normalized vector (z + v)/\\z + v\\2 and its Rayleigh quotient in place of z + v and /i + 77.7) is a projection of A suggests that for the purposes of analysis we should rotate the problem into a coordinate system in which the projector projects along the coordinate axes. • We have used the term "orthogonal" it more than one sense. The correction equation can now be written in the form .fj. however. is in the higher-order terms. However.I is projected onto the orthogonal complement of z in the correction equation. ALTERNATIVES • We have used the expression "approximate Newton method" rather than simply "Newton method" because both the correction formula and the correction equation contain the approximation A.

as pointed out in the last paragraph.§/p||2. The remaining component is By the correction formula (2. the problem is trivial in the Z coordinate system. For in this case the residual for the corrected vector is where.4. A natural quantity to measure the quality of the current approximation z is the residual norm \\r\\2. Vol II: Eigensystems . Second. Third. which makes the first component of the residual equal to zero. Thus in the Z coordinate system the correction equation and the correction formula are so simply related that there is no need to prove a theorem to establish their equivalence. NEWTON-BASED METHODS Hence we have 401 This expression corresponds to the correction formula.8) the first two terms in this expression vanish. By definition it uses the Rayleigh quotient ft corresponding to z. if v is small—which we may assume it to be in a convergence analysis — z is essentially normalized. First. There are three ways to simplify the analysis. and our only problem is to find a reasonably small residual. Chapter 2.\\v\\2. 2. The easiest choice is v — fi -f /iHp. we have \\z\\2 = \\z\\2 + \\v\\2 — 1 -f. f has the smallest possible residual.SEC. we may choose v at our convenience. we have the following theorem. although we should. and hence by Theorem 1. in principle. Thus any residual norm will serve to bound ||r||2. We now turn to the iterative application of the approximate Newton method. it is not necessary to compute f.Thus our problem is to develop a bound for the residual norm corresponding to the corrected vector z = z + v in terms of the original residual norm ||r||2. since v _L z. work with the normalized vector .Consequently. Hence Inserting iteration subscripts.

. convergence to a poorly separated eigenvalue requires a correspondingly accurate approximation. the error in the approximation must be reasonably less than the separation of A from its neighbors in the sense of the function sep. the eigenpairs (nk. then the residual norms ||f. It can then be shown that Hk and Mjt (suitably defined) approach fixed limits A. When € = 0. If then for all sufficiently small H^H. that Hence for £ not too large.) be a sequence of normalized approximate eigenpairs produced by the orthogonal approximate Newton method applied to the matrix A = A + E.Further let Z = (zk Z^) be unitary. so that rjk = \\Tk\\2. the bound ae < I is essentially What this says is that if we are going to use an approximation A instead of A in our iteration.. z) of A. Chapter 4.. If e > 0. the convergence is quadratic.402 CHAPTER 6. We will not present the proof here. so that by Theorem 2. the quantity a^k + &kilk\\Tk\\2) & uniformly bounded below one. Chapter 4. However. Let Tk = Az/j . zjt) (k — 0. Let (^. In other words.10.k and Mk are converging to A and M. In particular.9) can be made the basis of a local convergence proof for the orthogonal Newton-based method. .Hence the convergence is cubic. then the convergence of the method is linear with convergence ratios not greater than ae. if ||ro||2 is sufficiently small.HkZk.1.3. hk = <%. the idea is to obtain uniform upper bounds a on (7k and e on e^ (the quantities r/k are uniformly bounded by ||A||2).t II2 converge monotonically to zero. Since (j. z^ converge to an eigenpair (A. ALTERNATIVES Theorem 2. and define and Then Local convergence The recurrence (2. M. we have by Theorem 2.11. When A is Hermitian.

This variant of the approximate Newton method is closely related to the RQ variant of the QR algorithm. the rate of convergence depends on the accuracy of the approximation A = A + E. Moreover. In this case the approximate Newton's method consists of inner iterations to solve the correction equations and outer iterations consisting of Newton steps. This is an example of an approximation that changes from iteration to iteration. See the notes and references./^/ = A . we generally terminate an inner iteration when the residual is sufficiently small. if E is small enough. If we lack a natural approximation.SEC.12). Vol II: Eigensystems .rl. But this is not always possible. Chapter 2. It turns out the analysis given above covers this case. where r is the center of a focal region containing the eigenvalues of interest. Ideally. 2. if A is diagonally dominant with small off-diagonal elements. As with eigenvalue problems. For example.9) simplifies to where in the above notation a^ — ||(M* . so that the method converges quadratically. for the local convergence of the QR algorithm.rl) l\\2. In this case the approximate Newton method can be implemented solving only diagonal systems. then we must use an iterative method to solve the correction equation (2. NEWTON-BASED METHODS 403 Three approximations In the approximate Newton method. where Ek = (fJtk ~ r K> then A . • Natural approximations. (Note that r is sometimes illogically called a target in the literature.7). In many instances it is sufficient to replace A — \i^I by (A — r/). our iteration bound (2. It is natural to ask what effect a residual-based convergence criterion has on the convergence of the approximate Newton method. In the pure Newton method. Here we give three situations in which it may be desirable to take A ^ A. • Inexact solutions. This diagonal approximation is associated with the name of Jacobi. it is impossible to solve the equation A — Hkl directly. the method will converge rapidly. we might split A = D — E. • Constant shift approximations. we solve equations involving the matrix A — fikl. where D is the diagonal of A and E = D — A. we would take A = A. This is no coincidence. This bound is isomorphic to the bound (2. In this case. Since //^ changes with each iteration. Such approximations arise when A can be split into a matrix A that is easy to invert (in the sense of being able to solve systems involving A — [ikI) and a small remainder E.) If we write A = A + Ek. this may require a matrix factorization at each step. See the notes and references. In some applications.
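For the diagonally dominant case, the correction formula requires only diagonal solves. The sketch below (Python; the function name is ours) takes A~ = D, the diagonal of A, so that each application of (A~ - mu*I)^{-1} is a componentwise division, the approximation associated with the name of Jacobi.

```python
import numpy as np

def jacobi_correction(A, z):
    """Approximate Newton correction with the diagonal approximation
    Atilde = D = diag(A); only diagonal systems are solved.  (If mu happens
    to equal a diagonal entry, the division below breaks down; a practical
    code would guard against that.)"""
    z = z / np.linalg.norm(z)
    mu = z.conj() @ (A @ z)
    r = A @ z - mu * z
    d = np.diag(A) - mu                      # diagonal of (Atilde - mu I)
    t1 = r / d                               # diagonal "solves"
    t2 = z / d
    eta = (z.conj() @ t1) / (z.conj() @ t2)
    return eta * t2 - t1                     # correction v, orthogonal to z
```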

Moreover. it followsfrom(2. 2. . it sometimes happens that while the method is converging to one eigenpair.11). and its presence in a work whose concern is mostly the tried and true is something of an anomaly. and it is technically interesting— and that is reason enough to include it. If we do not start the method close enough to an eigenpair. THE JACOBI-DAVIDSON METHOD The Jacobi-Davidson method is a combination of the approximate Newton iteration and Rayleigh-Ritz approximation. The basic Jacobi-Davidson method The approximate Newton method is suitable for improving a known approximation to an eigenpair. it has two drawbacks. ALTERNATIVES such that Now since (/ .zzH)r = r. it is reasonable to assume that the iterative method in question produces an approximate solution v that satisfies (/ — zz^)v = v. it has been used successfully. The method is new and less than well understood. it is promising. This implies that we may move E inside the brackets in (2. Chapter 2.2. Moreover.9) can guide our choice of a convergence tolerance for the inner iteration. it produces information that later speeds convergence to other eigenpairs in the focal area.This fact along with the iteration bound (2.404 From Theorem 1. In particular. it may converge to eigenpairs whose eigenvalues lie entirely outside the region we are interested in.ZZE)S = s. there is a matrix CHAPTER 6.3. there is no real control over what eigenvalue the method will converge to. Nontheless. it can be combined with an effective deflation technique that prevents it from reconverging to pairs that have already been found. However. as we shall see. Moreover. so that Thus the effect of terminating the inner iteration with a residual 5 is the same as performing the outer iteration with an approximation in which ||£||2 = ||s||2. We will begin with a general formulation of the Jacobi-Davidson algorithm and then introduce the refinements necessary to implement it. It has the advantage that the Rayleigh-Ritz procedure can focus the Newton iteration on a region containing the eigenvalues we are interested in.10) that (/ . 1.

the correction vector v^ is applied once to the current approximate z^ and discarded—much as past iterates in the power method are thrown away. By greedy... 8. . z] such that /z is near r r — Az — fj. The quantity // is the Rayliegh quotient corresponding to z. 9. Vk. and it is possible that another vector in this space will provide a better approximate eigenvector. 2. . Although the Rayleigh-Ritz method is not infallible. Solve the system (/.K!)(! . 6. Algorithm 2.. The correction is then folded into the correction subspace and the process repeated. NEWTON-BASED METHODS 405 Given a normalized starting vector z.1 implements the basic Jacobi-Davidson algorithm..1: The basic Jacobi-Davidson algorithm o 2. Moreover. Specifically. VQ. we mean that the method attempts to do as well as possible at each step without any attempt at global optimization. If it is expensive to vary K in solving Vol II: Eigensystems .. this algorithm attempts to compute an eigenpair whose eigenvalue is near a focal point r.. v J_ z Orthonormalize w with respect to V V = (V v) end for k Signal nonconvergence Algorithm 2. 3. It consists of a basic Rayleigh-Ritz step to generate a starting vector for an approximate Newton step. 4. In particular (up to normalization). and K.12).zzu)(A.ZZH)V = -r. The matrix A is a suitable approximation of A. Yet this is only one vector in the correction space spanned by z0. 11. •> vk and use a Rayleigh-Ritz procedure to generate 2^+1. fi Choose a shift K near r or fj.z 5. //. 2. This suggests that we save the vectors ZQ. 1.. if (r is sufficiently small) return (^. 1. the new approximation will generally be better than (2.. The algorithm contains three shift parameters: r. we are free to choose one near a focal point. V =v for & = !. because we can choose from among k+1 Ritz pairs. T is the center of a focal region in which we wish to find eigenpairs.kmax Using V" compute a normalized Ritz pair (/^. It is greedy. . so that we can mitigate the effects of wild jumps in the Newton iterates. z)'. As stated in the prologue.VQ. 10.SEC.
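The flow of the basic algorithm is perhaps easiest to see in a concrete sketch. The following Python code mirrors Algorithm 2.1 with the simplifying assumptions that A~ = A, that kappa is taken equal to the current Ritz value, and that the projected correction equation is solved by a dense least-squares solve; all of these choices are illustrative, not prescriptive.

```python
import numpy as np

def jacobi_davidson(A, v0, tau, tol=1e-10, kmax=30):
    """Basic Jacobi-Davidson iteration for an eigenpair whose eigenvalue
    lies near the focal point tau (dense-matrix sketch)."""
    n = A.shape[0]
    I = np.eye(n)
    V = (v0 / np.linalg.norm(v0)).reshape(n, 1)
    for _ in range(kmax):
        # Rayleigh-Ritz on span(V): pick the Ritz pair nearest tau
        B = V.conj().T @ A @ V
        theta, Y = np.linalg.eig(B)
        j = np.argmin(np.abs(theta - tau))
        mu = theta[j]
        z = V @ Y[:, j]
        z = z / np.linalg.norm(z)
        r = A @ z - mu * z
        if np.linalg.norm(r) <= tol:
            return mu, z
        # correction equation: (I - zz^H)(A - kappa I)(I - zz^H) v = -r
        kappa = mu
        P = I - np.outer(z, z.conj())
        M = P @ (A - kappa * I) @ P
        v = np.linalg.lstsq(M, -r, rcond=None)[0]
        v = P @ v                              # keep v orthogonal to z
        # fold the correction into the basis
        v = v - V @ (V.conj().T @ v)
        nv = np.linalg.norm(v)
        if nv < 1e-14:
            break
        V = np.hstack([V, (v / nv).reshape(n, 1)])
    return mu, z                               # best pair found so far
```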

The convergence ratios for the first case decrease from about 0..95".952.1 exhibits the results.1.24 to 0. where e is a vector of random normal deviates. which is atypical. we begin with a diagonal matrix A with eigenvalues 1.. It is seen that the JacobiDavidson method is converging faster than the ordinary approximate Newton method. The Jacobi-Davidson algorithm was used to approximate the eigenvector ei. then by the analysis of the last section the choice K = fj. while for the second case they increase from a value of 0.) .95.. in Case 2. the second. As in Example 3. if it is inexpensive to vary K and A — A.406 CHAPTER 6.953.. The table in Figure 2. To slow convergence so that the process has time to evolve. will give quadratic convergence—provided something does not go wrong with the Rayleigh-Ritz procedure. ALTERNATIVES Figure 2.05e. Example 2. However. Two cases were considered: Case 1 in which the Rayleigh-Ritz step was included and Case 2 in which it was not..1. Chapter4. A starting vector u was obtained by normalizing a vector of the form ei + 0.67...28 to 0. The following example shows that the Rayleigh-Ritz refinement can improve the convergence of the approximate Newton method.041..1: Convergence of the Jacobi-Davidson method o the correction equation. The first column shows the convergence of the residual norm in Case 1. it will generally be chosen equal to r.4. K was set to // + 0. (All this excludes the first step.

. The first issue is the matter of deflation.If we decide at the last iteration to declare the iteration has produced a satisfactory approximation to ei and start looking for approximations to e2. we must restrict the iteration to proceed outside the space spanned by the previous eigenvectors. Thus. 2. . we must take care that the iteration does not reconverge to a vector that has already been found. we cannot say just how the iteration should be restricted without knowing the unknown eigenvectors. and write Now suppose (/i. If we have found a set ui.1 is designed to find a single eigenpair. Algorithm 2. Then is block upper triangular. let (U t/j_) be unitary. They track the residual norms rather closely and show that it is the Rayleigh-Ritz approximation that is driving the convergence. The example actually shows more. we will begin with a very accurate starting vector. NEWTON-BASED METHODS 407 The third column shows the tangents of the angles between the correction space and the eigenvector ei. Unfortunately. When more than one eigenpair is to be computed. A cure for this problem is to work with Schur vectors. The last column exhibits the tangents of the angles between the correction subspace and the subdominant eigenvector 62. z] is a normalized eigenpair of B. Let us begin by assuming that we have an orthonormal matrix U of Schur vectors of A.SEC. up}... that is..13) by mimicking the reduction to Schur form. An algorithm to do this would allow us to build up ever larger partial Schur decompositions a step at a time. We wish to extend this partial Schur decomposition to a partial Schur decomposition of order one greater. Specifically. Extending Schur decompositions We now turn to implementation issues. up of Schur vectors.. . i. We begin by observing that we can extend (2..e. U satisfies the relation for some upper triangular matrix T = U^AU. the restriction can be cast in terms of orthogonal projections involving known vectors. It follows that Vol II: Eigensystems .. and let (z Z±) be unitary. we know that the other Schur vectors lie in the orthogonal complement of span(wi.

suppose y satisfies (2. We now compute the residual . z) is an eigenpair of B. z). Let be the orthogonal projector onto the column space of U±. Then y = U±zt where z — U±y. and a new correction vector v. In other words there is a one-one correspondence to eigenvectors of A± lying in 7£( t/j_) and eigenvectors ofB. This typically dense matrix process is not much use in the large sparse problems that are our present concern.15). Note that because V is orthogonal to H(U). For it follows from (2. and let y = U±.13). it follows on multiplying by U± that Bz = fj. an orthonormal matrix V of orthogonalized corrections. We begin by orthogonalizing v with respect to V and incorporating it into V. we will use the Jacobi-Davidson procedure. Then (ji. The ingredients for a step of the procedure are the matrix U of Schur vectors. Note that a knowledge of the eigenvector y is all that is necessary to extend the partial Schur decomposition..z. the Rayleigh quotient can also be computed from the original matrix A in the form VTAV. We next compute the Rayleigh quotient B = VTA±V and a Ritz pair (/i. Then it is easily verified that Now suppose that (p.z. We will assume that V and v are orthogonal to 7£(J7).14) that the extended partial Schur decomposition is where To compute a vector y satisfying (2. CHAPTER 6. Since py = A±y = U±Bz. the following observation allows us to recast the algorithm in terms of the original matrix A. ALTERNATIVES is a partial Schur decomposition of A of order one higher than (2. y) satisfies Conversely.15). However.408 Hence.

On the other hand. Thus v should be orthogonalized against both U and V—with reorthogonalization if necessary. multiplying s by UH. z. the focal point r (which may vary).. in statements 4 and 15). This is done in statement 5. • As V grows. to get another correction to add to V. There is a subtle point here. we want to be sufficiently small.SEC. if there is cancellation when v is orthogonalized against V.16) and (2. we compute the size of its components in U and U±. we solve the correction equation where v _L U. • In principle. since by definition U^v = 0. • Again excluding operations with A. Algorithm 2. • Excluding operations with A (e. • The routine EPSD (Extend Partial Schur Decomposition) is designed to be used by a higher-level program that decides such things as the number of Schur vectors desired. There are several comments to be made. To assess the size of s. we can expect the operation count to be a multiple of n(m—£)2 flam. 2. the mainfloating-pointoperations come from matrix-vector products such as Vp. Thus if U is not too large and the algorithm returns at k = m < m. However. Vol II: Eigensystems . it is not necessary to recompute the entire Rayleigh quotient C — only its southeast boundary.\iz and orthogonalize it against U. First. Rather we compute Az . we get Hence Ps = 0. • We do not compute the residual r directly. What we really want is to get a satisfactorily small residual when we substitute z for y in (2.17)—that is. Finally. if we multiply s by Pj_ = U±U±. and how to restart when the arrays V and W are about to overflow storage. we do not have to orthogonalize v against U in statement 3. the storage requirements are up to 2nm floating-point numbers.2 implements this scheme.g. z) can be accepted as our eigenpair. the results will no longer be orthogonal to U to working accuracy. NEWTON-BASED METHODS 409 and test to see if it is small enough that (//. we get It follows to that we may use r in a convergence test.

It is assumed that the arrays V and W cannot be expanded to have more than m columns. and AX = PL APj.2: Extending a partial Schur decomposition o . are V.C/f/ H .. Algorithm 2. an nx (•£—!) matrix of orthonormalized correction vectors. C — WHV\ a focal point r. in addition to /7. Pj_ = / .410 CHAPTER 6. an additional correction vector v. W = AV. EPSD extends it to a partial Schur decomposition of the form The inputs. EPSD returns the next correction vector v. ALTERNATIVES Given a partial Schur decomposition AU = t/T. In the following code. If convergence does not occur before that happens. and an approximation A = A + E.

Here we use the notation of Algorithm 2. detect these imposters by the fact that they have large residuals. Chapter 4. But it is better to have a good vector in hand than to know that a vector is bad. Harmonic Ritz vectors In §4.11. the Jacobi-Davidson algorithm builds up a basis V from which eigenvectors can be extracted by the Rayleigh-Ritz method. Recall (Definition 4. the natural choice of K is either r or /z. Vp) is a harmonic Ritz pair with orthonormal basis V and shift r if Moreover.r!)U. Specifically. the Rayleigh-Ritz method may fail. Unlike Krylov sequence methods. It is possible. Chapter 4) that (r + £. it may make the correction equation harder to solve—requiring. however. of course. An alternative is to use the harmonic Rayleigh-Ritz procedure.SEC. if WR is the QR factorization of (A . Restarting Like Krylov sequence methods. We can. At that point we must restart with a reduced basis. the Jacobi-Davidson method does not have to preserve any special relation in the new basis — we are free to restart in any way we desire. Fortunately we can extract that information by using a Schur-Rayleigh-Ritz process. Otherwise put. a matrix factorization at each step. If such a vector is accepted as a Ritz vector. Ultimately. 2. The basis V contains hard-won information about the vectors near the focal point.4. The later choice will give asymptotically swifter convergence. we replace V with a basis for the primitive Ritz subspace corresponding to the Ritz values within p of r. we noted that an interior eigenvalue could be approximated by a Rayleigh quotient of a vector that has nothing to do with the eigenvalue in question. V will become so large that it exceeds the storage capacity of the machine in question. a focal point r. and a radius />. we may proceed as follows.2. but since \JL varies with fc. given V". for example. to abuse that freedom. then Vol II: Eigensystems . NEWTON-BASED METHODS 411 • As mentioned above. information that should not be discarded in the process of restarting. which discourages imposters.
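A sketch of how harmonic Ritz pairs can be extracted from the QR factorization of (A - tau*I)V, assuming the standard harmonic Rayleigh-Ritz relations; reducing to the small generalized eigenproblem R p = delta (W^H V) p is the simpler and numerically more stable problem referred to in the text.

```python
import numpy as np
from scipy.linalg import eig, qr

def harmonic_ritz_pairs(A, V, tau):
    """Harmonic Ritz values tau + delta and vectors V @ p with respect to
    the orthonormal basis V and shift tau, computed from the thin QR
    factorization W R = (A - tau I) V."""
    W, R = qr(A @ V - tau * V, mode='economic')
    # harmonic pairs satisfy R p = delta * (W^H V) p
    delta, P = eig(R, W.conj().T @ V)
    values = tau + delta
    vectors = V @ P                      # one column per harmonic Ritz vector
    return values, vectors
```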

r + 6). Note that the matrix B — W^ U can be updated as we bring in new correction vectors v.4. The Schur form is diagonal.18). 1:1] and computing its QR factorization. Similarly economies can be achieved in the restarting procedure (2.3). we showed that where z — Vv is the harmonic Ritz vector. we have the following algorithm. The principle change in the harmonic extension (Algorithm 2. is in how the harmonic Ritz vectors are . Chapter 4. In §4.2. Note the indirect way of updating the QR factorization of the restarted matrix It is faster than simply forming WRQ[:. Specifically. since that value will not make the residual orthogonal to V. Instead we must use the ordinary Rayleigh quotient p^VAVp. A little algebra then shows that Algorithm 2. Having computed a primitive harmonic Ritz pair (p.3 implements the harmonic version of partial Schur extension. ALTERNATIVES so that we can calculate 6 and p from this simpler and numerically more stable generalized eigenproblem. the Rayleigh quotient C is Hermitian. other than the natural economies listed in the last paragraph. so that only the upper half of C need be retained. The algorithm for restarting the harmonic extension algorithm is more complicated than Algorithm 2. Hermitian matrices When A is Hermitian some obvious simplifications occur in Algorithm 2.2 (EPSD). For example.2 owing to the need to keep the columns of W orthogonal. we cannot use the harmonic eigenvalue in computing the residual. it has comparable storage requirements and operation counts. so that EPSD need only return the converged Ritz pair (statement 13).412 CHAPTER 6. Although it is more complicated than Algorithm 2.

In the following code P. an nx(l—1) matrix of orthonormalized correction vectors.SEC. and Aj_ = P^AP_L. are V. NEWTON-BASED METHODS 413 Given a partial Schur decomposition AU — UT. and an approximation A = A + E. It is assumed that the arrays V and W cannot be expanded to have more than m columns. HEPSD returns the next correction vector v. If convergence does not occur before that happens.UUH.L = I . R). Algorithm 23: Harmonic extension of a partial Schur decomposition o Vol II: Eigensystems . The inputs. in addition to U. an additional correction vector v\ the QR factorization WR of (A — r!)V. a focal point r. the current harmonic pair (#. 2. HEPSD extends it to a partial Schur decomposition of the form using harmonic Ritz vectors.

we will assume that a solution x exists. when A is Hermitian. We should therefore use the Hermitian eigenproblem to compute harmonic Ritz pairs. we will consider the general system in which we will assume M is nonsingular. If C has been precomputed. both for residuals for the correction equation and for restarting. .22) and (4. It is easy to verify that if x is defined by (2.21). If it is expensive to solve systems involving the matrix M. • Statement 1 anticipates the use of this algorithm to solve several systems.20). Therefore. To derive the solution of this system. Here are three comments. then the bulk of the work in the algorithm will be concentrated in the formation of Q = M~1Q. it can be used whenever another solution involving M and Q is required. ALTERNATIVES computed. Solving projected systems We now turn to the problem of solving the correction equation It will pay to consider this problem in greater generality than just the correction equation. Chapter 4.4.414 CHAPTER 6. the nonsymmetric eigenproblems (4. where a is a solution of (2.22). a linear system from which we can compute a.4 implements this scheme.23). we have or Hence. Algorithm 2. can give complex harmonic Ritz pairs. Since x _L Q. then x satisfies (2. Chapter 4. As noted in §4.

we can expect cancellation in statement 5. In the first place.. x is likely to have inaccuracies that will propagate to the solution x. The correction equation: Exact solution We now return to the solution of the correction equation Since z _L U. 2. Even when there is no cancellation.SEC. then the vector x will not be large.4 to solve the correction equation. For more see the discussion on economization of storage on page 391.4: Solution of nroiftcted ennations Likewise. the algorithm requires that we solve systems of the form Vol II: Eigensystems . However. if n is very large. it should be one that can be stably updated without pivoting—e. • We could go further and precompute or update a factorization of C by which we could calculate the solution of the system in statement 4. we must allocate additional storage to contain Q — at least when M is nonsymmetric. The mechanics of the algorithm argue strongly against changing the shift K frequently. then one can solve the system Mz z and update C in the form In both cases there is a great savings in work. we can set Q = (U z) and use Algorithm 2.g. But if a factorization is to be updated. the savings will be insignificant. while the vector x can be very large. NEWTON-BASED METHODS This algorithm solves the projected equation 415 where Q is orthononnal. a QR decomposition. if Q is augmented by a single vector z. • If Mis ill conditioned but (I-QQH)M(I-QQH) is not. In this case. and M is nonsingulai Algorithm 2. However.
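A dense sketch of the projected-system solver may be useful for reference: write x = xtilde - Qtilde*a with xtilde = M^{-1}b and Qtilde = M^{-1}Q, and choose a so that Q^H x = 0. The right-hand side b is assumed orthogonal to Q, and the names are ours.

```python
import numpy as np

def solve_projected(M, Q, b):
    """Solve (I - Q Q^H) M (I - Q Q^H) x = b with x orthogonal to Q,
    where Q has orthonormal columns, M is nonsingular, and b is orthogonal
    to Q.  The bulk of the work is in the solves with M; Qt and C can be
    saved and reused when several systems with the same M and Q arise."""
    Qt = np.linalg.solve(M, Q)            # Qtilde = M^{-1} Q
    xt = np.linalg.solve(M, b)            # xtilde = M^{-1} b
    C = Q.conj().T @ Qt                   # small matrix Q^H M^{-1} Q
    a = np.linalg.solve(C, Q.conj().T @ xt)
    x = xt - Qt @ a                       # now Q^H x = 0
    return x
```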

since the preconditioner must approximate the projected system. In application to the Jacobi-Davidson algorithm.4 suggest that we should avoid ill conditioning in A — K!. which has to be recomputed whenever K changes. however. But that is not the whole story. Our comments on the accuracy of Algorithm 2. not the shifted matrix. by a process called preconditioning. we need only know the operations we are required to perform with the matrix of our system.23) is the following algorithm. to describe how iterative solutions are used in the Jacobi-Davidson method. we can form the matrix-vector product z = M~lKw by the following algorithm. Second. If it is cheap to solve systems in M. Thus. Fortunately. if we can find an effective preconditioner that does not depend on K. we have more . it is easy to use Krylov sequence iterations with it. This means we should avoid a focal point r that accurately approximates an eigenvalue A of A. the matrix K is. The subject of iterative methods is a large one—far beyond the scope of this volume. The idea is to determine a matrix M — usually based on special characteristics of the problem in question—such that M~1K is in some sense near the identity matrix. it itself must be projected. The correction equation: Iterative solution An alternative to the exact solution of the correction equation is to use an iterative method to provide an approximate solution. ALTERNATIVES (A — K!)X = y.3 as the shift K. Second. Let us consider a general linear system Ku — b. The convergence of most Krylov sequence methods can be improved. as usual. where Q = (U z). Since the Jacobi-Davidson algorithm already requires that we be able to form matrix-vector products in A. In the Jacobi-Davidson method. The use of a preconditioned solver gives us more freedom to choose a shift K.QQH)( A — «/)(/ QQH). The most successful of the various iterative methods for solving this system are based on Krylov sequences.The preconditioner M is determined from the matrix A — AV/.416 CHAPTER 6. Thus our equivalent of (2. This means that we only need to be able to form the matrix-vector product Kw for an arbitrary vector w. The system in statement 2 can be solved by Algorithm 2. the system to be solved involves the preconditioner. it could retard convergence to other eigenpairs.2 and 2. All this suggests that we use the focal point r in Algorithms 2. This requires a factorization of A. First. we can vary K — at least within the limits for which the preconditioner is effective. often dramatically. (/ — QQH)(A — K!)(! — QQ^). if K does not change. the matrix is (/. and update it by adding the vector z as described above. Even though such a focal point will speed up convergence to A.4. we can precompute the matrix C.
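Putting the pieces together, the operator handed to a Krylov solver applies the projected matrix and then the projected preconditioner. The sketch below reuses solve_projected from the sketch following Algorithm 2.4; treating M as a full matrix solved directly is purely for illustration, since in practice M would be an easily invertible preconditioner.

```python
def preconditioned_operator(A, M, Q, kappa, w):
    """Apply z = Mproj^{-1} K w for the correction equation, where
       K     = (I - Q Q^H)(A - kappa I)(I - Q Q^H),
       Mproj = (I - Q Q^H) M (I - Q Q^H),
    and w is assumed orthogonal to Q (so the leading projection is free)."""
    y = (A @ w) - kappa * w               # (A - kappa I) w
    y = y - Q @ (Q.conj().T @ y)          # project: y := (I - QQ^H) y
    return solve_projected(M, Q, y)       # apply the projected preconditioner
```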

2: Convergence of the Jacobi-Davidson method with errors in the correction equations freedom to place K near an eigenvalue. From a local point of view. we can choose to solve the correction equation more or less accurately. there is very little hard analysis to guide us in this matter. since we work only with the projected operator (/ — QQ^)(A — Kl)(I — QQ^} and never have to explicitly solve systems involving A -AC/. The table in Figure 2. The convergence is satisfactory.01 is introduced in the correction equation. Since we are free to stop an iterative method at any time. However. However. There is a tension between the local convergence driven by the approximate Newton iteration and the broader convergence driven by the correction subspace. The following example illustrates this phenomenon. though not as fast as in Example 2. The data is the same as in Example 2. 2. The only difference is that at each stage a norm wise relative error of 0.2 shows the progress of the iteration (for a different random starting vector). If we decrease the error in the solution to the correction equation in the above exVol II: Eigensystems .4. Example 2. the approximations to 62 in the correction space stagnate after an initial decline.SEC. Unfortunately. only a certain number of figures of v actually contribute to the improvement of z. and there is no reason to solve the correction equation more accurately than to that number of figures.4 (which the reader should review).5. this lack of accuracy can exact a price when we come to compute other eigenpairs. NEWTON-BASED METHODS 417 Figure 2.

The fact that asymptotically Newton's method and the Rayleigh quotient method produce essentially the same iterates can be established by a theorem of Denis and Schnabel [64] to the effect that any two superlinearly converging methods launched from the same point must up to higher-order terms produce the Newton direction. Many people have applied Newton's method to the eigenvalue problem. The formulation in terms of derivatives is due to Simpson. then up to second-order terms in w we must have WTZ = 0. working with a polynomial equation p(x) = 0. ALTERNATIVES ample to 10~3 and 10~4. etc. The formula (2. Peters and Wilkinson [211] explore in detail the relation of Newton's method to the inverse power method. The scheme involved explicitly computing the polynomials p. any convergence of the correction subspace should be toward eigenvectors of A = A + E. when the approximate Newton method is used. expanded p(x + d) = q(d) and ignored higher terms in d to get a linear equation for d. They justify the choice w — z by observing that if we require z and z + w to be normalized. 2. NOTES AND REFERENCES Newton's method and the eigenvalue problem According to Kollerstrom [150] and Ypma [307] neither Newton nor Raphson used the calculus in the derivation of their methods. More generally. q. (However. He then continued by expanding q(d -f e) = r(e). but the convergence of approximations to 62 stagnates at 5-10~3 and 5-10~4. should stagnate at the level of \\E\\. see [4]. Newton. partitioning the R. linearizing. we must have (here we are ignoring second-order terms in the normalization of the first row and column of the Q-factor). Newton was actually interested in computing series expansions for x.418 CHAPTER 6.) Simpson showed how to effect the same calculations by working only with p.3. with w — z is due to Peters and Wilkinson [211].and Q-factors. in which case the extra work in computing the polynomials is justified. Specifically. so that approximations to eigenvectors of A. and solving for e.3). For some references. other than the one currently being found. he seems to have been the originator of the idea expanding and linearizing to compute corrections to zeros of functions —just the approach we use here. It is ironic that although Newton cannot be credited with the formula that bears his (and Raphson's) name. the convergence to ei is marginally faster. — obviously a computationally expensive procedure. r. It then follows that . suggesting that the approximations to vectors other than the current vector will stagnate at about the level of the error in the solution of the correction equations. The relation of Newton's method to the RQ variant of the QR algorithm is best seen by considering the RQ factorization of A—K! in the Z coordinate system.

To honor these distinctions we use the term "approximate" rather than "inexact" Newton method. Dembo. since any residual in the approximate solution of the Newton equation can be thrown back onto the Jacobian and conversely any perturbation of the Jacobian in the Newton equation corresponds to an approximate solution with a nonzero residual. (V"!-1-. Olsen [186] and his coauthors observed that the lack of orthogonality of v to z was a drawback. The chief difference is in the correction equation. the Jacobi-Davidson method might better be called the Newton-Rayleigh-Ritz method.5.5: Davidson's method o so that up to second-order terms the correction from the Q-factor and the Newton correction are the same. NEWTON-BASED METHODS 419 This algorithm takes a symmetric diagonally dominant matrix A = D + E. In some sense these methods are related to the approximate Newton methods introduced here. The story starts with Davidson's algorithm [53. where D is the diagonal of A and an orthonormal matrix V of correction vectors and returns a Ritz pair (^u. The Jacobi-Davidson method As it has been derived here. In 1990. z) approximating the fcth smallest eigenvalue of A and an augmented correction matrix R Here. and Steihaug [55] introduced and analyzed inexact Newton methods. Algorithm 2. Eisenstat. nroiftction on to 7?. It is very close to Algorithm 2. so that the asymptotic relation between the inverse power method and Newton's method insures that Davidson's method will work well. Inexact Newton methods In 1982. But in other. However. and it is instructive to see how the current name came about. which is sketched in Algorithm 2. which is essentially an approximate inverse power method.SEC. Using first-order perturbation expansions. For diagonally dominant matrices the approximation is quite good. from which it computes a Ritz vector and a new correction that is folded back into the subspace. it differs from the approximate Newton method given here in that we are working with a special constrained problem and the approximation is not to the Jacobian itself. in which the Newton equation is solved inexactly. 2. they rediscovered Vol II: Eigensystems . 1975].1 — it builds up a correction subspace. Pi is the. less favorable problems it tends to stagnate.

their major contribution was the introduction of the correction equation.3). 244]. It is worth noting that the author's application was so large that they could not save the corrections. 242. but with an approximation as in (2. and hence what they actually used was the approximate Newton method. ALTERNATIVES the matrix formulation of Peters and Wilkinson.243.420 CHAPTER 6. . thus deriving the orthogonal Newton step. Templates may be found in [241. 245] Sieijpen. which expresses the correction in terms of a projection of the original matrix. In later papers [74. Hermitian matrices The implementation for harmonic Ritz vectors suggested here differs somewhat from the implementation in [243]. We have preferred to leave V orthogonal and work with the harmonic Rayleigh-Ritz equation (2. and their colleagues also adapted the algorithm to harmonic Ritz vectors and the generalized eigenvalue problem. In 1996. 239. Thus Olsen's method is Davidson's method with an approximate Newton correction. Jacobi partitioned a dominant symmetric matrix A in the form Assuming an approximate eigenvector in the form ei. he computes a refined eigenvector of the form This is easily seen to be one step of the approximate Newton method with a diagonal (Jacobi) approximation. they went on to show how to deflate the problem and to incorporate iterative solutions with preconditioning. Sleijpen and van der Vorst [240] derived essentially the same algorithm by considering an algorithm of Jacobi [128. van der Vorst. Sleijpen and van der Vorst extended this example to general matrices. Specifically. There the authors accumulate R~l in the basis V. From the point of view taken here.1846] for approximating eigenvectors of diagonally dominant matrices.6).19). and solved it to get the correction formula (2. In particular. It is the key to making the method effective for large sparse matrices.

The scalars that form a vector are called the components of the vector. Thus the vector a may be written Figure A. 1. All scalars are denoted by lowercase Latin or Greek letters. An n-vector is a collection of n scalars arranged in a column. Scalars are real or complex numbers. This background has been presented in full detail in Volume I of this work.I gives the Greek alphabet and its corresponding Latin letters (some of the correspondences are quite artificial). The set of real numbers will be denoted by R. SCALARS. If a matrix is denoted by a letter. Here we confine ourselves to a review of some of the more important notation and facts. 3. the set of complex numbers by C. the set of complex mxn matrices by C m x n . VECTORS. AND MATRICES 1. An m x n matrix is a rectangular array of mn scalars having m rows and n columns.APPENDIX: BACKGROUND The reader of this volume is assumed to know the basics of linear algebra and analysis as well as the behavior of matrix algorithms on digital computers. 2. the set of complex n-vectors by C n . Matrices are denoted by uppercase Latin and Greek letters. The set of real mxn matrices is denoted by R m xn. If a vector is denoted by a letter. The scalars in a matrix are called the elements of the matrix. 5. its elements will be denoted by a corresponding lowercase Latin or Greek 421 . its components will be denoted by a corresponding lowercase Latin or Greek letter. Vectors are denoted by lowercase Latin letters. 4. The set of real nvectors is denoted by Rn.

and the conjugate transpose as A^.422 APPENDIX: BACKGROUND Figure A. 3. Akk. It has the following useful properties. The transpose of A is written A1. 2. 5. ^22. det(AH) = det(A).I: The Greek alphabet and Latin equivalents o letter. 1. The conjugate of a matrix A is written A. If A is block triangular with diagonal blocks AH . 4. The matrix sum and matrix product are written as usual as A -f B and AB. The determinant of A is written det(A). If A is of order n. 9. The trace of a square matrix is the sum of its diagonal elements. det(AB) = det(A)det(£). If A is nonsingular. It is written trace(A). . 8. then det(A) = det(4n) det(422) • • -det(Akk). its inverse is written A~l. 1. det(A~l) = det(A)-1. An n x n matrix is said to be square of order n. then det(iiA) = iin(A). • • • . Thus the matrix A may be written 6. We also write 10. 11.

an identity matrix is written simply /. . When the context makes the order clear. The vector whose Uh component is one and and whose other components are zero is called the z'th unit vector and is written e t .. . A matrix with orthonormal rows or columns is said to be orthonormal. The vector whose components are all one is written e. 20. Premultiplying a matrix by P rearranges its rows in the order n .in- 19. 12. If A and B are matrices of the same dimension. AND MATRICES 423 6. A matrix U is unitary if UHU = /. VECTORS. Postmultiplying a matrix by P rearranges its columns in the order *!>•••. 7. It is written \A\. A square matrix whose off-diagonal elements are zero is said to be diagonal. SCALARS. . . 23. The matrix is called the cross operator. f>n is written diag(<$i. 16. in... Premultiplying a matrix by the cross operator reverses the order of its rows. | det( A) | is the product of the singular values of A. or O m X n when it is necessary to specify the dimension. 14. in is a permutation of the integers 1. . 22. . . or On when it is necessary to specify the dimension. Postmultiplying a matrix by the cross operator reverses the order of its columns. If ii. It is Hermitian if AH = A. 18. and A > B. .. The absolute value of a matrix A is the matrix whose elements are the absolute value of the elements of A.SEC. we write A < B to mean that ij < bij. The matrix /„ whose diagonal elements are one and whose off-diagonal elements are zero is called the identity matrix of order n. . . Similarly for the relations A < B. Any vector whose components are all zero is called a zero vector and is written 0. 21. Sn). A > B. . det( A] is the product of the eigenvalues of A. A real matrix U is orthogonal if U^U — I.. A matrix A is symmetric if A1 = A.. 1. n. Any matrix whose components are all zero is called a zero matrix and is written 0.. a 13. 15. Vol II: Eigensystems . 17. the matrix is called a permutation matrix... A diagonal matrix whose diagonal elements are ^ j .

24. A Wilkinson diagram is a device for illustrating the structure of a matrix. The following is a Wilkinson diagram of a diagonal matrix of order four:

        X 0 0 0
        0 X 0 0
        0 0 X 0
        0 0 0 X

    Here the symbol "X" stands for a quantity that may or may not be zero (but usually is not), and the symbol "0" stands for a zero element. Letters other than X are often used to denote such elements, but a zero is always 0.

25. Lower and upper triangular matrices have Wilkinson diagrams of the form

        X 0 0 0            X X X X
        X X 0 0            0 X X X
        X X X 0    and     0 0 X X
        X X X X            0 0 0 X

26. A rectangular matrix whose elements below the main diagonal are zero is upper trapezoidal. A rectangular matrix whose elements above the main diagonal are zero is lower trapezoidal.

27. Triangular or trapezoidal matrices whose main diagonal elements are one are unit triangular or trapezoidal.

2. LINEAR ALGEBRA

1. Let F be R or C depending on the underlying scalars.

2. With the exception of the sets R, C, R^n, etc., sets of scalars, vectors, or matrices are denoted by calligraphic letters, e.g., X.

3. The column space of an m x n matrix A is the subspace

       R(A) = {Ax : x ∈ F^n}.

   The null space of A is

       N(A) = {x : Ax = 0}.

4. The rank of A is rank(A) = dim[R(A)]. The nullity of A is dim[N(A)].

5. The subspace spanned by a set of vectors X is written span(X).

6. The dimension of a subspace X is written dim(X). If X and Y are disjoint subspaces, then dim(X + Y) = dim(X) + dim(Y).

7. An n x p matrix X is of full column rank if rank(X) = p. It is of full row rank if rank(X) = n. We say that X is of full rank when the context tells whether we are referring to column or row rank (e.g., when n > p, full rank means full column rank).
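The following Python sketch, using NumPy and SciPy, illustrates column space, null space, rank, and nullity on a small example. It is illustrative only; the 4 x 6 rank-two matrix is an arbitrary construction, and scipy.linalg.orth and scipy.linalg.null_space compute orthonormal bases numerically.

    import numpy as np
    from scipy.linalg import null_space, orth

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # a 4 x 6 matrix of rank 2

    r = np.linalg.matrix_rank(A)    # rank(A) = dim R(A)
    Nbasis = null_space(A)          # orthonormal basis for the null space N(A)
    Rbasis = orth(A)                # orthonormal basis for the column space R(A)

    print(r, Rbasis.shape[1])           # the rank is 2
    print(Nbasis.shape[1])              # the nullity is 4; rank + nullity = 6
    print(np.allclose(A @ Nbasis, 0))   # A maps its null space to zero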

3. POSITIVE DEFINITE MATRICES

1. Let A be of order n. Then A is positive definite if

       x ≠ 0  ⇒  x^H A x > 0.

   (Since x^H A x is required to be real for every x, a positive definite matrix is necessarily Hermitian.) The matrix A is positive semidefinite if x^H A x ≥ 0 for all x; in particular, a positive definite matrix is positive semidefinite.

2. The diagonal elements of a positive definite matrix are positive.

3. Any principal submatrix of a positive definite matrix is positive definite.

4. If A is positive definite, then A is nonsingular and A^{-1} is positive definite.

5. If A is positive definite, its determinant is positive. If A is positive semidefinite, its determinant is nonnegative.

6. A is positive (semi)definite if and only if its eigenvalues are positive (nonnegative).

7. If A is positive (semi)definite, then there is a positive (semi)definite matrix A^{1/2} such that

       A = A^{1/2} A^{1/2}.

   The matrix A^{1/2} is unique.

8. Let A be positive definite and let X ∈ C^{n x p}. Then X^H A X is positive semidefinite. It is positive definite if and only if X is of full column rank.

9. In the real case, some people drop the symmetry condition and call any matrix A satisfying the conditions x ≠ 0 ⇒ x^T A x > 0 a positive definite matrix. Be warned.

4. THE LU DECOMPOSITION

1. Let A be of order n and assume that the first n-1 leading principal submatrices of A are nonsingular. Then A can be factored uniquely in the form

       A = LU,

   where L is unit lower triangular and U is upper triangular.

2. We call the factorization the LU decomposition of A and call the matrices L and U the L-factor and the U-factor.

3. Gaussian elimination (Algorithm 1:3.1) computes the LU decomposition. Pivoting may be necessary for stability (see Algorithm 1:3.3).
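Here is a brief illustrative sketch of the LU decomposition and the eigenvalue test for positive definiteness, written in Python with NumPy and SciPy. It is not the algorithm referred to above; note that scipy.linalg.lu pivots by rows, so it returns a factorization of the permuted form A = P L U rather than A = L U, and the test matrices are arbitrary.

    import numpy as np
    from scipy.linalg import lu

    rng = np.random.default_rng(2)
    n = 5
    A = rng.standard_normal((n, n))

    # LU with partial pivoting: A = P L U with L unit lower triangular, U upper triangular.
    P, L, U = lu(A)
    print(np.allclose(A, P @ L @ U))
    print(np.allclose(L, np.tril(L)), np.allclose(np.diag(L), 1.0), np.allclose(U, np.triu(U)))

    # Positive definiteness via the eigenvalue characterization: B is Hermitian
    # and all of its eigenvalues are positive.
    B = A @ A.T + np.eye(n)
    print(np.all(np.linalg.eigvalsh(B) > 0))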

5. THE CHOLESKY DECOMPOSITION

1. Any positive definite matrix A can be uniquely factored in the form

       A = R^H R,

   where R is an upper triangular matrix with positive diagonal elements. This factorization is called the Cholesky decomposition of A.

2. The matrix R is called the Cholesky factor of A.

3. The Cholesky decomposition can be computed by variants of Gaussian elimination.

4. There is a pivoted version of the decomposition. Specifically, there is a permutation matrix P such that

       P^T A P = R^H R,

   where the elements of R satisfy inequalities that, in particular, make its diagonal elements nonincreasing. An algorithm for computing a pivoted Cholesky decomposition is given in Volume I.

6. THE QR DECOMPOSITION

1. Let X be an n x p matrix with n ≥ p. Then there is a unitary matrix Q of order n and an upper triangular matrix R of order p with nonnegative diagonal elements such that

       X = Q [ R ]
             [ 0 ].

   This factorization is called the QR decomposition of X.

2. The matrix Q is the Q-factor of X, and the matrix R is the R-factor.

3. If we partition Q = (Q_X  Q_⊥), where Q_X has p columns, then we can write

       X = Q_X R,

   which is called the QR factorization of X.

4. If X is of full rank, then Q_X is an orthonormal basis for the column space of X and Q_⊥ is an orthonormal basis for the orthogonal complement of the column space of X.

5. If X is of full rank, the QR factorization is unique.
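The next Python/NumPy sketch illustrates the Cholesky and QR decompositions on a small full-rank matrix. It is only an illustration: numpy.linalg.cholesky returns the lower triangular factor L with A = L L^H, so the Cholesky factor in the notation above is R = L^H, and numpy.linalg.qr does not normalize the diagonal of its R to be nonnegative.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((6, 3))
    A = X.T @ X                       # positive definite, since X has full column rank

    L = np.linalg.cholesky(A)         # A = L L^H with L lower triangular
    R_chol = L.conj().T               # the (upper triangular) Cholesky factor
    print(np.allclose(A, R_chol.conj().T @ R_chol))

    Q_X, R = np.linalg.qr(X)          # the QR factorization X = Q_X R (Q_X is 6 x 3)
    print(np.allclose(X, Q_X @ R))
    print(np.allclose(Q_X.T @ Q_X, np.eye(3)))    # Q_X has orthonormal columns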

6. If X is of full rank, then A = X^H X is positive definite, and R is the Cholesky factor of A.

7. The QR factorization can be computed by the Gram-Schmidt algorithm.

8. The QR decomposition can be computed by orthogonal triangularization.

9. There is a pivoted version of the QR decomposition. Specifically, there is a permutation matrix P such that XP has the QR decomposition

       XP = Q [ R ]
              [ 0 ],

   where the elements of R satisfy inequalities that, in particular, make the magnitudes of its diagonal elements nonincreasing. An algorithm for computing a pivoted QR decomposition is given in Volume I.

7. PROJECTORS

1. If P^2 = P, then P is a projector (or projection). The matrix I - P is its complementary projector.

2. Any projector P can be written in the form

       P = X Y^H,   where   Y^H X = I.

   The spaces X = R(X) and Y = R(Y) uniquely define P, which is said to project onto X along the orthogonal complement Y_⊥ of Y. Its complementary projector P_⊥ = I - P projects onto Y_⊥ along X.

3. Any vector z can be represented uniquely as the sum of a vector Pz ∈ X and a vector (I - P)z ∈ Y_⊥.

4. If P is a Hermitian projector, then it projects onto X = R(P) along the orthogonal complement of X. Such a projector is called an orthogonal projector. If X is any orthonormal basis for R(P), then P = X X^H.
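Finally, an informal Python/NumPy sketch of the two kinds of projectors described above: an orthogonal projector P = X X^H built from an orthonormal basis, and an oblique projector P = X Y^H with Y^H X = I. The random data and the rescaling of Y are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 6, 2

    # Orthogonal projector onto the column space of a random n x p matrix.
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # X has orthonormal columns
    P = X @ X.conj().T
    print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))   # idempotent and Hermitian

    # Oblique projector P2 = X2 Y^H with Y^H X2 = I.
    X2 = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, p))
    Y = Y @ np.linalg.inv(Y.conj().T @ X2).conj().T    # rescale Y so that Y^H X2 = I
    P2 = X2 @ Y.conj().T
    print(np.allclose(Y.conj().T @ X2, np.eye(p)))
    print(np.allclose(P2 @ P2, P2))                    # idempotent, but not Hermitian in general
    print(np.allclose(P2 @ X2, X2))                    # acts as the identity on R(X2)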




69 backward stability and complex eigenvalues of real matrices. 325. 117 operation count. 320-323 overview. 318 the cycle. 320 Krylov-Schur restarting. 347 stagnation of Krylov subspace. 10 as paradigm. 346 exchanging eigenblocks. 372-373 economizing B products. 371 B-Arnoldi method. 373 orthonormal matrix.369 in convergence testing. W.170 implicit tridiagonal QR step. 334. 201 B inner product. 217 reduction to Hessenberg-triangular form. 336 stability. 375 restarted.265. 230. 334. 305 convergence criteria. 371 INDEX orthogonal matrix. 70.346-347 band tridiagonalization.253.372 orthogonalization. 375 backward error and unitary similarities. 322 truncation index. 332-333. 372-373 maintaining orthogonality. 338-339 equivalence of implicit and Kry lov-Schur restarting. 371 Cauchy inequality. 304 reorthogonalization. 345 also see filter polynomial implicit restarting. 318-322 double shift method.196. 347 storage requirements. me operation count. 61.. 192-193 Lanczos algorithm with full orthogonalization. 344-345 rationale. 373-374 restarting. 3J6. 337-338 stability. 328-329 loss of orthogonality. 329 the cycle. 183 double shift QR algorithm. 318. 341 arrowhead matrix. implicit restarting. 372 symmetry. 189 bidiagonal QR step. 351 QZ algorithm. 179 . 147 spectral decomposition updating. 338-339 divide-and-conquer algorithm for the spectral decomposition. 323 forward instability. 332-333. 345 convergence criteria.316 stability. Arnoldi method. 152 reduction to bidiagonal form. 371 norm.452 residual norm. 94 Householder's reduction to tridiagonal form. 10 inverse power method. 324-325 example. 304-305. 374-375 periodic reorthogonalization. 70 convergence criteria.E. Krylov-Schur restarting shift-and-invert enhancement.316 Arnoldi. 320 exact (Raylei gh quotient) shifts. 374 Ritz pairs. 317. 316 also see Arnoldi method. 303 restarted. 334-335. 102 Hessenberg QR algorithm. 373 deflation. 373 residual norms.344 ARPACK. 126 Arnoldi method. 325-326. 331 filter polynomial. 162. 126 eigenvector computation. 220 deflation in Arnoldi method. 346-347 shift-and-invert enhancement. 345 contraction phase. 167 inertia of a tridiagonal matrix. 374 orthogonalization. 315. 326-328. 62 from residual. 325. 373 5-Lanczos method. 346 operation count. 372 orthogonality. 305-306.

217 relative stability of singular values. 394 column space (K(X))..E. 220 shift computation. 395 balancing.264 of a combination of subspaces. 255 Braman. E. 129 453 C (complex numbers). 249 computation.. 421 Cn (complex n-space).315. 23 Beltrami.R. R. 234 updating. 315 matrix of order two. T.227-228 QR step. 219-220 operation count. 249-250 Cauchy.227-228 combined with QR decomposition. 367 bidiagonal matrix. 24 characteristic polynomial.426 pivoted. 44 computation. 152-153..129. 226 Bhatia. 55 and terminating Krylov sequences. 217 operation count. 224 detecting negligible superdiagonal elements.. 223-224. 421 C m x n (space of complex mxn matrices). 388 Wilkinson's contribution. 201 Vol II: Eigensystems Byers.. 226.227 first column of the transformation. 280 as filter polynomial. 220 stability. 87 subspace iteration. 231. E. 279 in Lanczos's method.. F. 250-251 of a vector and a subspace.. 215 complex bidiagonal matrix to real. 108 Bartels. 217. 7 and diagonalization. 222-223 zero shift. 367 look-ahead recurrence. 421 canonical angle. 249. 110 matrix pencil. Z. 23 Cayley.. see Krylov subspace block triangularization of nearly block triangular matrix. 23 Chan. 24 and companion matrix. 394 Bellman. 8-9 condition . 224. K. 264 block Krylov subspace. 171 chordal distance.. R. L. 55 complete system of eigenvectors. 162 companion matrix.. J. R.. 317 in complex plane. R. 157 Cholesky decomposition. 235 characteristic equation. 129 Bunch. 226 pivoting. 217 reduction to. M. 217. H. 24 Bau. 248. 424 column.223 bidiagonal QR algorithm. S. 23 Chatelin. 283.and row-oriented algorithms. D.228 operation count.112 and grading. 4. 27J_. 217 stability. 228 deflation.52 bi-Lanczos algorithm.155 Clint. 346. 23. A. 250 subspaces of unequal dimensions. 215-217. 102. A. 107-108. 245 Bjorck. 392 Cholesky algorithm. 4.279 Chebyshev polynomial. A. 23 Bauer. 4-5 characteristic value. 251 between two vectors. 227 biorthogonal bases..23. 228 Chandrasekaran.227 graded matrices.156 operation count.INDEX stability in the ususal sense. 49 characterization of largest angle. 280 in subspace iteration. 111 Bai. 138..264 angles between right and left eigenspaces.

202 diag(<$i. 129. 236-237 algorithms. 201 Curtis. 3j_ defective. E. J. 227 cross-product matrix. 276 dependence of eigenvectors. 34 convergence. normwise and componentwise. 20 complete system of eigenvectors. 183 divide-and-conquer algorithms. 181-183. 291-292. R. 34 condition number. 235-236 rotating to make B positive definite.. 20 dim(^) (dimension of X). 183.. 23. 185 unsuitability for graded matrices. 181-182 dominant eigenpair. . 107 complex eigenvector. R. 12. 15 sensitivity of eigenvalues. 51-52 also see sep generalized eigenvalue. 346 Dennis. 3± dominant eigenvalue.. 423 diagonalization block. 53. 265. C. 183 the effects of deflation.296 use by statisticians..!.I. 156 Datta. 212-213 refined Ritz vector.. 42-43. M.. 211-212 inaccuracy in the left singular vectors. 205 CS decomposition.. 236 deflation. 183 stability. 367 Cuppen. 185 recursive vs.185 error analysis. 30 Crawford. 201 generalities.454 INDEX Davidson. 143 ill conditioning. S. J. 280. Dembo. direct approach. 48. J. 211 inaccuracy in the singular values.. E. 7. 25 uniqueness.53 condition number.19-20 assessment. 7-8. 206 singular vector. 106-107 operation count. 48 simple eigenblock. 7 matrix.418 det(^l) (determinant of A). A. 422 Dhillon. 140 generalized eigenvector. 419 Demmel. 181-182 operation count. 50 Hermitian matrix. 48 limitations. 231 well conditioning.20] depth of the recursion. 8 dominant eigenvalues.. 48 Hermitian matrix. 23 Davidson's algorithm. 214.R. power method. 384 and Krylov subspaces. 279 consistent norm. 419 Davis. J.227 attractive features.38 definite generalized eigenvalue problem. see condition congruence transformation. W..265 defectiveness. R. N. 214 inaccuracy in the right singular vectors. 237 Cullum. 34 eigenvalue.53 condition number. 260 simple eigenspace. 48 eigenvalue.227. etc. 15. B. 209-210 condition number. C. dn] (diagonal matrix). 261 singular values.. 210-211 assessment.. see norm convergence ratios.S. 114 also see Hessenberg QR algorithm. 8-9 distinct eigenvalues. 48-51. 48 condition number. 210 S/PD generalized eigenvalue. 344 by inspection.. 424 divide-and-conquer algorithm for the spectral decomposition..51 eigenvector. QR algorithm. 236 cross-product algorithm for the S VD. 419 . 190 conjugate gradient method.

251. 243 nonorthogonal. me Hermitian matrix. 262-263. 127. 2. 23-24 left. 14 also see eigenpair. 31. 31..24 complete system. 44. 4. 47 first-order. 3 skew Hermitian matrix.INDEX dominant eigenvector. 23. 2 simple. 37-38 defective. 4. 52 condition. 247 deflation. J. 38 simple. 263-264. 3 unitary matrix. 3 dominant. 3 also see eigenspace eigenspace.52-53 .265 Vol II: Eigensystems 455 optimal. 7.12 Gerschgorin's theorem. J. W.. me characteristic polynomial. 242 left.334-335 backward error. 240 existence. 2 eigenvector. 2 complete system. 2. 4. 3 dominant. Hermitian matrix. 22. 242 as orthogonal complement of right eigenspace. 23-24 multiple. 7. 244. 242 orthonormal. 242 left.52 geometric multiplicity.115. 241 residual. 6. I.201 Duff.53 real matrix. 31. power method.367 Dongarra. 2 perturbation theory first-order. 242 behavior under similarity transformations. me nilpotent matrix. 2 algebraic multiplicity. 243 eigenbasis.53 simple eigenvalue.242 eigenblock. me Eisner's theorem.. me eigenvalue. 3 triangular matrix. see eigenspace eigenpair. 4 left and right. me derivatives. 12-13 normal matrix. 244 representation by a basis. 38. 46-47. 240-241 representation of A with respect to a basis. me diagonal matrix. 8 singular matrix. 395 ej («th unit vector). 13 history and nomenclature. see eigenspace eigenblock. 14 triangular matrix. S. 56 Donath. 8 singular matrix. 13 history and nomenclature. 253. 48.6. 280. 45-46. me diagonal matrix.53 rigorous bounds. 45-46. 242 eigenpair. 7. 5 characteristic equation.264 residual bounds for eigenspaces of Hermitian matrices. 2 left. 242 right. 46-47.. 423 eigenbasis. 39. 240. 243 nonuniqueness. 2 normalization. similarity transformation eigenvector. 254. 242 examples. me also see simple eigenspace eigenvalue. 100 diagonal matrix. me computing eigenvectors. 252. E. me continuity.265 simple. me Hermitian matrix. 5-6 right. 265 residual bounds for eigenvalues of Hermitian matrices. 3 dominant. 14 perturbation theory. 12 block triangular matrix.53 rigorous bounds. 31.

L.365 Leja points. 347.202 attaining a small residual. V. D.. 200 residual. 196-197 convergence. 317 Fischer's theorem. 133 chordal distance. 296 background. 38. S. 201. see Householder transformation Eisner's theorem. R. 130 left. 130 algebraic multiplicity. W. 130 eigenvalue. 198 tradeoffs in clustering. me generalized shift-and-invert transform. see norm Caches. R..53 rigorous bounds. 86-87 flrot. 111 6M (rounding unit). 345 Chebyshev polynomial. 142-143 equivalence transformation. 45^6. 132. 128. 326-328. R. 380 Freund. 197 orthogonalization. 228 filter polynomial. 202 elementary reflector.155 eigenpair. 129. 6 perturbation theory condition. 143 first-order expansion. 328 exponent exception computing bidiagonal QR shift. 41 fl (floating-point evaluation). 317. 140 projective representation. 140 first-order expansion. 130 perturbation of.52. J. 48-50.419 EISPACK 2x2 blocks in QZ algorithm. 200 starting vector. 131 effect on eigenvalues and eigenvectors. 138.. 317 in Lanczos algorithm. 136-137 right.456 normal matrix. J. 380 Galerkin method. 317.. 102 implicit double shift. 156 eigenvalues by bisection. 137 perturbation bound. 14 normalization. 134. 379. 67 INDEX Fernando. 130 condition. 8 singular matrix. see Rayleigh-Ritz method Gaussian elimination.231 eigenvector. 3 also see eigenpair. 91 Fokkema.. 365 Ritz values as roots.170 Fraysse. 200 loss of orthogonality. 46-47. me . 52.285 Eisner. 191 band matrix. 195-200. V. 5-6 right. T. 346 Francis. 89 factorization. 200 Eisenstat. 352. 228. 58 in complex arithmetic. 368.. 193 eigenvector computation. 27 Ericsson. 345-346 stability. 296 Frobenius norm.. similarity transformation eigenvectors of a symmetric band matrix. G. 133 condition. C. 198-200 limitations of the algorithm. 141-142 perturbation bound. 157 and inertia. 195-196 solution of inverse power equation.380 exchanging eigenblocks. 365 in Arnoldi method. 3 triangular matrix. Hermitian matrix. 4 simple. J55 characteristic polynomial. 200 generalized eigenvalue problem. 132 generalized Schur form. 223 computing inertia. 197-198 Gram-Schmidt algorithm. 27 flam (floating-point add and multiply). 2 null space characterization. K. me first-order. 110-112.53 real matrix. 119 plane rotation.

200 and balancing. 369 S/PD pencils..227. 293 generalized eigenvalue problem. 113. 367 gsreorthog.. H.411 computation. 42-43 . 203 singular value decomposition... 236-237 relation to the S/PD eigenproblem. 296. J. G. 143 generalized shift-and-invert transform. J55 also see generalized eigenvalue problem.24. 307. 170. 39 Gerschgorin. 293. 203 Golub. 294 exact eigenspace. eigenvector QZ algorithm.. 131 regular pencil. 202 eigenvectors by the inverse power method.52 condition. 198. 369. 140 nonregular pencils.379-380 bounding the residual. M. 112 eigenvalues by bisection. see eigenvalue Gerschgorin's theorem. 304 modified.201 Vol II: Eigensystems 457 Goldstine. 23.201. 367 graded matrix. 236-237 computation via the CS decomposition. 228 Gutknecht. 314-315 Hermitian matrix. 136. 133 condition.359 Grcar. 136. 293-294 from Krylov decomposition. 367 Hadamard. 134. G. 227-228 Jacobi's method. 144-147. 131 algebraic multiplicity. 13. 110 and the QR algorithm. W. 235 harmonic Rayleigh-Ritz method. 110 Handbook for Automatic Computation: Linear Algebra. 55. H. 303 B variants. 134-135 simple eigenvalue. 294. 296 Rayleigh quotient preferred to harmonic Ritz value. 132. 201. 370 generalized singular value decomposition. R. 143. H. 202 Jacobi's method. H. 108-110 bidiagonal QR algorithm. me reduction to ordinary eigenproblem.. 133 Weierstrass form. 293 Hermitian matrix eigenvalue.264. 365 Greek-Latin equivalents. 227 S/PD eigenproblem. generalized eigenvalue problem.INDEX Hessenberg-tri angular form. 236 geometric multiplicity. 112 balancing for eigenvalue computations. R. 368. 135 shifted pencil. 108 Gram-Schmidt algorithm. 52.336 screening of spurious eigenpairs. 131 residual. 52 Givens rotation. 130-131 pencil. see Gram—Schmidt algorithm Gu. 130 perturbation theory. M. 368-369 regular pair. 108. 147-148. 133 real generalized Schur form. A. 369 backward error... 294 ordinary eigenproblem. eigenvalue. 40-41 Gerschgorin disks.193 diagonal similarities. 370 matrix-vector product. eigenpair. 138 real generalized Schur form. 294 Rayleigh quotient. 155 order of eigenvalues. me Rayleigh quotient. S. 203 types of grading. J. 198. 170 also see plane rotation Givens. 39. 132 generalized Schur form. 421 Grimes. M. 296 harmonic Ritz pair. me infinite eigenvalue. 292 Hermitian matrix. generalized eigenvalue problem. 380 right and left eigenvectors. 372 gsreorthog.

123 operation count. 23 Horowitz. 380 Higham.380 Hilbert. 158 uniqueness of reduction. N. 118-123. 42 min-max characterization. 8! Householder reduction.. 41 perturbation. 98-100 convergence. 203 Hotelling. 87 operation count. 23.. 123-126 INDEX 2x 2 blocks.112 computation. 81-82 reduction of a vector to multiple of ei. D.265 residual bounds for eigenvalues. 87 stability. 92. 112 multishift. 83-88. 129 deflation. P. 115. 111 implicit Q theorem. 94 normwise criterion. 116. 41 Hoffman-Wielandt theorem. 100. 100 searching for negligible subdiagonal elements. 126 explicit shift algorithm. 94-95.265 sep. 144-147. 98 forward instability. 118 real Francis shifts.. US. 95-96 double shift algorithm. 87. 123 applying transformations to deflated problems. 254. 111 product with a matrix. D.. 43 interlacing theorem. 157 also see symmetric matrix Hessenberg form. 24. 147 Higham. A.112. 83 reduction of row vectors.458 Fischer's theorem. 105-106. 123 QR step.H.. 147 stability. 42 principal submatrix.228.156 operation count. 118-119 implicit Q theorem. 97. 279 hump. 94-95 relative criterion.105 applying transformations to deflated problems. me Hessenberg QR algorithm ad hoc shift. 69 Householder transformation.B±. 262-263.112 aggressive deflation. 34 no-ni .148 implementation issues. 94 complex conjugate shifts. 13 residual bounds for eigenspaces. 112 Wilkinson shift. 12. 86 overwriting original matrix. 111 of a symmetric matrix. 95-96 basic QR step. 263-264. 23 Hoffman-Wielandt theorem. L. me nonorthogonalreduction. 42 eigenvector. 123 detecting negligible subdiagonals. A. 120-123 stability. 82-83 reduction of real vectors. 111. me invariance of Hessenberg form.128 general strategy. 43. 92-94 operation count. 97-98 Hessenberg-triangular form. 82 Householder. 119 starting. J. S. 94-95 backward stability. 92 stability. R. 95 double shift algorithm.346 Francis double shift.. the. 86 first column of the accumulated transformation. 100 operation count. 82-83 reduction of a vector to multiple of e n .. 116. 128-129 transmission of shifts. 51 special properties. 325. accumulating transformations. 116 unreduced. 112 implicit double shift. 96-97 stability. 115 deflation. 98-100 eigenvalues only. 87.169 Horn. J.110.

20 powers of.265 Jacobi's method. 407-411 focal region. 380 Jia. 423 implicit Q theorem. J. 195-200 use of optimal residuals. 415-416 extending partial Schur decompositions. 416 preconditioning..296 Jiang. 203 as preprocessing step.202. 409 updating the Rayleigh quotient. 67 residual computation. 417-418 Olsen's method.. 347. 7. 417-418 correction equation. 22 sensitivity of eigenvalues.. 202 cyclic method. 190 of a tridiagonal matrix.25 Jordan block. 202 threshold method.. C. 419-420 operation count. 68 dominant eigenvector. 406 convergence testing. see Jordan canonical form Jordan canonical form. P. 68-69 implementation. JU6. 21. C. I. 192 stability. 396. 202 convergence. E. In (identity matrix). 419 derivation from approximate Newton method. 408-409 restarting. 71 Ipsen. 21 Jordan. 23 Jordan block. 38 uniqueness. 407 residual computation.420 Jacobi-Davidson method. C. u) (Krylov matrix). 414-415 ill conditioning. 395 Jensen. 203 parallelism. 193 invariant subspace. 296. 420 459 Davidson's algorithm.INDEX /. 205 oo-norm. S. 409 orthogonalization. 409 Jennings. G.419-420 compared with approximate Newton method. 191 invariance under congruence transformations. 404. 405 harmonic Ritz vectors. 412-414. 201 and LU decomposition. 416-418 free choice of shifts.264 Kk(A.265 Johnson.420 iterative solution of correction equation. Z. 22.36 assessment.381. 190. 411—412 Hermitian matrices. 202-203 and graded matrices. 203 rediscovery. 70-71. 415-416 restriction on choice of shifts. 415 multiple right-hand sides.170 backward error. 170. 423 f (cross matrix). 70. 416 stopping. R. 394. R.405. 203 quadratic. 414-415 storage requirements.. see eigenspace inverse power method.24-25. 192-193 inf (X) (smallest singular value of X).. 66-67 symmetric band matrix. 191-193 operation count. 201. 408-409 convergence to more than one vector. 267 fck(A. 267 Vol II: Eigensystems . 404-405 exact solution of correction equation. 203 Jacobi. 155. 409 partial Schur decomposition. 405-406 solving projected systems. u) (Krylov subspace).. 412 shift parameters. 407 retarded by inexact solution. 416-417 Krylov sequence methods. 34-35 principal vector. 66. 69 convergence. 56 effects of rounding error. 295. 226. A.403. C. see norm interlacing theorem.118 inertia. 42 interval bisection. 411 with harmonic Ritz vectors. 191 computed by Gaussian elimination.

297. 348 as iterative method.460 INDEX elementary properties. 266 degeneracy in Krylov sequence. 363 estimating loss of orthogonality.. 24 Kronecker.. 269 uniqueness of starting vector. 297. W.418 Krause. 277 convergence. 274 multiple eigenvalues. 310 translation. 268 Krylov matrix. 277.. Krylov decomposition. 335 economics of transforming. N. 110 Lagrange. 356-358 filter polynomial. 267. 265 Kaniel. 52 Kronecker product. 365 essential tridiagonality of Rayleigh quotient.365 Krylov-spectral restarting. 311 Krylov subspace. 365 Leja points. 53.52 Kline. 274-275. 268 block. 280. 315 and Rayleigh-Ritz method.170.. 279. 309 Rayleigh quotient. 23 K6nig.N. 365 full reorthogonalization. G. 298 . 278 non-Hermitian matrices convergence. 273 to interior eigenpairs. 301 also see Arnoldi decomposition. 352 error analysis. 307 Kahan. M. 352. 350 Lanczos decomposition. 310 orthonormal.. 266 convergence. 273-274 deflation.227. A. 277 convergence. Lanczos algorithm Krylov. 278 block Krylov sequence.-L. 306 uniqueness.J. 266. 277 compared with power method. 352 Chebyshev polynomial.264. 272-273 to inferior eigenvector. 277 non-Hermitian matrices. G. 312-314 similarity. 311-312 nomenclature.T. L.280. 279 and eigenspaces. 349. 314-315 computation of refined Ritz vectors. 268 and similarity transformations. 309 to an Arnoldi decomposition. 365 block.156. 279 Kublanovskaya. 280 Kato. 268 termination.307 Rayleigh quotient..365 stability.. 269. 228 Kollerstrom.279 multiple eigenvalues. 267-268 invariance under shifting. 276 and scaling. 275-276 polynomial representation. V. 267 Krylov sequence. 309 reduction to Arnoldi form.3J5 adjusted Rayleigh quotient. 268 and characteristic polynomial. 312-314 equivalence.280 polynomial bounds.279 and defective matrices.201...309.315 assessment of polynomial bounds. 316 computation of harmonic Ritz vectors. 362-363 and characteristic polynomial. 351 updating. E. 351-352.367 computation. 273 to the subdominant eigenvector.. S. 314 computation of residuals. 2 Lanczos algorithm. 351 implicit restarting. 365 convergence and deflation. 351 global orthogonality. 155 Krylov decomposition. 367 computing residual norms. 273-274 polynomial bounds.279-280.. 23 A(A) (the spectrum of ^4). N. J. 269-272 square root effect. 110 Kogbetliantz.

366 periodic reorthogonalization. 366 partial reorthogonalization. 362-363 also see Lanczos algorithm. selective reorthogonalization restarted. 365 and semiorthogonality. 365 Lanczos decomposition. 353 QR factorization. 183 eigenvalues by bisection. 348 local and global.365-366 example. 358-359 effects of relaxing constraint. 239 latent value. 355 residual norms. B. 228 CS decomposition. 307-308. 367 . 171 reduction to tridiagonal form.INDEX Lanczos recurrence. see Lanczos algorithm Lanczos. 187 tridiagonal QR step. Lanczos algorithm. 353-354 Vol II: Eigensystems 461 Rayleigh quotient. Lanczos algorithm. 358-362. 128. 350. Lanczos algorithm. 235 spectral decomposition downdating. no reorthogonalization. 366 software. 129 ad hoc shift in QR algorithm. 228 divide-and-conquer algorithm for the spectral decomposition. 348-349 general algorithm. 307. Lanczos algorithm. 350 method of minimized iterations. 102 exchanging eigenblocks. 279 no reorthogonalization. 346 left eigenpair. 239 storage of sparse matrices. C. 328. 346 multishiftQR algorithm. 354 singular value decomposition. 349-350 updating adjusted Rayleigh quotient. 112 2x2 blocks in QZ algorithm. 350 difficulties. 359 software. 308 local orthogonality. 350. 348. 366 Rayleigh quotient. 353. periodic reorthogonalization.129 PWK method for the tridiagonal QR algorithm.365-366 and reorthogonalization. 202 eigenvectors by the inverse power method. 105 bidiagonal QR algorithm. 202 eigenvectors of triangular matrices. 365-366 semiorthogonality. 170 S/PD generalized eigenvalue problem. G. 350 reorthogonalization. R. partial reorthogonalization. J. 201 storage of band matrices. 279. 280 selective reorthogonalization. 24 Le. 364 modified Gram-Schmidt algorithm. 355-356 size of reorthogonalization coefficients. 233 large matrices. J.315 loss of orthogonality. 345.348 alternative.. 349.346 Lewis. 170 Wilkinson's algorithm. see Lanczos algorithm Lanczos recurrence.365 LAPACK. 308 storage considerations. 365-366 software. 350. 350 loss of orthogonality. 355-356 Ritz vectors. 350 thick restarting. full reorthogonalization.. 358-359 contamination of the Rayleigh quotient. 239 storage of eigenvectors. 308 operation count. 315.. 366-367 termination and restarting. 2 Lehoucq. 237 differential qd algorithm.. 156 2x2 blocks in double shift QR algorithm. 347 contamination by reorthogonalization.

236 More. 423 matrix powers (Ak). 423 sum and product. K. Y. 423 orthonormal. 296. K.26. 424 triangular. 205 trace characterization. 395-396 history. 205 matrix norm. 110. 58-59 Meerbergen.S. 129. I. Z. 203 . 380 method of minimized iterations.425 and inertia. 418 relation to the QR algorithm. 27 and largest singular value. 65 tridiagonal matrix. R. 23 identity. 34 matrix-vector product effects of rounding error. J. 421 cross. 421 full rank.265 matrix.27 limitations. 26 family of norms. 27 co-norm.. 113 Mathias. S. 423 permutation. 424 Nash. see convergence ratios LR algorithm. 26 unitary invariance. 25 operator norm. 25. 403.. B. S. 27 unitary invariance. 265 Morgan. A. 71 norm. 367 INDEX M(X) (null space of X). 423 element.345 multiple eigenvalue. 29.12 multiset. R J.33. 418 relation to Rayleigh quotient method.235.462 Li... 279 Moler. R. 26 Frobenius norm.12 geometric multiplicity. 201 nilpotent matrix. C. 423 order. 33-36 behavior when computed. 111 LU decomposition. 2 algebraic multiplicity. 424 unit triangular. 191 Lui. 27 rounding a matrix. 29 equivalence of norms. 27 vector norm. 418 Newton-based methods. 12-13 nonnormal matrix. 36 l-norm. 6..35-36.. 30 examples. 36 spectral radius bounds. 26 absolute norm.J. 4. 27 properties. 29-30 construction by similarities. 381 Nielsen. 26.. 24 Newton's method. 422 symmetric.394 convergence. 30 Euclidean norm. 36 consistent vector norm. trapezoidal. 423 zero.. 27 and singular values. 423 diagonal. 27 of orthonormal matrix. 422 orthogonal. 33 the hump. 395 application to eigenproblem. 235 local convergence ratio.. C. 31-33. 26 and 2-norm.43 matrix norm. 2 Murray. 193. 423 trapezoidal.. B. C. R. 28 Martin. 156 Moon.. 28 examples.27 2-norm and Frobenius norm.. 28. 36 consistent norm.6. 424 unitary. 418-419 derivation by perturbation expansion. 418-419 Newton. 28 and spectral radius. 425 Hermitian. P. 423 history..36 compatable norm.

108 band tridiagonalization.. J. 192 Jacobi-Davidson method. 418.266. D. 419 operation count application of a plane rotation. 89 in the (e. 24. 409 Krylov-Schur restarting.60 convergence. 367. 27 triangle inequality. 189 basic QR step in Hessenberg QR algorithm.296.W. 26 unitarily invariant norm. 26 normal matrix. M. 425 positive semidefinite matrix. A. 2-norm subordinate norm. 70 effects of rounding error.. 69-70. S/PD generalized eigenproblem Peters. B. 70 power method. 170 positive definite matrix. 167 implicitly restarted Arnoldi. 226 implicit double shift QR step.170. 220 complex arithmetic.228. J. N. eigenvector. 14 Nour-Omid. 308 QZ algorithm. 82 hybrid QRD-SVD algorithm. 58. B. generalized eigenvalue problem. 424 nullity [null(A)]. 329 Lanczos recurrence. see unitary similarity Osborne. singular value decomposition. 65 . 92 bidiagonal QR step. 91 pre. see eigenpair. Hermitian matrix. 56. 237. G. E. 88-89 plane rotations Givens' method. see exponent exception Paige. 217 reduction to Hessenberg-triangular form. 346. 147 reduction to tridiagonal form. 14 eigenvalues. 295.. see norm. E. 380 null space CV(X)). 60-62 deflation.91 exponent exception. 86 Householder-matrix product... 71 overflow. 425 Powell. 88 introducing a zero into a 2-vector. j)-plane.420 Petrov method. 112 Ostrowski. 57 convergence criteria. 102 explicit shift Hessenberg QR algorithm.. 89-91 operation count.279. 111 and Wilkinson diagrams. 27 vector norm. 347. 14 eigenvectors. 315.. see generalized eigenvalue problem pentadiagonal matrix. 179-181 Wilkinson's algorithm for S/PD eigenproblem. simple eigenspace. 347. 424 Oettli.296.INDEX spectral norm. 66 choice of starting vector. relative and structured perturbation theory. 186 perturbation theory. 320-323 inertia of a tridiagonal matrix. 88 introducing a zero into a matrix. 181-182 eigenvectors of a triangular matrix. 88. 379. 70 Olsen. see Rayleigh-Ritz method plane rotation. 123 implicit tridiagonal QR step... C.265. 36. 380 pencil.280. 365. 366 Parlett. eigenvalue. C. 91 Arnoldi process. 304 balancing.169. 185 divide-and-conquer algorithms. M. 100 Householder reduction to Hesssenberg form.381 assessment. 107 divide-and-conquer algorithm for the spectral decomposition. 150-152 reduction to bidiagonal form. 162 Vol II: Eigensystems 463 spectral decomposition updating.and postmultiplication. 232 orthogonal similarity. 86-87 deflation by inspection. 89 generation. 91 application.

111. 424 Raphson. 155 choice of double shift. 75 cubic. 80 implications for the shifted algorithm. 110 QL algorithm. 169. 396 as factorization of a polynomial. 150-152 QZ step.170. 64. 76 convergence. 418 Rayleigh quotient shift. 73-74 multiple eigenvalues.170 QR step. 70 with two vectors.70 accuracy of. 126-128 QZ algorithm.138 for subspaces. 75 unshifted algorithm. 309 in Lanczos decomposition. QZ algorithm. 152 operation count. 111. 72. 342-344. see Jordan canonical form projector. 78-79 disorder in eigenvalues. 7_I unshifted. W. Strutt). 113. 301 in Krylov decomposition.. 427 proper value. W. 63 shifting. 80-81 QL algorithm. 148 treatment of infinite eigenvalues. 421 jpmxn (Space Of reai m x « matrices). 152 starting the double shift.63. 111 relation to Newton's method. 110 impracticality on a full matrix. 421 R n (real n-space). 73 relation to inverse power method. LQ). 69.156 2x2 blocks. 418-419 relation to power method. 424 rank. 70. 24 pseudospectrum. 24.464 implementation. see QR algorithm quasi-triangular matrix. 306 Rayleigh quotient method.295 Rayleigh quotient.77. 111 alternate shifts. 36-37 qd-algorithm.156 E (real numbers).169. see Rayleigh-Ritz method principal vector. 260 and optimal residuals. Hessenberg QR algorithm. 394. 343 Rayleigh (J. 381-382 Prager. 150. 71. 347 no free lunch.396 quadratic convergence. 153-155. 58 Rayleigh quotients. 64 Markov chain. 421 R(X) (column space of A"). 80 rates of convergence. 69 normalization. 70 for generalized eigenvalue problem. 155 deflation. 69 relation to Newton's method. 75 error bounds for a QR step. 115 computing eigenvectors. 77-80 history. 77-80 convergence to Schur decomposition. 58-59 rate of convergence. 70 primitive Ritz vector. 418 rational Krylov transformation. 79 equimodular eigenvalues. 136. see QR algorithm QR algorithm. QL. 59 operation counts. 1. 80 variants (RQ. see QR algorithm . 47. 156 stability. 111 shifted. 147-148. 113.418 INDEX Wilkinson shift. 148-152 real shifts. 75 also see bidiagonal QR algorithm. tridiagonal QR algorithm QR decomposition. 343-344 potential instability.153 eigenvalues only. 426-427 and QR algorithm. subspace iteration. 75 quadratic. 71 Rayleigh quotient shift. 148 computation of eigenvectors. 110 QR step. 72-73. 403. 252 in Arnoldi decomposition.55.

283 choosing a primitive Ritz basis. 348 as a generalized eigenvalue problem. 28 in eigenvalues. 23.280. 286-287 uniform separation condition. 159 residual. 72 computation by explicit shift QR algorithm. 295-296 primitive Ritz basis. 296 from Krylov decomposition. 285-287 Ritz vector. 115 computing eigenvectors. 281 harmonic Rayleigh-Ritz method.128 computation by double shift QR algorithm. 347. 22. 282 projection method. 43. W. 289 Ritz blocks and bases. 286-288 residual.. 407 Schur vector. 282 Ritz vector.396 and Lanczos algorithm. 2 Ritz pair. 123-126 computation by Householder transformation. 288 Ritz value. 252. J. 282 Ritz value. 195-196 optimal. 126-128 in subspace iteration. 284 Petrov method. 71 and Jordan form. 283 optimal residual. 282-283 Galerkin method. J. 395 Saad. 61_ backward error. 156 Reinsch. 291 compared with Ritz vector. 110. 113 relative and structured perturbation theory. 66-67 convergence testing. 70. 282 convergence. 282 primitive Ritz vector. computing eigenvectors partial.. 24. 478 Schur decomposition. me also see real Schur form .295. 314 convergence. 380 right eigenpair. 344-347 scalar. L.. 54 relative error. 282 Ritz space.380 Rutishauser. K. 282 Ritz pair.381. 345 Rigal. R. 247 reverse communication. 98-100 computing eigevectors. 1. see triangular matrix.231 normwise. 113. 291 cross-product algorithm. A. 295 rounding unit (eM). 159 in two-dimensional arrays. 291-292 Vol II: Eigensystems 465 basic algorithm.. 295-296 general procedure. 12.and row-oriented algorithms Ruhe.INDEX Rayleigh-Ritz method. see Rayleigh-Ritz method Ritz. 126. 288 Ritz basis. 280-281 real matrix eigenvectors. 282 Ritz block. 367. 293. 23. Y. 281 failure.C. 291-292. 38 real Schur form.. 62. 61. 28 representation of symmetric matrices.10-12. 394. 284 orthogonal.280. 282 separation condition. 379. 5-6 perturbation of simple eigenvalue.. 24 by rowwise deflation. me history. 295 oblique. 27 row orientation.265 computation in the inverse power method. 203.55. 287 exact eigenspace.253. 264 resolvent.. H. 283 two levels of approximation. 295-296 computation. 295 residual bound.295 Hermitian matrix. see column. 391 refined Ritz vector. 421 Schnabel. 159 packed storage. 290 Reid. 63. 252. B. 284. 279.295. 289. 289-290 use of Rayleigh quotient.

me singular value factorization. 259 condition of an eigenblock. H. 51 properties. 226 <ri(X) (the ith singular value ofX). 246. 229 two-sided method of Kogbetliantz.53-54. 9 invariance of trace. 205 and eigenvalues of the cross-product matrix. 27 and spectral decomposition of cross-product matrix. 245 block diagonalization. 244 angles between right and left eigenespaces. 228-229 one-sided method.255 and physical separation of eigenvalues. 395 secular equation. 204 singular value decomposition. 204 similarity transformation. 380 purging. A. 206. 379 Scott. 378-379. 204. 204. 260 condition of an eigenspace. 260. 247 spectral projector. 50-51. 376 defective matrix. 209 condition. 66 factorization. 205 effects of rounding. 380 sep.. 201 Scott. 418 singular value. 67 solving linear systems.380 rounding-error analysis. 204 perturbation theory.380 assessment. 265 INDEX bounds for eigenspace.. 245-246 complementary eigenspace. 228 history. 225 downdating. 211 min-max characterization. D. me first-order expansion. 204 and 2-norm. 261 bounds in terms of the error. 366-367 singular value. 246 standard representation.256 Hermitian matrices.. 247 spectral representation. 375-376 null-space error. 244-246 several eigenspaces.247 also see eigenspace Simpson. 346. 46.226-227 analogue of Rayleigh quotient. 380 effect on Rayleigh quotient. 10 Simon. R. 258 normalization of bases. 257 limitations. 205 differential qd algorithm. 257 resolvent. 24 semidefinite 5-Lanczos and Araoldi algorithms. me . I.205 and Frobenius norm.. 379 geometry. 9 unitary similarity. 254-255.204 and 2-norm. S. 8 eigenvalues and eigenvectors. 375-379.265 perturbation theory. 261 main theorem. 256 shift-and-invert enhancement. 376-377 growth. 204 singular vector. 245 deflation of an approximate eigenspace. 206. 206 ordering convention. 228 divide-and-conquer algorithms. 251 biorthogonal basis for left eigenspace. 157. H.15 Schur. 55. 67-68 <ri(X) (the ith singular value of X). 259 residual bounds. 245 uniqueness. 12. 24 Schwarz. 365. 245 corresponding left eigenspace.143. 226 Jacobi methods. 245 several eigenspaces. 365-367 simple eigenspace.265 alternate bound. \2 relation to eigenvectors.. 265 accuracy of the Rayleigh quotient.466 Schur vector. 228 Lanczos algorithm. 378. J. 376 effect on Ritz vectors. 208 second-order expansion for small singular values.

208 left singular vector. 66. 229 eigenvalue. 414-415 ill conditioning. 346 Steihaug. 64 shift-and-invert enhancement. 414-415 Sorensen. 181 basic algorithm. 175 solving the secular equation. 424 sparse matrix. 370 ill-conditioned B. 230 banded problems. 204 and eigenvectors of the cross-product matrix. J_3.239 S/PD generalized eigenproblem. 3^ norm bounds. see singular value decomposition singular vector.. 204 and eigenvectors of the cross-product matrix. A. 231 eigenvector.235 computation of R~TAR~l. L. 231 projective representation. 176-177 spectral enhancement. me first-order expansion. see subspace iteration. 232 spectral decomposition. 231-233. 234 no infinite eigenvalues. 175 operation count. 380 SRR. 157. 4]9 VolII: Eigensystems . 230 generalized shift and invert transformation. see skew Hermitian matrix Sleijpen. see simple eigenspace spectral transformation. 232 storage and overwriting. T.205 left. 14 eigenvalues. G. 346.. 173 secular equation. 179-181 reduction to standard form. 201. 365 span(^f) (the space spanned by X}. 201 assessment. 209-210. 158 and 2-norm. 205 skew Hermitian matrix. 230 perturbation theory. 345. 173-175 tradeoffs. 177-181 stability. 235 S/PD pencil. 58. G. 231 well conditioned eigenvalues not determined to high relative accuracy. 210 right. 31-33. 296. me spectral norm. C. see norm. D.. 171 computing eigenvectors. 233-235 use of pivoted Cholesky decomposition of B. Schur-Rayleigh-Ritz refinement stability in the ususal sense. 171. 14 skew symmetric matrix. 179 the naive approach. 177-178 deflation. 229 diagonalization by congruences. 171-173 accumulating transformations. 316. 235 Chandrasekaran's method.. 211-212 updating. 230 condition number. 379 also see generalized shift-and-invert transformation spectrum. 27 effects of rounding. 204 perturbation theory.INDEX uniqueness. 210 separation of singular values. 346. 53 spectral radius. 415 multiple right-hand sides. 210 right singular vector. 206-210. 235 congruence transformation. see spectral decomposition updating spectral decomposition updating. 87 Stathopoulos. 226-227 condition. 229 Wilkinson's algorithm. 234 467 use of spectral decomposition of B. 157. 36 spectral representation.229 B-orthogonality. 2-norm spectral projector.234-235 product problem (AB). 2 Spence. 231. 205 singular value factorization.420 solution of projected systems.. 231 ill conditioned. 232-233 operation count. A.

382.202 subspace iteration. 2.346. 159 packed storage. 186 . 382 freedom in dimension of the basis. 391 Schur-Rayleigh-Ritz refinement. 384 to dominant subspaces. 158.264.y) (canonical angle matrix).186 Stewart. 395 economization of storage. see singular value decomposition Sylvester's equation.52. 390 software. 10 trace(^) (trace of A). me representation. 381. 391 Chebyshev acceleration. 187-189. 395 Sturm sequence. 385 convergence testing.128 conditions for solution. 390-391 when to perform. G. 187 storage. 158 special properties. 24. 388 dependence in basis. D. 248 Toumazou. 9 use in debugging. 186 representation. 345. 186 symmetric matrix inertia. 104 tridiagonal matrix.. P. 394. 386. 391-392 Treppeniteration. T.-G... 201 limitations.394 shift-and-invert enhancement. 159 simplifications in the QR algorithim. J. 170. see S/PD generalized eigenproblem symmetric tridiagonal matrix. 395 Stewart. J. 201 Taussky.386.. 52 SVD. 102 stability and residuals. 189 operation count.295.156.. 186 eigenvectors. 387-388 stability. 52 Taylor. N. 380 trace as sum of eigenvalues. 189 band width. 394 real Schur form. 389-391. 203 Tang.37 triangle inequlity. 157 also see Hermitian matrix symmetric positive definite generalized eigenvalue problem. 16 Sylvester. J. 384 defectiveness.382. J. 24 numerical solution. 383 Schur-Rayleigh-Ritz refinement.237. 186 computing eigenvalues and eigenvectors. see tridiagonal matrix systolic array. 382. 16-17 Kronecker product form. R. 384 deflation. 393-394 Rayleigh-Ritz method.201 symmetric band matrix. see eigenvectors of a symmetric band matrix pentadiagonal matrix.. 0. 24. 316.394 convergence. 18.394 when to perform. 394 Sun. W.384-385 frequent with shift and invert enhancement. 24. 100-104 operation count. 265.155. L. 392-394. 367 Q(X... 189 stability.24 Sylvester operator.16. 23. 386 superiority to power method. V. 9 invariance under similarity transformations. 190. 422 Trefethen. 386-387 orthogonalization. P. W.. 395 symmetric matrix. see norm triangular matrix computing eigenvectors. 384 general algorithm.468 INDEX band tridiagonalization. 389. 102-104 computing left eigenvectors. 388 convergence theory. 394 and QR algorithm. 159 in two-dimensional arrays.

420 Van Loan. R.INDEX assumed symmetric.170 using symmetry.235.189 first column of the transformation. 162 stability. 167-168 Rayleigh quotient shift. 280.169. H. 170 Householder's reduction to.. 155 Wielandt.. 346. QR algorithm. 20 underflow. J. 170 . S. 23. R. D.. 112.. 70..171 stability. see exponent exception Underwood. 158 calculation of a selected eigenvalue. A.129.170 implicitly shifted QR step. 367 unitarily invariant norm. 164 graded matrix.345 469 Ward. 169 QL algorithm.24. Z.23. 170 inertia.168.418 Wu.. 160 tridiagonal QR algorithm.F.. 192-193 making a Hermitian matrix real. 170-171 combination of Rayleigh quotient and Wilkinson shifts. 23. 23. 132 Weierstrass. C... 70 Wilkinson.237 vector. 203 Vu.201-202 accuracy. tridiagonal QR algorithm Wilkinson.346 Weierstrass form. 365 Yang. 162 operation count. 159-162. J.. 10 backward stability. 156 Watkins. Mark. 169 explicitly shifted. 111. 8J. 162. 170. 170 deflation.. 195 use of fast root finders. 169 local convergence. 193-195. 264. K. 167 variations. H. 160 Wilkinson's contribution.229. 423 von Neumann. 424 Wilkinson shift.170 graded matrix.346. 14 unitary similarity. 420 Wilkinson diagram. 167 PWK method. 421 component. 345 Zhang. 195 Givens' reduction to. 191-193 operation count. 111-113.53. 169. 168-169 detecting negligible off-diagonal elements.H.163. 1. 10 VolII: Eigensystems unreduced Hessenberg matrix. see Hessenberg QR algorithm. 116 van der Vorst.394. 423 zero.156. C. 421 unit.365. 296. 167. 71.R. 128. 168-169. 164-167 operation count. K. 167.170 Twain.170. 169. C. 192 stability.170 Wilkinson shift.. see norm unitary matrix eigenvalues. 162-163 representation.52.
