

Matrix Algorithms

Volume II: Eigensystems

G. W. Stewart

University of Maryland, College Park, Maryland

Society for Industrial and Applied Mathematics Philadelphia


Copyright © 2001 by the Society for Industrial and Applied Mathematics.

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB is a registered trademark of The MathWorks, Inc., 3 Apple Hill Dr., Natick, MA 01760-2098, USA, tel. 508-647-7000, fax 508-647-7001, info@mathworks.com, http://www.mathworks.com.

The Library of Congress has catalogued Volume I as follows:

Library of Congress Cataloging-in-Publication Data

Stewart, G. W. (Gilbert W.)
    Matrix algorithms / G. W. Stewart.
        p. cm.
    Includes bibliographical references and index.
    Contents: v. 1. Basic decompositions
    ISBN 0-89871-414-1 (v. 1 : pbk.)
    1. Matrices. I. Title.
    QA188.S714 1998
    512.9'434--dc21    98-22445

ISBN 0-89871-414-1 (Volume I)
ISBN 0-89871-503-2 (Volume II)
ISBN 0-89871-418-4 (set)

SIAM is a registered trademark.


CONTENTS

Algorithms
Preface

1 Eigensystems
  1 The Algebra of Eigensystems
    1.1 Eigenvalues and eigenvectors: Definitions. Existence of eigenpairs: special cases. The characteristic equation and the existence of eigenpairs. Matrices of order two. Block triangular matrices. Real matrices.
    1.2 Geometric multiplicity and defective matrices: Geometric multiplicity. Defectiveness. Simple eigenvalues.
    1.3 Similarity transformations
    1.4 Schur decompositions: Unitary similarities. Schur form. Nilpotent matrices. Hermitian matrices. Normal matrices.
    1.5 Block diagonalization and Sylvester's equation: From block triangular to block diagonal matrices. Sylvester's equation. An algorithm. Block diagonalization redux.
    1.6 Jordan form
    1.7 Notes and references: General references. History and nomenclature. Schur form. Sylvester's equation. Block diagonalization. Jordan canonical form.
  2 Norms, Spectral Radii, and Matrix Powers
    2.1 Matrix and vector norms: Definition. Examples of norms. Operator norms. Norms and rounding error. Limitations of norms. Consistency. Norms and convergence.
    2.2 The spectral radius: Definition. Norms and the spectral radius.
    2.3 Matrix powers: Asymptotic bounds. Limitations of the bounds.
    2.4 Notes and references: Norms. Norms and the spectral radius. Matrix powers.
  3 Perturbation Theory
    3.1 The perturbation of eigenvalues: The continuity of eigenvalues. Gerschgorin theory. Hermitian matrices.
    3.2 Simple eigenpairs: Left and right eigenvectors. First-order perturbation expansions. Rigorous bounds. Qualitative statements. The condition of an eigenvalue. The condition of an eigenvector. Properties of sep. Hermitian matrices.
    3.3 Notes and references: General references. Continuity of eigenvalues. Gerschgorin theory. Eigenvalues of Hermitian matrices. Simple eigenpairs. Left and right eigenvectors. First-order perturbation expansions. Rigorous bounds. Condition numbers. Sep. Relative and structured perturbation theory.

2 The QR Algorithm
  1 The Power and Inverse Power Methods
    1.1 The power method: Convergence. Computation and normalization. Choice of a starting vector. Convergence criteria and the residual. Optimal residuals and the Rayleigh quotient. Shifts and spectral enhancement. Implementation. The effects of rounding error. Assessment of the power method.
    1.2 The inverse power method: Shift-and-invert enhancement. The inverse power method. An example. The effects of rounding error. The Rayleigh quotient method.
    1.3 Notes and references: The power method. Residuals and backward error. The Rayleigh quotient. Shift of origin. The inverse power and Rayleigh quotient methods.
  2 The Explicitly Shifted QR Algorithm
    2.1 The QR algorithm and the inverse power method: Schur from the bottom up. The QR choice. Error bounds. Rates of convergence. General comments on convergence.
    2.2 The unshifted QR algorithm: The QR decomposition of a matrix polynomial. The QR algorithm and the power method. Convergence of the unshifted QR algorithm.
    2.3 Hessenberg form: Householder transformations. Reduction to Hessenberg form. Plane rotations. Invariance of Hessenberg form.
    2.4 The explicitly shifted algorithm: Detecting negligible subdiagonals. Deflation. Back searching. The Wilkinson shift. Reduction to Schur form. Eigenvectors.
    2.5 Bits and pieces: Ad hoc shifts. Aggressive deflation. Cheap eigenvalues. Balancing. Graded matrices.
    2.6 Notes and references: Historical. Variants of the QR algorithm. The QR algorithm, the inverse power method, and the power method. Local convergence. Householder transformations and plane rotations. Hessenberg form. The Hessenberg QR algorithm. Deflation criteria. The Wilkinson shift. Aggressive deflation. Preprocessing. Graded matrices.
  3 The Implicitly Shifted QR Algorithm
    3.1 Real Schur form: Real Schur form. A preliminary algorithm.
    3.2 The uniqueness of Hessenberg reduction
    3.3 The implicit double shift: A general strategy. Getting started. Reduction back to Hessenberg form. Deflation. Computing the real Schur form. Eigenvectors.
    3.4 Notes and references: Implicit shifting. Processing 2x2 blocks. The multishifted QR algorithm.
  4 The Generalized Eigenvalue Problem
    4.1 The theoretical background: Definitions. Regular pencils and infinite eigenvalues. Equivalence transformations. Generalized Schur form. Multiplicity of eigenvalues. Projective representation of eigenvalues. Generalized shifting. Left and right eigenvectors of a simple eigenvalue. Perturbation theory. First-order expansions: eigenvalues. The chordal metric. The condition of an eigenvalue. Perturbation expansions: eigenvectors. The condition of an eigenvector.
    4.2 Real Schur and Hessenberg-triangular forms: Generalized real Schur form. Hessenberg-triangular form.
    4.3 The doubly shifted QZ algorithm: Overview. Getting started. The QZ step.
    4.4 Important miscellanea: Balancing. Back searching. Infinite eigenvalues. The 2x2 problem. Eigenvectors.
    4.5 Notes and references: The algebraic background. Perturbation theory. The QZ algorithm.

3 The Symmetric Eigenvalue Problem
  1 The QR Algorithm
    1.1 Reduction to tridiagonal form: The storage of symmetric matrices. Reduction to tridiagonal form. Making Hermitian tridiagonal matrices real.
    1.2 The symmetric tridiagonal QR algorithm: The implicitly shifted QR step. Choice of shift. Local convergence. Deflation. Graded matrices. Variations.
    1.3 Notes and references: General references. Reduction to tridiagonal form. The tridiagonal QR algorithm.
  2 A Clutch of Algorithms
    2.1 Rank-one updating: Reduction to standard form. Deflation: small components of z. Deflation: nearly equal di. When to deflate. The secular equation. Solving the secular equation. Computing eigenvectors. Concluding comments.
    2.2 A divide-and-conquer algorithm: Generalities. Derivation of the algorithm. Complexity.
    2.3 Eigenvalues and eigenvectors of band matrices: The storage of band matrices. Tridiagonalization of a symmetric band matrix. Inertia. Inertia and the LU decomposition. The inertia of a tridiagonal matrix. Calculation of a selected eigenvalue. Selected eigenvectors.
    2.4 Notes and references: Updating the spectral decomposition. The divide-and-conquer algorithm. Reduction to tridiagonal form. Inertia and bisection. Eigenvectors by the inverse power method. Jacobi's method.
  3 The Singular Value Decomposition
    3.1 Background: Existence and nomenclature. Uniqueness. Relation to the spectral decomposition. Characterization and perturbation of singular values. First-order perturbation expansions. The condition of singular vectors.
    3.2 The cross-product algorithm: Inaccurate singular values. Inaccurate right singular vectors. Inaccurate left singular vectors. Assessment of the algorithm.
    3.3 Reduction to bidiagonal form: The reduction. Real bidiagonal form.
    3.4 The QR algorithm: The bidiagonal QR step. Computing the shift. Negligible superdiagonal elements. Back searching. Final comments.
    3.5 A hybrid QRD-SVD algorithm
    3.6 Notes and references: The singular value decomposition. The cross-product algorithm. Reduction to bidiagonal form and the QR algorithm. The QRD-SVD algorithm. The differential qd algorithm. Divide-and-conquer algorithms. Downdating a singular value decomposition. Jacobi methods. The generalized singular value and CS decompositions.
  4 The Symmetric Positive Definite (S/PD) Generalized Eigenvalue Problem
    4.1 Theory: The fundamental theorem. Condition of eigenvalues.
    4.2 Wilkinson's algorithm: The algorithm. Computation of C = R^-T A R^-1. Ill-conditioned B. Final comments.
    4.3 Notes and references: Perturbation theory. Wilkinson's algorithm. The definite generalized eigenvalue problem. Band matrices.

4 Eigenspaces and Their Approximation
  1 Eigenspaces
    1.1 Definitions: Eigenbases. Existence of eigenspaces. Deflation.
    1.2 Simple eigenspaces: Definition. Block diagonalization and spectral representations. Uniqueness of simple eigenspaces. Block deflation.
    1.3 Notes and references: Eigenspaces. Spectral projections. The resolvent.
  2 Perturbation Theory
    2.1 Canonical angles: Definitions. Computing canonical angles. Subspaces of unequal dimensions. Combinations of subspaces.
    2.2 Residual analysis: Optimal residuals. The Rayleigh quotient. Backward error.
    2.3 Perturbation theory: The perturbation theorem. Choice of subspaces. The quantity sep. Bounds in terms of E. Angles between the subspaces. The Rayleigh quotient and the condition of L. Residual bounds.
    2.4 Residual bounds for Hermitian matrices: Residual bounds for eigenspaces. Residual bounds for eigenvalues. Block deflation and residual bounds.
    2.5 Notes and references: General references. Canonical angle. Perturbation theory. Residual bounds for eigenspaces and eigenvalues of Hermitian matrices.
  3 Krylov Subspaces
    3.1 Krylov sequences and Krylov spaces: Introduction and definition. Elementary properties. The polynomial connection. Termination.
    3.2 Convergence: Setting the scene. Chebyshev polynomials. Convergence (Hermitian matrices). Convergence (non-Hermitian matrices). Multiple eigenvalues. Assessment of the bounds.
    3.3 Block Krylov sequences
    3.4 Notes and references: Sources. Krylov sequences. Convergence. Block Krylov sequences.
  4 Rayleigh-Ritz Approximation
    4.1 Rayleigh-Ritz methods: Two difficulties. Rayleigh-Ritz and generalized eigenvalue problems. The orthogonal Rayleigh-Ritz procedure. The Rayleigh quotient.
    4.2 Convergence: The general approach. Convergence of the Ritz value. Convergence of Ritz vectors. Convergence of Ritz blocks and bases. Convergence of Ritz values revisited. Residuals and convergence. Hermitian matrices.
    4.3 Refined Ritz vectors: Definition. Convergence of refined Ritz vectors. The computation of refined Ritz vectors.
    4.4 Harmonic Ritz vectors: Computation of harmonic Ritz pairs.
    4.5 Notes and references: Historical. Convergence. Refined Ritz vectors. Harmonic Ritz vectors.

5 Krylov Sequence Methods
  1 Krylov Decompositions
    1.1 Arnoldi decompositions: The Arnoldi decomposition. Uniqueness. Reduced Arnoldi decompositions. Computing Arnoldi decompositions. Reorthogonalization.
    1.2 Lanczos decompositions: Lanczos decompositions. The Lanczos recurrence.
    1.3 Krylov decompositions: Definition. Similarity transformations. Translation. Equivalence to an Arnoldi decomposition. Reduction to Arnoldi form. Deflation.
    1.4 Rational Krylov transformations: Krylov sequences, inverses, and shifts. The rational Krylov transformation.
    1.5 Notes and references: Arnoldi's method. Krylov decompositions. The rational Krylov method. Shift-and-invert enhancement and orthogonalization.
  2 The Restarted Arnoldi Method
    2.1 The implicitly restarted Arnoldi method: Filter polynomials. Implicitly restarted Arnoldi. Implementation. The restarted Arnoldi cycle. Exact shifts. Forward instability.
    2.2 Krylov-Schur restarting: Exchanging eigenvalues and eigenblocks. The Krylov-Schur cycle. The equivalence of Arnoldi and Krylov-Schur. Comparison of the methods.
    2.3 Stability, convergence, and deflation: Stability. Convergence criteria. Deflation in an Arnoldi decomposition. Deflation in the Krylov-Schur method. Stability of deflation. Choice of tolerance. Convergence and deflation.
    2.4 Computation of refined and harmonic Ritz vectors: Refined Ritz vectors. Harmonic Ritz vectors.
    2.5 Notes and references: Implicit restarting. ARPACK. Exact shifts and filter polynomials. Reverse communication. Forward instability. Exchanging eigenvalues. Krylov-spectral restarting.
  3 The Lanczos Algorithm
    3.1 Reorthogonalization and the Lanczos algorithm: Lanczos with reorthogonalization.
    3.2 Full reorthogonalization and restarting: Implicit restarting.
    3.3 Semiorthogonal methods: Semiorthogonal bases. The QR factorization of a semiorthogonal basis. The size of the reorthogonalization coefficients. The effects on the Rayleigh quotient. The effects on the Ritz vectors. Estimating the loss of orthogonality. Periodic reorthogonalization. Updating Hk. Computing residual norms. An example.
    3.4 Notes and references: The Lanczos algorithm. Full reorthogonalization. Semiorthogonal methods. Block Lanczos algorithms. The bi-Lanczos algorithm. Software.
  4 The Generalized Eigenvalue Problem
    4.1 Transformation and convergence: Transformation to an ordinary eigenproblem. Shift-and-invert enhancement and the Rayleigh quotient. Residuals and backward error. Bounding the residual. Hermitian matrices.
    4.2 Positive definite B: The B inner product. B-orthogonalization. The B-Arnoldi method. The B-Lanczos method with periodic reorthogonalization. The restarted B-Lanczos method. Semidefinite B.
    4.3 Notes and references: The generalized shift-and-invert transformation and the Lanczos algorithm. Residuals and backward error. Semidefinite B.

6 Alternatives
  1 Subspace Iteration
    1.1 Theory: Convergence. The QR connection. Schur-Rayleigh-Ritz refinement.
    1.2 Implementation: A general algorithm. Convergence. When to perform an SRR step. When to orthogonalize. Deflation. Real matrices.
    1.3 Symmetric matrices: Economization of storage. Chebyshev acceleration. The singular value decomposition.
    1.4 Notes and references: Subspace iteration. Chebyshev acceleration. Software.
  2 Newton-Based Methods
    2.1 Approximate Newton methods: A general approximate Newton method. Three approximations. Orthogonal corrections. Analysis. Local convergence.
    2.2 The Jacobi-Davidson method: The basic Jacobi-Davidson method. Extending Schur decompositions. Restarting. Harmonic Ritz vectors. Hermitian matrices. Solving projected systems. The correction equation: exact solution. The correction equation: iterative solution.
    2.3 Notes and references: Newton's method and the eigenvalue problem. Inexact Newton methods. The Jacobi-Davidson method.

Appendix: Background
  1 Scalars, vectors, and matrices
  2 Linear algebra
  3 Positive definite matrices
  4 The LU decomposition
  5 The Cholesky decomposition
  6 The QR decomposition
  7 Projectors

References
Index

ALGORITHMS

Chapter 1. Eigensystems
  1.1 Solution of Sylvester's equation

Chapter 2. The QR Algorithm
  1.1 The shifted power method
  1.2 The inverse power method
  2.1 Generation of Householder transformations
  2.2 Reduction to upper Hessenberg form
  2.3 Generation of a plane rotation
  2.4 Application of a plane rotation
  2.5 Basic QR step for a Hessenberg matrix
  2.6 Finding deflation rows in an upper Hessenberg matrix
  2.7 Computation of the Wilkinson shift
  2.8 Schur form of an upper Hessenberg matrix
  2.9 Right eigenvectors of an upper triangular matrix
  3.1 Starting an implicit QR double shift
  3.2 Doubly shifted QR step
  3.3 Finding deflation rows in a real upper Hessenberg matrix
  3.4 Real Schur form of a real upper Hessenberg matrix
  4.1 Reduction to Hessenberg-triangular form
  4.2 The doubly shifted QZ step

Chapter 3. The Symmetric Eigenvalue Problem
  1.1 Reduction to tridiagonal form
  1.2 Hermitian tridiagonal to real tridiagonal form
  1.3 The symmetric tridiagonal QR step
  2.1 Rank-one update of the spectral decomposition
  2.2 Eigenvectors of a rank-one update
  2.3 Divide-and-conquer eigensystem of a symmetric tridiagonal matrix
  2.4 Counting left eigenvalues
  2.5 Finding a specified eigenvalue
  2.6 Computation of selected eigenvectors
  3.1 The cross-product algorithm
  3.2 Reduction to bidiagonal form
  3.3 Complex bidiagonal to real bidiagonal form
  3.4 Bidiagonal QR step
  3.5 Back searching a bidiagonal matrix
  3.6 Hybrid QR-SVD algorithm
  4.1 Solution of the S/PD generalized eigenvalue problem
  4.2 The computation of C = R^-T A R^-1

Chapter 5. Krylov Sequence Methods
  1.1 The Arnoldi step
  1.2 The Lanczos recurrence
  1.3 Krylov decomposition to Arnoldi decomposition
  2.1 Filtering an Arnoldi decomposition
  2.2 The restarted Arnoldi cycle
  2.3 The Krylov-Schur cycle
  2.4 Deflating in a Krylov-Schur decomposition
  2.5 The rational Krylov transformation
  3.1 Lanczos procedure with orthogonalization
  3.2 Estimation of u^T_{k+1} u_j
  3.3 Periodic orthogonalization I: Outer loop
  3.4 Periodic reorthogonalization II: Orthogonalization step

Chapter 6. Alternatives
  1.1 Subspace iteration
  2.1 The basic Jacobi-Davidson algorithm
  2.2 Extending a partial Schur decomposition
  2.3 Harmonic extension of a partial Schur decomposition
  2.4 Solution of projected equations
  2.5 Davidson's method

PREFACE

This book, Eigensystems, is the second volume in a projected five-volume series entitled Matrix Algorithms. The first volume treated basic decompositions. The three following this volume will treat iterative methods for linear systems, sparse direct methods, and special topics, including fast algorithms for structured matrices.

My intended audience is the nonspecialist whose needs cannot be satisfied by black boxes. It seems to me that these people will be chiefly interested in the methods themselves—how they are derived and how they can be adapted to particular problems. Consequently, the focus of the series is on algorithms, with such topics as rounding-error analysis and perturbation theory introduced impromptu as needed. My aim is to bring the reader to the point where he or she can go to the research literature to augment what is in the series.

The series is self-contained. The reader is assumed to have a knowledge of elementary analysis and linear algebra and a reasonable amount of programming experience—about what you would expect from a beginning graduate engineer or an advanced undergraduate in an honors program. Although strictly speaking the individual volumes are not textbooks, they are intended to teach, and my guiding principle has been that if something is worth explaining it is worth explaining fully. This has necessarily restricted the scope of the series, but I hope the selection of topics will give the reader a sound basis for further study.

The subject of this volume is computations involving the eigenvalues and eigenvectors of a matrix. The first chapter is devoted to an exposition of the underlying mathematical theory. The second chapter treats the QR algorithm, which in its various manifestations has become the universal workhorse in this field. The third chapter deals with symmetric matrices and the singular value decomposition. The second and third chapters also treat the generalized eigenvalue problem.

Up to this point the present volume shares the concern of the first volume with dense matrices and their decompositions. In the fourth chapter the focus shifts to large matrices for which the computation of a complete eigensystem is not possible. The general approach to such problems is to calculate approximations to subspaces corresponding to groups of eigenvalues—eigenspaces or invariant subspaces they are called. Accordingly, in the fourth chapter we will present the algebraic and analytic theory of eigenspaces. We then return to algorithms in the fifth chapter, which treats Krylov sequence methods—in particular the Arnoldi and the Lanczos methods. In the last chapter we consider alternative methods such as subspace iteration and the Jacobi-Davidson method.

With this second volume, I have had to come to grips with how the volumes of this work hang together. Initially, I conceived the series being a continuous exposition of the art of matrix computations, with heavy cross referencing between the volumes. On reflection, I have come to feel that the reader will be better served by semi-independent volumes. However, it is impossible to redevelop the necessary theoretical and computational material in each volume. I have therefore been forced to assume that the reader is familiar, in a general way, with the material in previous volumes. To make life easier, much of the necessary material is slipped unobtrusively into the present volume, and more background can be found in the appendix. However, I have not hesitated to reference the first volume—especially its algorithms—where I thought it appropriate.

A word on organization. The book is divided into numbered chapters, sections, and subsections, followed by unnumbered subsubsections. Numbering is by section, so that (3.5) refers to the fifth equation in section three of the current chapter. References to items outside the current chapter are made explicitly—e.g., Theorem 2.7, Chapter 1. References to Volume I of this series [265] are preceded by the Roman numeral I—e.g., Algorithm I:3.2.1.

Many methods treated in this volume are summarized by displays of pseudocode (see the list of algorithms following the table of contents). These summaries are for purposes of illustration and should not be regarded as finished implementations. In the first place, they often leave out error checks that would clutter the presentation. Moreover, it is difficult to verify the correctness of algorithms written in pseudocode. Wherever possible, I have checked the algorithms against MATLAB implementations. However, such a procedure is not proof against transcription errors. Be on guard!

In 1997, as I was preparing to write this volume, I posted it on the web and benefited greatly from the many comments and suggestions I received. I should like to thank K. M. Briggs, Austin Dubrulle, David Dureisseix, Mei Kobayashi, Daniel Kressner, Andrzej Mackiewicz, Nicola Mastronardi, Arnold Neumaier, John Pryce, Thomas Strohmer, Henk van der Vorst, and David Watkins for their comments and suggestions. Special thanks go to the editors of Templates for the Solution of Algebraic Eigenvalue Problems, who allowed me access to their excellent compendium while it was in preparation. I am indebted to the Mathematical and Computational Sciences Division of NIST for the use of their research facilities.

SIAM publications and its people, past and present, have aided the writing and production of this book. Vickie Kearn first encouraged me to start the project and has helped it along too many ways to list. Special mention should be made of my long-time copy editor and email friend, Jean Anderson; she has turned an onerous job into a pleasant collaboration. Kelly Thomas and Sam Young have seen the present volume through production.

While I was writing this volume, Richard Lehoucq invited me to a workshop on large-scale eigenvalue problems held at Argonne National Laboratory. I was stunned by the progress made in the area since the late sixties and early seventies, when I had worked on the problem. The details of these new and improved algorithms are the subject of the second half of this volume. But any unified exposition makes it easy to forget that the foundations for these advances had been laid by people who were willing to work in an unexplored field, where confusion and uncertainty were the norm. Naming names carries the risk of omitting someone, but any list would surely include Vel Kahan, Shmuel Kaniel, Chris Paige, Beresford Parlett, and Yousef Saad. This book is dedicated to these pioneers and their coworkers.

G. W. Stewart
College Park, MD


1 EIGENSYSTEMS

In 1965 James Hardy Wilkinson published his classic work on numerical linear algebra, The Algebraic Eigenvalue Problem. His purpose was to survey the computational state of the art, but a significant part of the book was devoted to the mathematical foundations, some of which had been developed by Wilkinson himself. This happy mixture of the theoretical and computational has continued to characterize the area—computational advances go hand in hand with deeper mathematical understanding.

The purpose of this chapter is to provide the mathematical background for the first half of this volume. The first section consists of an exposition of the classic algebraic theory of eigensystems, with an emphasis on the reduction of a matrix by similarity transformations to a simpler form. One of these forms—the Schur decomposition—is the goal of the QR algorithm, the workhorse of dense eigenvalue computations.

The second and third sections are devoted to the analysis of eigensystems. The second section concerns norms and their relation to the spectrum of a matrix. Here we will pay special attention to the asymptotic behavior of powers of matrices, which are important in the analysis of many matrix algorithms. The third section is devoted to perturbation theory—how eigensystems behave when the original matrix is perturbed.

The majority of eigenvalue problems involve real matrices, and their reality often results in computational savings. However, the theory of eigensystems is best treated in terms of complex matrices. Consequently, throughout this chapter:

    Unless otherwise stated, A is a complex matrix of order n.

1. THE ALGEBRA OF EIGENSYSTEMS

In this section we will develop the classical theory of eigenvalues, eigenvectors, and reduction by similarity transformations. This is a large subject, and we will only be able to survey the basics. The predominant theme of this section is the systematic reduction of matrices to simpler forms by similarity transformations. We begin with basic definitions and their consequences. We then present Schur's reduction of a matrix to triangular form by unitary similarity transformations—a decomposition that serves as

a springboard for computing most others. Further reduction to block diagonal form is treated in §1.5. The reduction depends on our ability to solve a matrix equation called Sylvester's equation, which we investigate in some detail. Finally, in §1.6 we present the Jordan canonical form.

1.1 EIGENVALUES AND EIGENVECTORS

Definitions

This volume has two heroes: the eigenvalue and its companion the eigenvector. Since the two are inseparable, we will formalize their pairing in the following definition.

Definition 1.1. Let A be of order n. The pair (λ, x) is called an EIGENPAIR or a RIGHT EIGENPAIR of A if

    Ax = λx,  x ≠ 0.                                        (1.1)

The scalar λ is called an EIGENVALUE of A and the vector x is called an EIGENVECTOR. The pair (λ, y) is called a LEFT EIGENPAIR if y is nonzero and y^H A = λ y^H. The multiset of eigenvalues of A is called the SPECTRUM of A and is denoted by Λ(A).

Here are some comments on this definition.

• Perhaps the best way to think of an eigenvector x of a matrix A is that it represents a direction which remains invariant under multiplication by A. The corresponding eigenvalue λ is then the representation of A in the subspace spanned by the eigenvector x. The advantage of having this representation is that multiplication by A is reduced to a scalar operation along the eigenvector. For example, it is easy to show that if Ax = λx then

    A^k x = λ^k x,  k = 0, 1, 2, ....

Thus the effect of a power of A along x can be determined by taking the corresponding power of λ.

• The unqualified terms "eigenpair" and "eigenvector" refer to right eigenpairs and right eigenvectors. Left eigenpairs and eigenvectors will always be qualified.

• If (λ, x) is an eigenpair and τ ≠ 0, then (λ, τx) is also an eigenpair. To remove this trivial nonuniqueness, one often specifies a normalization of x. For example, one might require that x^H x = 1 or that some component of x be equal to one. In any event, when the eigenvector corresponding to an eigenvalue is unique up to a scalar multiple, we will refer to the one we are currently interested in as the eigenvector.

• Eigenvalues can be repeated. For example, it is natural to regard the identity matrix of order n as having n eigenvalues, all equal to one (see Theorem 1.2). To keep track of multiplicities we define the spectrum of a matrix to be a multiset, i.e., a set in which a member may be repeated.
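The defining relation (1.1) and the power relation above are easy to check numerically. The following sketch, in Python with numpy (a tool not used in this book; the data are made up), verifies a computed eigenpair of a small matrix.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # numpy returns the eigenvalues and unit-norm eigenvectors columnwise.
    lam, X = np.linalg.eig(A)
    x = X[:, 0]

    # The defining relation A x = lambda x ...
    assert np.allclose(A @ x, lam[0] * x)

    # ... and the power relation A^k x = lambda^k x.
    k = 5
    assert np.allclose(np.linalg.matrix_power(A, k) @ x, lam[0] ** k * x)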

then it has a null vector x. • Triangular matrices. If no diagonal element of AH is equal to akk.A must be singular.A is singular.SEC. x) is an eigenpair of A. then Thus (a. then ctkkl—AH is nonsingular. A/ . if (A. • Singular matrices. before we formally establish the existence of eigenpairs. The characteristic equation and the existence of eigenpairs We now turn to the existence of eigenpairs. say &. Conversely..Since any diagonal element of A must have a first appearance on the diagonal. However. if XI . and (A. If (A. Thus the eigenvalues of A are Vol II: Eigensystems . x) is an eigenvector of A it must have a last nonzero component. x) is an eigenpair of A.. then A has a null vector. and it is easily verified that is an eigenvector of A corresponding the the eigenvalue akk.. denotes the z'th unit vector (i. it follows that the diagonal elements of A are eigenvalues of A. e{) is an eigenpair of A. 1. i. x) is an eigenpair of A. the vector whose «th component is one and whose other components are zero). it will be convenient to examine three important cases in which a matrix must by its nature have eigenpairs. Conversely. then \x Ax = 0. Let A be a triangular matrix partitioned in the form where AH is of order k.. If A is singular. It then follows from the equation that akk£k = ^£k or A = akk. Since x is nonzero. a vector x / 0 for which Ax = 0.e. If A is diagonal and e. THE ALGEBRA OF EIGENSYSTEMS Existence of eigenpairs: Special cases 3 In a moment we will show that any matrix has an eigenpair.Thus the eigenvalues of a triangular matrix are precisely its diagonal elements. or equivalently (\I-A)x = Q. It follows that (0.e. • Diagonal matrices.

To do so. This theorem and the accompanying definition has two consequences. counting algebraic multiplicities every matrix of order n has n eigenvalues. Since a matrix is singular if and only if its determinant is zero. are distinct and The eigenvalues of A are the scalars A. EIGENSYSTEMS the scalars for which A/ — A is singular. Since a polynomial of degree n has exactly n roots. yHA = Ay H . We will also use the phrase "counting multiplicities" as a reminder that multiplicities count. in addition to a right null vector. Matrices of order two In many applications it is useful to have the the characteristic polynomial of a 2x2 matrix.2. the eigenvalues of A are the scalars for which The function ^(A) is a monic polynomial of degree n called the characteristic polynomial of A. • It follows from the theorem that we do not need to distinguish between "right" and "left" eigenvalues. • It is important to keep track of the algebraic multiplicity of the eigenvalues. Let A be of order n. say. i.3).4 CHAPTER 1. a matrix of order n has n eigenvalues. Equivalently. then. We have thus shown that the eigenvalues of A are the roots of the characteristic equation (1. it has a left null vector yH.. t/H is a left eigenvector corresponding to A. Thus every eigenvalue has both a left and right eigenvector. More precisely. we have already defined A( A) to be a multiset. For this matrix An easily memorized equivalent is .e. Thus. andletpA(^) = det(A/ — A) be its characteristic polynomial Thenpp(X) can be factored uniquely in the form where the numbers A. Theorem 1. which are said to have ALGEBRAIC MULTIPLICITY mi. from the fundamental theorem of algebra we have the following theorem. For if A/ — A is singular..

22 7. In the presence of the screen this plane wave is diffracted. x in R.22 is a cross-sectional view of two parallel plates with a screen obstructing in part the plane z = 0. The concentration u satisfies Express u everywhere in terms of its values inside R and derive an integral equation for u(x). z) satisfies . the plane wave elv/Az could propagate in the channel. If the screen were absent.) 7. FIGURE 7. The absorption coefficient is the constant k2 outside the bounded region R and a function k2 .53 Figure 7. A point source is located at x0 in the exterior of JR. The boundary condition on the plates and on the screen is that the normal derivative of the field vanishes. Characterize this normal derivative by an integral equation.kl(x) inside R. The total field u can be written where us(x.§7. Compare this result with diffraction by a thin disk of the shape of the aperture (with vanishing field on the disk.52 Consider steady diffusion in an absorbing medium.16] WIENER-HOPF METHOD 329 derivative of the field in the aperture.
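As a quick numerical check (again Python with numpy, purely for illustration), the roots of λ^2 - trace(A)·λ + det(A) agree with the eigenvalues that a general-purpose routine returns.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    t, d = np.trace(A), np.linalg.det(A)

    # Roots of lambda^2 - t*lambda + d by the quadratic formula.
    disc = np.sqrt(t**2 - 4.0 * d + 0j)      # complex sqrt handles conjugate pairs
    roots = np.array([(t + disc) / 2.0, (t - disc) / 2.0])

    print(np.sort_complex(roots))                 # [-0.372...  5.372...]
    print(np.sort_complex(np.linalg.eigvals(A)))  # the same two eigenvalues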

Block triangular matrices

Let A be a block triangular matrix of the form

    A = [A11  A12;  0  A22].

Since det(λI - A) = det(λI - A11)·det(λI - A22), we have

    Λ(A) = Λ(A11) ∪ Λ(A22).                                 (1.5)

Thus the eigenvalues of a block triangular matrix are the eigenvalues of its diagonal blocks, counting multiplicities.

Real matrices

The eigenvalues and eigenvectors of a real matrix have a special structure that we can exploit to eliminate the use of complex arithmetic. Specifically, if A is real, then its characteristic polynomial is real. Now the complex zeros of a real polynomial occur in conjugate pairs, and such pairs have the same algebraic multiplicity. Moreover, if Ax = λx, then on taking conjugates we find that Ax̄ = λ̄x̄. It follows that the complex eigenvalues of a real matrix occur in conjugate pairs. Thus, if we have computed a complex eigenpair, we have effectively computed the conjugate eigenpair.

Actually, we can say more: the real and imaginary parts of the eigenvector x corresponding to a complex eigenvalue λ are linearly independent. To see this, write x = y + iz and assume that, say, y is nonzero. Then for y and z to be linearly dependent, we must have z = τy for some real scalar τ. It follows that the vector (1 - iτ)x = (1 + τ^2)y is real and nonzero. Hence on multiplying the relation Ax = λx by 1 - iτ, we get Ay = λy. Since A and y are real and y ≠ 0, λ is also real—a contradiction.

We may summarize these results in the following theorem.

Theorem 1.3. The complex eigenvalues of a real matrix A occur in conjugate pairs of equal algebraic multiplicity. If (λ, x) is a complex eigenpair of A, then so is (λ̄, x̄). Moreover, the real and imaginary parts of x are linearly independent.

1.2 GEOMETRIC MULTIPLICITY AND DEFECTIVE MATRICES

Geometric multiplicity

The fact that we refer to the multiplicity of an eigenvalue as a root of the characteristic equation as its algebraic multiplicity suggests that there is another kind of multiplicity. The following theorem sets the stage for defining it. Its proof is left as an exercise.

Theorem 1.4. Let λ be an eigenvalue of A. Then the set of all eigenvectors corresponding to λ together with the zero vector is the null space of λI - A.

The theorem shows that the set of eigenvectors corresponding to an eigenvalue along with the zero vector forms a subspace. The dimension of that subspace is the maximal number of linearly independent eigenvectors associated with the eigenvalue. This suggests the following definition.

Definition 1.5. The GEOMETRIC MULTIPLICITY of an eigenvalue λ is the dimension of the subspace spanned by its eigenvectors, or equivalently dim null(λI - A).

It is natural to conjecture that the algebraic and geometric multiplicities of an eigenvalue must be the same. Unfortunately, that need not be true. The geometric multiplicity is never greater than the algebraic multiplicity (see the notes following Theorem 1.12), but it can be less, as the following example shows.

Example 1.6. Let

    J2(0) = [0  1;  0  0].

The characteristic equation of J2(0) is λ^2 = 0, from which it follows that 0 is an eigenvalue of algebraic multiplicity two. On the other hand, the null space of J2(0) is spanned by e1. Hence the geometric multiplicity of the eigenvalue 0 is one.

Defectiveness

The matrix J2(0) is an example of a Jordan block. It is also an example of a defective matrix. Specifically, we say that an eigenvalue whose geometric multiplicity is less than its algebraic multiplicity is defective. A matrix with one or more defective eigenvalues is also said to be defective. Loosely speaking, a defective matrix has too few eigenvectors.

Defectiveness introduces two practical difficulties. To see why this is a problem, let us agree to say that the matrix A has a complete system of eigenvectors if it has n linearly independent eigenvectors x_i corresponding to eigenvalues λ_i. (We also say that the eigenpairs (λ_i, x_i) form a complete system.) Then any vector y can be expressed in the form

    y = γ1 x1 + γ2 x2 + ··· + γn xn.

An expansion like this is particularly useful in numerical computations. Specifically, since A^k x_i = λ_i^k x_i (i = 1, ..., n), it follows that

    A^k y = γ1 λ1^k x1 + γ2 λ2^k x2 + ··· + γn λn^k xn.    (1.6)

Thus we can replace the process of computing A^k y by repeated multiplication of y by A with the less expensive process of taking the powers of the eigenvalues and summing their products with their eigenvectors. Example 1.6 shows that not all matrices have complete systems of eigenvectors, which is the first difficulty with defectiveness.

The second difficulty with defective matrices is that defective eigenvalues are very sensitive to perturbations in the matrix elements, as the following example shows.

Example 1.7. The matrix

    [0  1;  ε  0]

differs from J2(0) by a perturbation of ε in its (2,1)-element. But its eigenvalues are ±√ε. If, for example, ε = 10^-16, the perturbed eigenvalues are ±10^-8—eight orders of magnitude greater than the perturbation in the matrix. This represents a dramatic perturbation of the eigenvalue.
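Example 1.7 can be reproduced in a line or two. The following numpy sketch (illustrative only) exhibits the square-root blowup.

    import numpy as np

    eps = 1e-16
    A = np.array([[0.0, 1.0],
                  [eps, 0.0]])   # J2(0) perturbed in its (2,1)-element

    # The eigenvalues of the perturbed matrix are +/- sqrt(eps) = +/- 1e-8,
    # eight orders of magnitude larger than the perturbation itself.
    print(np.linalg.eigvals(A))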

Because rounding a defective matrix will generally cause its eigenvalues to become simple, it is sometimes argued that defective matrices do not exist in practice. However, a matrix that is near a defective matrix is not much better than a defective matrix. It may have a complete system of eigenvectors, but those eigenvectors will be nearly linearly dependent, which makes the computation of expansions like (1.6) problematic. Moreover, the perturbed eigenvalues, though distinct, are just as sensitive as the original multiple eigenvalues. Thus defectiveness is a phenomenon we will have to live with. In some cases, where defectiveness is only incidental to the problem, we can work around it—for example, by using a Schur decomposition (Theorem 1.12). In other cases we must come to grips with it. We could wish our tools were sharper.

Simple eigenvalues

We have just seen that multiple eigenvalues may be accompanied by theoretical and practical difficulties that do not affect their nonmultiple brethren. To distinguish this latter class of eigenvalues, we will say that an eigenvalue is simple if it has algebraic multiplicity one. We will also say that the corresponding eigenvector and eigenpair are simple.

1.3 SIMILARITY TRANSFORMATIONS

A key strategy in matrix computations is to transform the matrix in question to a form that makes the problem at hand easy to solve. The class of transformations and the final form will depend on the problem. For example, one of the most successful algorithms for solving least squares problems is based on the orthogonal reduction of X to triangular form: in transforming the least squares problem of minimizing ||y - Xb||_2 one uses orthogonal matrices because they preserve the 2-norm. For the eigenvalue problem, the transformations of choice are called similarity transformations.

Definition 1.8. Let A be of order n and let U be nonsingular. Then the matrices A and B = U^-1 A U are said to be SIMILAR. We also say that B is obtained from A by a SIMILARITY TRANSFORMATION.

The following is an important example of a similarity transformation.

Example 1.9. Let the matrix A have a complete system of eigenpairs (λ_i, x_i) (i = 1, ..., n), and let

    X = (x1 x2 ··· xn)  and  Λ = diag(λ1, λ2, ..., λn).

Then the individual relations A x_i = λ_i x_i can be combined in the matrix equation

    AX = XΛ.

Because the eigenvectors x_i are linearly independent, the matrix X is nonsingular. Hence we may write

    X^-1 A X = Λ.

Thus we have shown that a matrix with a complete system of eigenpairs can be reduced to diagonal form by a similarity transformation. Conversely, by reversing the above argument, we see that if A can be diagonalized by a similarity transformation X^-1 A X, then the columns of X are eigenvectors of A, which form a complete system.

A nice feature of similarity transformations is that they affect the eigensystem of a matrix in a systematic way, as the following theorem shows.

Theorem 1.10. Let A be of order n and let B = U^-1 A U be similar to A. Then the eigenvalues of A and B are the same and have the same multiplicities. If (λ, x) is an eigenpair of A, then (λ, U^-1 x) is an eigenpair of B.

Proof. The characteristic polynomials of A and B satisfy

    det(λI - B) = det[U^-1 (λI - A) U] = det(U^-1)·det(λI - A)·det(U) = det(λI - A).

Thus A and B have the same characteristic polynomial and hence the same eigenvalues with the same multiplicities. If (λ, x) is an eigenpair of A, then

    B(U^-1 x) = U^-1 A U U^-1 x = U^-1 A x = λ(U^-1 x),

so that (λ, U^-1 x) is an eigenpair of B. ∎

In addition to preserving the eigenvalues of a matrix (and transforming eigenvectors in a predictable manner), similarity transformations preserve functions of the eigenvalues. The determinant of a matrix, which is the product of its eigenvalues, is clearly unchanged by similarity transformations. The following theorem shows that the trace is also invariant.

Theorem 1.11. The trace of A is the sum of the eigenvalues of A counting multiplicities and hence is unchanged by similarity transformations.

Proof. Let the eigenvalues of A be λ1, ..., λn. The characteristic polynomial of A can be written in two ways:

    det(λI - A) = (λ - λ1)(λ - λ2)···(λ - λn).

By expanding the determinant on the left, we find that the coefficient of λ^(n-1) is the negative of the sum of the diagonal elements of A. From the expression on the right, we find that the same coefficient is the negative of the sum of the eigenvalues of A. ∎

This result is invaluable in debugging matrix algorithms that proceed by similarity transformations. Just print out the trace after each transformation. If it changes significantly, there is a bug in the code.
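The debugging device is a one-liner in practice. The sketch below (Python/numpy, with random data standing in for a real computation) applies an arbitrary similarity transformation and confirms that the trace survives.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    trace_before = np.trace(A)

    U = rng.standard_normal((5, 5))      # any nonsingular matrix will do
    B = np.linalg.solve(U, A @ U)        # B = U^{-1} A U

    # A significant change in the trace would signal a bug in the transformation.
    assert abs(np.trace(B) - trace_before) < 1e-8 * np.linalg.norm(A, "fro")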

1.4 SCHUR DECOMPOSITIONS

Unitary similarities

This subsection is devoted to the reduction of matrices by unitary and orthogonal similarities. Unitary transformations play a leading role in matrix computations. They are trivial to invert, so that a unitary transformation can easily be undone. They have spectral norm one, so that they do not magnify errors. Thus we can throw an error made in a computed transformation back on the original matrix without magnification. For example, if B = U^H A U, where U is unitary, and we perturb B by an error to get B̃ = B + F, we can write

    B̃ = U^H (A + E) U,  where  ||E||_2 = ||F||_2

[see (2.7)]. More important, if we set a small element of B to zero—say when an iteration is deemed to have converged—it is equivalent to making a perturbation of the same size in A.

Schur form

In asking how far one can reduce a matrix by a class of transformations, it is useful to ask how many degrees of freedom the transforming matrices possess. A unitary matrix has n^2 elements. But these elements are constrained by n(n+1)/2 conditions: n(n-1)/2 orthogonality conditions and n normality conditions. This leaves n(n-1)/2 degrees of freedom. Since a triangular matrix has n(n-1)/2 zero elements, we can hope to reduce a matrix A to triangular form by a unitary similarity.

We have seen (Example 1.9) that a matrix with a complete system of eigenpairs can be reduced to diagonal form by a similarity transformation. Unfortunately, not all matrices can be so reduced, and even when one can, its matrix of eigenvectors may be nearly singular. Consequently, we will now investigate the existence of intermediate forms that stop before full diagonality. The following theorem shows that the hope of triangularization is realized.

Theorem 1.12 (Schur). Let A be of order n. Then there is a unitary matrix U such that

    U^H A U = T,

where T is upper triangular. By appropriate choice of U, the eigenvalues of A, which are the diagonal elements of T, may be made to appear in any order.

Proof. Let λ1, ..., λn be an ordering of the eigenvalues of A. Let x1 be an eigenvector corresponding to λ1, and assume without loss of generality that x1^H x1 = 1. Let the matrix X = (x1 X2) be unitary. Because X is unitary we have x1^H x1 = 1 and X2^H x1 = 0. Hence

    X^H A X = [λ1  h^H;  0  M],  where h^H = x1^H A X2 and M = X2^H A X2,   (1.8)

the last equality following from the relation A x1 = λ1 x1. Now by (1.5), the eigenvalues of M are λ2, ..., λn. By an obvious induction, we may assume that there is a unitary matrix V such that V^H M V = T̂ is an upper triangular matrix with the eigenvalues λ2, ..., λn appearing on its diagonal in the prescribed order. If we set

    U = X·[1  0;  0  V],

then it is easily verified that

    U^H A U = [λ1  h^H V;  0  T̂],

which has the required form. ∎

Here are some comments on this theorem.

• The decomposition A = U T U^H is called a Schur decomposition of A. It is not unique because it depends on the order of the eigenvalues in T.

In addition, multiple eigenvalues can cause the decomposition not to be unique—e.g., there may be more than one choice of the eigenvector x1 in the proof of Theorem 1.12. Unique or not, the columns of U are called Schur vectors.

• When A has a complete system of eigenvectors, the Schur vectors bear an interesting relation to the eigenvectors. Let X be the matrix of eigenvectors of A, so that

    X^-1 A X = Λ,                                           (1.9)

where Λ is the diagonal matrix of eigenvalues. Let

    X = QR

be the QR decomposition of X. Then (1.9) is equivalent to

    Q^H A Q = R Λ R^-1.

The right-hand side of this relation is upper triangular with the eigenvalues of A on its diagonal. Hence Q^H A Q is a Schur decomposition of A. In other words, the Schur vectors are the columns of the Q-factor of the matrix of eigenvectors arranged in the prescribed order.

• The proof of Theorem 1.12 is not constructive, since it requires a ready supply of eigenpairs. However, if one has an eigenpair at hand, it provides a constructive method for reducing the eigenvalue problem to a smaller one in which the eigenpair has been removed [see (1.8)]. This process is known as deflation and is an important technique in eigencalculations. Given the eigenpair (λ1, x1), the unitary matrix X = (x1 X2) in the proof can be constructed as a Householder transformation [see (2.19), Chapter 2].

• Schur decompositions can often replace other, more complicated forms in theoretical and computational applications. For example, the Jordan canonical form, to be treated later, shows that the geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity. To prove the same fact via the Schur decomposition, let λ have algebraic multiplicity k, and let the first k diagonals of T = U^H A U be λ, so that all the others are distinct from λ. Then it is easy to show that the last n-k components of any eigenvector of T corresponding to λ must be zero [see (1.2)]. It follows that the eigenvectors corresponding to λ are confined to a subspace of dimension k and cannot span a subspace of dimension greater than k.
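Schur decompositions are available in standard libraries. The following scipy sketch (illustrative, with random data) exhibits the properties just described.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))

    # Complex Schur form: A = U T U^H with T upper triangular.
    T, U = schur(A, output="complex")

    assert np.allclose(U @ T @ U.conj().T, A)    # A = U T U^H
    assert np.allclose(np.tril(T, -1), 0)        # T is upper triangular

    # The diagonal of T consists of the eigenvalues of A, and the first
    # Schur vector (but in general only the first) is an eigenvector.
    assert np.allclose(A @ U[:, 0], T[0, 0] * U[:, 0])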

Nilpotent matrices

We now turn to applications of the Schur decomposition, beginning with an analysis of nilpotent matrices. A matrix A is nilpotent if A^k = 0 for some k ≥ 1. The zero matrix is a trivial example of a nilpotent matrix. A nontrivial example is the matrix J2(0) of Example 1.6, whose square is zero.

In general, if T is an upper triangular matrix of order n with zeros on its diagonal, then the diagonal and first k-1 superdiagonals of T^k are zero, as can easily be verified. Hence T^n = 0. The following theorem generalizes the above observations.

Theorem 1.13. A matrix A of order n is nilpotent if and only if its eigenvalues are all zero.

Proof. Let A^k = 0 and let (λ, x) be an eigenpair of A. Then, by the power relation following from (1.1),

    0 = A^k x = λ^k x.

Since x is nonzero, λ^k = 0, and hence λ = 0. Conversely, suppose that the eigenvalues of A are all zero. Then any Schur decomposition of A has the form

    U^H A U = T,

where T has zeros on its diagonal. Hence T^n = 0, and it follows that A^n = U T^n U^H = 0. ∎

Hermitian matrices

The elementary facts about Hermitian matrices can be obtained almost effortlessly from the Schur form. The key observation in establishing them is that the triangular factor in any Schur decomposition of a Hermitian matrix is diagonal. Specifically, if A is Hermitian and T = U^H A U is a Schur form of A, then

    T^H = (U^H A U)^H = U^H A^H U = U^H A U = T,

so that T itself is Hermitian. Thus T is both upper and lower triangular and hence is diagonal. Since the diagonal elements of a Hermitian matrix are real, the λ_i are real. Moreover, from the relation

    A U = U T = U·diag(λ1, ..., λn),

we see that if u_i denotes the ith column of U then (λ_i, u_i) is an eigenpair of A. To summarize:

    If A is Hermitian, its eigenvalues are real, and it has a complete
    orthonormal system of eigenvectors. We call the decomposition

        A = U Λ U^H,  Λ = diag(λ1, ..., λn),

    the SPECTRAL DECOMPOSITION of A.
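For Hermitian matrices the spectral decomposition is computed directly by library routines such as numpy's eigh. The sketch below (illustrative only, random data) verifies the three properties in the summary.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T                   # a Hermitian test matrix

    lam, U = np.linalg.eigh(A)

    assert np.all(np.isreal(lam))                           # real eigenvalues
    assert np.allclose(U.conj().T @ U, np.eye(4))           # orthonormal eigenvectors
    assert np.allclose(U @ np.diag(lam) @ U.conj().T, A)    # A = U Lambda U^H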

Normal matrices

The property of having a complete orthonormal system of eigenvectors is true of a more general class of matrices than the Hermitian ones.

Definition 1.14. A matrix A is NORMAL if A^H A = A A^H.

The three naturally occurring examples of normal matrices are Hermitian matrices, skew Hermitian matrices (matrices for which A^H = -A), and unitary matrices. Moreover, if A is a normal matrix and U is unitary, then U^H A U is also normal. A key fact about normal matrices is contained in the following theorem, which uses the Frobenius norm ||·||_F to be introduced in §2.1.

Theorem 1.15. Let the normal matrix A be partitioned in the form

    A = [A11  A12;  A21  A22].

Then

    ||A21||_F = ||A12||_F.

Proof. By computing the (1,1)-blocks of each side of the relation A^H A = A A^H we find that

    A11^H A11 + A21^H A21 = A11 A11^H + A12 A12^H.

Since trace(B^H B) = ||B||_F^2 for any matrix B, it follows that

    ||A11||_F^2 + ||A21||_F^2 = ||A11||_F^2 + ||A12||_F^2,

whence ||A21||_F = ||A12||_F. ∎

Consequently, if T is the triangular factor in a Schur decomposition of a normal matrix A, the matrix T is normal. We claim that T is also diagonal. To see this, partition

    T = [t11  t12^H;  0  T22].

By Theorem 1.15, ||t12||_F = 0, so that t12 = 0. It then follows from the relation T^H T = T T^H that T22^H T22 = T22 T22^H, so that T22 is normal. An obvious induction now establishes the claim. It follows that:

    A normal matrix has a complete orthonormal system of eigenvectors.   (1.11)

We add for completeness that the eigenvalues of a skew Hermitian matrix are zero or purely imaginary, and the eigenvalues of a unitary matrix have absolute value one.
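A real skew-symmetric matrix makes a convenient test case. The numpy sketch below (illustrative only) checks normality, the purely imaginary spectrum, and, since the eigenvalues here are distinct, the orthogonality of the computed eigenvectors.

    import numpy as np

    A = np.array([[ 0.0,  1.0,  2.0],
                  [-1.0,  0.0,  3.0],
                  [-2.0, -3.0,  0.0]])   # real skew-symmetric, hence normal

    assert np.allclose(A.T @ A, A @ A.T)  # A^H A = A A^H

    lam, X = np.linalg.eig(A)
    assert np.allclose(lam.real, 0)       # skew-Hermitian: purely imaginary

    # The eigenvalues 0, +i*theta, -i*theta are distinct, so the computed
    # unit eigenvectors are orthogonal up to rounding error.
    assert np.allclose(X.conj().T @ X, np.eye(3), atol=1e-8)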

1.5 BLOCK DIAGONALIZATION AND SYLVESTER'S EQUATION

A Schur decomposition of a matrix A represents a solution of the eigenvalue problem in the sense that the eigenvalues of the matrix in question can be obtained by inspecting the diagonals of the triangular factor. We have seen (Example 1.9) that if we can diagonalize A, then the diagonalizing transformation provides a complete set of eigenvectors. But only the first Schur vector—the first column of the orthogonal factor—is an eigenvector. Unfortunately, a defective matrix, which has no complete system of eigenvectors, cannot be diagonalized. Thus to proceed beyond Schur form we must look to coarser decompositions, such as a block diagonal form.

From block triangular to block diagonal matrices

Let A have the block triangular form

    A = [L  H;  0  M],

where L and M are square. Such a matrix could, of course, be obtained by partitioning a Schur decomposition, in which case L and M would be upper triangular. But we make no such assumption.

Suppose that we wish to reduce A to block diagonal form by a similarity transformation—preferably a simple one. We must, of course, preserve the block upper triangularity of A, which can be accomplished by making the transformation itself block upper triangular. We can make the transformation even simpler by requiring its diagonal blocks to be identity matrices. Thus we are led to a transformation of the form

    [I  Q;  0  I],  whose inverse is  [I  -Q;  0  I].

Applying this transformation to A we get

    [I  -Q;  0  I]·[L  H;  0  M]·[I  Q;  0  I] = [L  LQ - QM + H;  0  M].

Thus to achieve block diagonality we must have

    LQ - QM = -H.

This equation is linear in the elements of Q. If we can solve it, we can block diagonalize A. However, to solve it we must first investigate conditions under which the equation has a solution.

Sylvester's equation

Let us consider the general problem of solving an equation of the form

    AX - XB = C,

where A and B are square, say of orders n and m. This equation is called a Sylvester equation, and it occurs in many important applications. Here we will be concerned with sufficient conditions for the existence of a solution.

If we introduce the Sylvester operator S defined by

    S X = AX - XB,

then we may write Sylvester's equation in the form S X = C. The operator S is linear in X. Hence our problem reduces to that of determining when S is nonsingular. It turns out that the nonsingularity depends on the spectra of A and B. Specifically, we have the following theorem.

Theorem 1.16. The Sylvester operator S is nonsingular if and only if the spectra of A and B are disjoint, i.e., if and only if

    Λ(A) ∩ Λ(B) = ∅.

Proof. To prove the necessity of the condition, let (λ, x) be a right eigenpair of A and let (μ, y) be a left eigenpair of B. Let X = x y^H. Then

    S X = A x y^H - x y^H B = (λ - μ) X.                    (1.12)

Hence if we can choose λ = μ—that is, if A and B have a common eigenvalue—then X is a "null vector" of S, and S is singular.

To prove the converse, we use the fact that an operator is nonsingular if every linear system in the operator has a solution. Consider the system S X = C. The first step is to transform this system into a more convenient form. Let T = V^H B V be a Schur decomposition of B. Then Sylvester's equation can be written in the form

    A(XV) - (XV)T = CV.

If we set Y = XV and D = CV, we may write the transformed equation in the form

    AY - YT = D.

Let us partition this system by columns in the form

    A(y1 ··· ym) - (y1 ··· ym)T = (d1 ··· dm),              (1.13)

where T is upper triangular with diagonal elements t11, ..., tmm. The first column of this system has the form

    (A - t11 I) y1 = d1.

Now t11 is an eigenvalue of B and hence is not an eigenvalue of A. Thus A - t11 I is nonsingular, and the first column of Y is well defined as the solution of the equation (A - t11 I)y1 = d1. From the second column of the system (1.13) we get

    (A - t22 I) y2 = d2 + t12 y1.

Because y1 is well defined, the right-hand side of this system is well defined, and A - t22 I is nonsingular. Hence y2 is well defined. In general, suppose that we have found y1, ..., y_{k-1}. From the kth column of (1.13) we get

    (A - tkk I) yk = dk + Σ_{i=1}^{k-1} tik yi.

The right-hand side of this equation is well defined and the matrix on the left is nonsingular. Hence yk is well defined. An obvious induction shows that we have found a solution Y of the equation AY - YT = D. Hence X = Y V^H is a solution of AX - XB = C. ∎

We may restate this theorem in a way that will prove useful in the sequel.

Corollary 1.17. The eigenvalues of S are the differences of the eigenvalues of A and B. For let S X = σX. Equivalently, (A - σI)X - XB = 0, so that the Sylvester operator X ↦ (A - σI)X - XB is singular. It follows from Theorem 1.16 that this can be true if and only if there is a λ ∈ Λ(A) and μ ∈ Λ(B) such that λ - σ = μ, or equivalently σ = λ - μ. Conversely, if λ ∈ Λ(A) and μ ∈ Λ(B), then (1.12) shows that (λ - μ) ∈ Λ(S).
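Corollary 1.17 is easy to check numerically via the Kronecker-product representation of S: with column-stacked vec, vec(AX - XB) = (I⊗A - B^T⊗I)·vec(X). This representation is not used in the text; the numpy sketch below is purely illustrative.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])           # eigenvalues 1 and 3
    B = np.array([[10.0, 0.0],
                  [ 0.0, 20.0]])         # eigenvalues 10 and 20

    n, m = A.shape[0], B.shape[0]
    S = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

    # The eigenvalues of S are the differences lambda_i - mu_j.
    print(np.sort(np.linalg.eigvals(S).real))   # [-19. -17.  -9.  -7.]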

An algorithm

Theorem 1.16 and its corollary are sufficient for our immediate purposes. But it is instructive to reflect on the second part of the proof. Specifically, the procedure of transforming the Sylvester equation into the form AY - YT = D and solving columnwise for Y represents an algorithm for solving Sylvester's equation.

As it stands, the procedure involves solving m systems whose matrices are (A - tkk I). For a general matrix A these systems would require O(mn^3) operations to solve. Unless m is quite small, this operation count is prohibitive. However, by a further transformation of the problem we can reduce the count for solving the systems to O(mn^2). Specifically, let S = U^H A U be a Schur decomposition of A. Then we can write Sylvester's equation in the equivalent form

    SY - YT = D,  where Y = U^H X V and D = U^H C V.

If we now solve columnwise for Y, the systems have upper triangular matrices of the form S - tkk I, which can be solved in O(n^2) operations.

These observations form the basis for one of the best algorithms for solving dense Sylvester equations. It is summarized in Algorithm 1.1. More details are given in the notes and references.

Given matrices A ∈ C^{n×n} and B ∈ C^{m×m}, this algorithm solves Sylvester's equation AX - XB = C. In the code below all partitionings are by columns.

    1. Compute a Schur decomposition A = U S U^H
    2. Compute a Schur decomposition B = V T V^H
    3. D = U^H C V
    4. for k = 1 to m
    5.     Solve the triangular system (S - tkk I) yk = dk + Σ_{i=1}^{k-1} tik yi
    6. end for k
    7. X = U Y V^H

Algorithm 1.1: Solution of Sylvester's equation
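The following Python translation of Algorithm 1.1 is a sketch using scipy's Schur and triangular-solve routines; a production code would instead call a library solver such as scipy.linalg.solve_sylvester, which implements the same family of methods.

    import numpy as np
    from scipy.linalg import schur, solve_triangular

    def sylvester(A, B, C):
        """Solve AX - XB = C by the Schur method of Algorithm 1.1."""
        S, U = schur(A, output="complex")   # A = U S U^H, S upper triangular
        T, V = schur(B, output="complex")   # B = V T V^H, T upper triangular
        D = U.conj().T @ C @ V
        n, m = A.shape[0], B.shape[0]
        Y = np.zeros((n, m), dtype=complex)
        for k in range(m):
            # (S - t_kk I) y_k = d_k + sum_{i<k} t_ik y_i  (upper triangular)
            rhs = D[:, k] + Y[:, :k] @ T[:k, k]
            Y[:, k] = solve_triangular(S - T[k, k] * np.eye(n), rhs)
        return U @ Y @ V.conj().T

    rng = np.random.default_rng(3)
    A, B = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
    C = rng.standard_normal((4, 3))
    X = sylvester(A, B, C)
    assert np.allclose(A @ X - X @ B, C)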

Block diagonalization redux

Let us now return to our original goal of block diagonalizing the matrix

    T = ( L  H )
        ( 0  M ).

We have seen that a sufficient condition for a diagonalizing similarity to exist is that the equation LQ - QM = -H have a solution. Moreover, by Theorem 1.16, a sufficient condition for this Sylvester equation to have a solution is that L and M have disjoint spectra. Hence we have the following theorem.

Theorem 1.18. Let

    T = ( L  H )
        ( 0  M )

and suppose that

    Λ(L) ∩ Λ(M) = ∅.

Then there is a unique matrix Q satisfying

    LQ - QM = -H

such that

    ( I  Q )^{-1} ( L  H ) ( I  Q )  =  ( L  0 )
    ( 0  I )      ( 0  M ) ( 0  I )     ( 0  M ).

The importance of this theorem is that it can be applied recursively. Suppose, for example, that Λ(A) can be partitioned into the union L_1 ∪ ... ∪ L_k of disjoint multisets. By the Schur decomposition theorem (Theorem 1.12) we can find a unitary matrix X such that T = X^H A X is upper triangular with the eigenvalues of A appearing in the order of their appearance in the multisets L_1, ..., L_k. Partition

    T = ( L_1  H )
        ( 0    M ),

where Λ(L_1) = L_1. Then the eigenvalues of L_1 are disjoint from those of M, and by Theorem 1.18 we can reduce T to the form diag(L_1, M). In the same way we may apply Theorem 1.18 to M to cut out a block L_2 corresponding to L_2. And so on, to give the following theorem.

Theorem 1.19. Let Λ(A) be the union L_1 ∪ ... ∪ L_k of disjoint multisets. Then there is a nonsingular matrix X such that

    X^{-1}AX = diag(L_1, ..., L_k),

where

    Λ(L_i) = L_i.

If the sets L_i correspond to the distinct eigenvalues of A, we obtain the following corollary.

Corollary 1.20. Let the matrix A of order n have distinct eigenvalues λ_1, ..., λ_k with multiplicities m_1, ..., m_k. Then there is a nonsingular matrix X such that

    X^{-1}AX = diag(L_1, ..., L_k),

where L_i is of order m_i and has only the eigenvalue λ_i.

Finally, if all the eigenvalues of A are distinct, then the blocks L_i in this corollary are scalars. Hence:

    If A has distinct eigenvalues, A is diagonalizable and has a complete
    system of eigenvectors.

The decompositions in Theorem 1.19 and its corollaries enjoy a certain uniqueness. Let X = (X_1 ... X_k) be partitioned conformally with the partition of Λ(A) and let X^{-H} = (Y_1 ... Y_k) also be partitioned conformally. Then the spaces spanned by the X_i and Y_i are unique (this fact is nontrivial to prove; see p. 246). However, if S_i is nonsingular, we may replace X_i, Y_i, and L_i in Theorem 1.19 by X_i S_i, Y_i S_i^{-H}, and S_i^{-1} L_i S_i. This lack of uniqueness can be useful. For example, it allows us to assume that, say, Y_i is orthonormal.

1.6. JORDAN FORM

    The reports of my death are greatly exaggerated.
                                         — Mark Twain

We have seen that any matrix can be reduced to triangular form by a unitary similarity transformation. It is natural to ask how much further we can go by relaxing the requirement that the transformation be unitary. The answer is the classical Jordan canonical form, which is the topic of this subsection. We begin with a definition.

Definition 1.21. A JORDAN BLOCK J_m(λ) is a matrix of order m of the form

    J_m(λ) = ( λ  1          )
             (    λ  .       )
             (       .  1    )
             (          λ    ).

From an algebraic point of view a Jordan block is as defective as a matrix can be. The eigenvalue λ of the matrix J_m(λ) has algebraic multiplicity m. But rank[λI - J_m(λ)] is m-1, so λ has geometric multiplicity one.

The Jordan canonical form—a decomposition of a matrix into Jordan blocks—is described in the following theorem.

Theorem 1.22. Let A ∈ C^{n x n} have k distinct eigenvalues λ_1, ..., λ_k of algebraic multiplicities m_1, ..., m_k. Then there are unique integers m_ij (i = 1, ..., k; j = 1, ..., k_i) satisfying

    m_i1 + ... + m_ik_i = m_i

and a nonsingular matrix X such that

    X^{-1}AX = diag(J_1, ..., J_k),                              (1.14)

where

    J_i = diag(J_{m_i1}(λ_i), ..., J_{m_ik_i}(λ_i)).             (1.15)

Here are some comments about this theorem.

• The Jordan canonical form has a limited uniqueness. The multiplicities m_i and the numbers m_ij along with their associated eigenvalues are characteristic of the matrix A. Any reduction of A into Jordan blocks will result in the same number of blocks of the same order having the same eigenvalues. The spaces spanned by the X_i are unique. However, the columns of the X_i are not: principal vectors may be chosen in different ways. But in spite of this nonuniqueness, we still speak of the Jordan canonical form of a matrix.

• The proof of the theorem is quite involved. It begins with a reduction to block diagonal form, each block corresponding to a distinct eigenvalue of A. The hard part is reducing the individual blocks to the form (1.15). For more see the notes and references.

• We can partition X at several levels corresponding to levels in the decomposition. First partition X = (X_1 ... X_k) in conformity with (1.14). It then follows from the relation AX = XJ that

    AX_i = X_i J_i;

i.e., the action of A on the matrix X_i is entirely specified by the matrix J_i. The spaces spanned by the columns of the X_i are examples of eigenspaces (to be treated in Chapter 4). If we now partition X_i in the form X_i = (X_i1 ... X_ik_i) in conformity with (1.15), it can be shown that

    AX_ij = X_ij J_{m_ij}(λ_i).                                  (1.16)

Thus the actions of A on the matrices X_ij are specified by the individual Jordan blocks in the decomposition. However, if we write X_ij = (x_1 x_2 ... x_{m_ij}), then the vectors satisfy

    (A - λ_i I)x_1 = 0  and  (A - λ_i I)x_j = x_{j-1}  (j = 2, ..., m_ij).   (1.17)

A set of vectors satisfying (1.17) is called a sequence of principal vectors of A. Note that the first and only the first principal vector associated with a Jordan block is an eigenvector. The simple relations (1.17) make calculations involving principal vectors quite easy.

• The Jordan canonical form is now less frequently used than previously. Against this must be set the fact that the form is virtually uncomputable. Perturbations in the matrix send the eigenvalues flying: in principle, the matrix becomes diagonalizable (Example 1.7). If we know the nature of the defectiveness in the original matrix, then it may be possible to recover the original form. But otherwise attempts to compute the Jordan canonical form of a matrix have not been very successful. Under the circumstances there is an increasing tendency to turn whenever possible to friendlier decompositions—for example, to the Schur decomposition.

Finally, as a mathematical probe the Jordan canonical form is still useful. The simplicity of its blocks makes it a keen tool for conjecturing new theorems and proving them—and reports of its death are greatly exaggerated.
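The relations (1.16)-(1.17) are easy to check numerically. The following sketch, in Python with NumPy (an illustration of mine, not anything in the text), builds a matrix with a known Jordan block and verifies that the columns of X behave as principal vectors.

```python
import numpy as np

lam, m = 2.0, 4
J = lam * np.eye(m) + np.diag(np.ones(m - 1), 1)   # Jordan block J_m(lam)
rng = np.random.default_rng(1)
X = rng.standard_normal((m, m))                    # columns = principal vectors
A = X @ J @ np.linalg.inv(X)                       # so that A X = X J

R = A - lam * np.eye(m)
print(np.allclose(R @ X[:, 0], 0))                 # x_1 is an eigenvector
for j in range(1, m):
    print(np.allclose(R @ X[:, j], X[:, j - 1]))   # (A - lam I) x_j = x_{j-1}
```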

1.7. NOTES AND REFERENCES

General references

The algebra and analysis of eigensystems is treated in varying degrees by most books on linear and applied linear algebra. More or less systematic treatments of eigensystems are found in books on matrix computations. There are too many of these books to survey here. The classics of the field are Householder's Theory of Matrices in Numerical Analysis [121] and Wilkinson's Algebraic Eigenvalue Problem [300]. Golub and Van Loan's Matrix Computations [95] is the currently definitive survey of the field. Datta's Numerical Linear Algebra and Applications [52] is an excellent text for the classroom, as is Watkins's Fundamentals of Matrix Computations [288]. Trefethen and Bau's Numerical Linear Algebra [278] is a high-level, insightful treatment with a stress on geometry. Chatelin's Eigenvalues of Matrices [39] and Saad's Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms [233] are advanced treatises that cover both theory and computation. Special mention should be made of Bellman's classic Introduction to Matrix Analysis [18] and Horn and Johnson's two books on introductory and advanced matrix analysis [118, 119]. They contain a wealth of useful information.

Special mention should also be made of Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, edited by Bai, Demmel, Dongarra, Ruhe, and van der Vorst [10]. It consists of sketches of all the important algorithms for large eigenproblems with brief treatments of the theoretical and computational background and an invaluable bibliography. It is a perfect complement to this volume and should be in the hands of anyone seriously interested in large eigenproblems.

History and nomenclature

A nice treatment of the history of determinants and matrices, including eigenvalues, is given by Kline [147, Ch. 33]. He makes the interesting point that many of the results we think of as belonging to matrix theory were derived as theorems on determinants and bilinear and quadratic forms. (Matrices were introduced by Cayley in 1858 [32], but they only gained wide acceptance in the twentieth century.)

The nontrivial solutions of the equations Ax = λx were introduced by Lagrange [155, 1762] to solve systems of differential equations with constant coefficients (see [213, p. 96]). In 1829 Cauchy [31] took up the problem and introduced the term "characteristic equation"—however, not for the polynomial equation but for the system Ax = λx.

Eigenvalues have been called many things in their day. Our term "eigenvalue" derives from the German Eigenwert, which was introduced by Hilbert [116, 1904] to denote for integral equations the reciprocal of our matrix eigenvalue. At some point Hilbert's Eigenwerte inverted themselves and became attached to matrices. (The nomenclature for integral equations remains ambiguous to this day.) The term "characteristic value" is a reasonable translation of Eigenwert. However, "characteristic" has an inconveniently large number of syllables and survives only in the terms "characteristic equation" and "characteristic polynomial."

(For symmetric matrices the characteristic equation and its equivalents are also called the secular equation owing to its connection with the secular perturbations in the orbits of planets.) Other terms are "latent value" and "proper value," the latter from the French valeur propre.

The German eigen has become a thoroughly naturalized English prefix meaning having to do with eigenvalues and eigenvectors. Thus we can use "eigensystem," "eigenspace," or "eigenexpansion" without fear of being misunderstood. The day when purists and pedants could legitimately object to "eigenvalue" as a hybrid of German and English has long since passed. The term "eigenpair" used to denote an eigenvalue and eigenvector is a recent innovation; the earliest use I can find is in Parlett's Symmetric Eigenvalue Problem [206].

Schur form

The proof of the existence of Schur decompositions is essentially the one given by Schur in 1909 [234]—an early example of the use of partitioned matrices. The decomposition can be computed stably by the QR algorithm (see §§2-3, Chapter 2), and it can often replace more sophisticated decompositions—the Jordan canonical form, for example—in matrix algorithms.

Sylvester's equation

Sylvester's equation is so called because J. J. Sylvester considered the homogeneous case AX - XB = 0 [273, 1884]. For A and B of order two he showed that the equation has a nontrivial solution if and only if A and B have a common eigenvalue (essentially Theorem 1.16) and then stated that it easily generalizes.

The linearity of the Sylvester operator means that it has a matrix representation once a convention is chosen for representing X as a vector. The customary representation is vec(X), in which the columns of X are stacked atop one another in their natural order. In this representation the Sylvester equation AX - XB = C becomes

    (I ⊗ A - B^T ⊗ I) vec(X) = vec(C),

where ⊗ is the Kronecker product defined by

    A ⊗ B = (a_ij B).

Algorithm 1.1 is a stripped down version of one due to Bartels and Stewart [14]. The difference is that a real Schur form is used to keep the arithmetic real. Golub, Nash, and Van Loan [90] have proposed a Hessenberg-Schur variant that saves some operations. Higham [114, Ch. 15] devotes a chapter to Sylvester's equation, which contains error and perturbation analyses, historical comments, and an extensive bibliography.
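As an illustration of the vec representation—not an algorithm anyone should use beyond tiny problems, since the Kronecker matrix has order nm—here is a sketch in Python with NumPy. The shift added to B is my own assumption, made to keep the spectra of A and B disjoint.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m)) + 5 * np.eye(m)       # separate the spectra
C = rng.standard_normal((n, m))

K = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))   # matrix of the operator S
x = np.linalg.solve(K, C.reshape(-1, order='F'))      # vec stacks columns
X = x.reshape(n, m, order='F')
print(np.linalg.norm(A @ X - X @ B - C))              # essentially zero
```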

Block diagonalization

Although rounding error prevents matrices that are exactly defective from arising in practice, one frequently encounters matrices with very ill-conditioned systems of eigenvectors. Example 1.7 shows that eigenvalues that should be clustered do not have to lie very close to one another. Block diagonalization would seem to be an attractive way out of this problem. The idea is to generate clusters of eigenvalues associated with nearly dependent eigenvectors, find bases for their eigenspaces, and reduce the matrix to block diagonal form. If the blocks are small, one can afford to lavish computation on their subsequent analysis. Unfortunately, this sensible-sounding procedure is difficult to implement. Leaving aside the possibility that the smallest cluster may consist of all the eigenvalues, the problem is to find the clusters. Consequently, schemes for automatically block diagonalizing a matrix by a well-conditioned similarity transformation have many ad hoc features. See [96, 139, 140, 218] for the issues.

Jordan canonical form

The Jordan canonical form was established in 1874 by Camille Jordan [136]. The characteristic polynomials of the Jordan blocks of a matrix A are called the elementary divisors of A. Thus to say that A has linear elementary divisors is to say that A is nondefective or that A is diagonalizable. All the above comments about computing block diagonal forms of a matrix apply to the Jordan form—and then some. For more see [17, 57, 60, 61, 99].

2. NORMS, SPECTRAL RADII, AND MATRIX POWERS

Eigenpairs are defined by a nonlinear equation and hence are creatures of analysis. But once the existence of eigenpairs is established, an algebra of eigensystems can be erected on the definition. The first section of this chapter was devoted to that algebra. We now turn to the analytic properties of eigensystems. We begin with a brief review of matrix and vector norms, which measure the size of matrices and vectors. In the next subsection we study the relation of norms to the magnitude of the largest eigenvalue of a matrix—the spectral radius of the matrix. The results are then applied to determine the behavior of powers of a matrix, a topic which will often recur in the course of this volume.

2.1. MATRIX AND VECTOR NORMS

Definition

Analysis happens when a metric meets an algebraic structure. For matrices the metric is generated by a norm, a function that in some sense measures the size of a matrix. It is a generalization of the absolute value of a scalar, as the following definition shows. For more on norms see §1:4.

Definition 2.1. A MATRIX NORM on C^{m x n} is a function ||·||: C^{m x n} -> R satisfying the following three conditions:

    1. A ≠ 0  ==>  ||A|| > 0,
    2. ||αA|| = |α| ||A||,
    3. ||A + B|| ≤ ||A|| + ||B||.

The first two conditions are natural properties of any measure of size. The third condition—called the triangle inequality—specifies the interaction of the norm with the matrix sum. It is frequently used in the equivalent form

    | ||A|| - ||B|| | ≤ ||A - B||.

In this work we will always identify C^{n x 1} with the space of n-vectors C^n, and we call a norm on either space a vector norm.

Examples of norms

The norms most widely used in matrix computations are the 1-norm, the Frobenius norm, and the ∞-norm defined by

    ||A||_1 = max_j Σ_i |a_ij|,
    ||A||_F = ( Σ_{i,j} |a_ij|^2 )^{1/2},
    ||A||_∞ = max_i Σ_j |a_ij|.

Each of these norms is actually a family of matrix norms in the sense that each is defined for matrices of all orders. In particular, these norms are defined for vectors, in which case their definitions take the following forms:

    ||x||_1 = Σ_i |x_i|,  ||x||_F = ( Σ_i |x_i|^2 )^{1/2},  ||x||_∞ = max_i |x_i|.

The Frobenius norm has the useful characterization

    ||A||_F^2 = trace(A^H A),

which in the case of a vector reduces to

    ||x||_F^2 = x^H x.

The Frobenius norm of a vector x is usually written ||x||_2, for reasons that will become evident shortly. The Frobenius vector norm is also called the Euclidean norm because in R^2 and R^3 the quantity ||x||_F is the ordinary Euclidean length of x.

Operator norms

Let μ be a vector norm on C^m and ν be a vector norm on C^n. Then it is easily verified that the function ||·||_{μ,ν} defined on C^{m x n} by

    ||A||_{μ,ν} = max_{ν(x)=1} μ(Ax)

is a matrix norm. It is called the operator norm generated by μ and ν, or the matrix norm subordinate to μ and ν.

We have already seen examples of operator norms. The vector 1-norm generates the matrix 1-norm and the vector ∞-norm generates the matrix ∞-norm. The vector 2-norm generates a new norm called the matrix 2-norm or the spectral norm. It is written ||·||_2. Unlike the 1-norm, the ∞-norm, and the Frobenius norm, it is difficult to calculate. Nonetheless it is of great importance in the analysis of matrix algorithms. Among its properties are the following: ||A||_2^2 is the largest eigenvalue of A^H A; ||A^H||_2 = ||A||_2; ||A||_2 ≤ ||A||_F; and for any unitary U and V of suitable dimensions,

    ||U^H A V||_2 = ||A||_2.

Because of the last property, we say that the 2-norm is unitarily invariant. The Frobenius norm is also unitarily invariant.

Norms and rounding error

As an example of the use of norms, let us examine what happens when a matrix A is rounded. Denote by fl(A) the result of rounding the elements of A, where fl denotes floating point evaluation. Specifically, suppose that the machine in question has rounding unit ε_M—roughly the largest number ε for which fl(1 + ε) = 1. Then elementwise

    fl(a_ij) = a_ij(1 + ε_ij),

where

    |ε_ij| ≤ ε_M.

In terms of matrices, we have fl(A) = A + E, where the elements of E are a_ij ε_ij. It follows that

    |E| ≤ ε_M |A|.

Thus in any norm that satisfies || |A| || = ||A|| (such a norm is called an absolute norm), we have ||E|| ≤ ||A|| ε_M, or

    ||fl(A) - A|| / ||A|| ≤ ε_M.                                 (2.2)

The left-hand side of (2.2) is called the normwise relative error in fl(A). What we have shown is that:

    In any absolute norm, rounding a matrix produces a normwise relative
    error that is bounded by the rounding unit.

Incidentally, the 1-, Frobenius, and infinity norms are absolute norms. Unfortunately, the 2-norm is not, although (2.2) is still a good approximation.

Limitations of norms

We have just shown that if the elements of a matrix have small relative error, then the matrix as a whole has a small normwise relative error. Unfortunately the converse is not true, as the following example shows.

Example 2.2. Let

    x = (1, 10^-3, 10^-5)^T

and perturb each of its components by about 10^-5. Then it is easily verified that the perturbed vector has a normwise relative error of about 10^-5, so that it has low normwise relative error. On the other hand, the relative errors in the components vary. It is 10^-5 for the first component, 10^-2 for the second, and 1 for the third, which has no accuracy at all.

In general, with balanced norms like the ∞-norm, a small normwise relative error guarantees a similar relative error only in the largest elements of the matrix; the smaller elements can be quite inaccurate. Now in many applications the accuracy of small elements is unimportant. But in others it is, and in such applications one must either scale the problem to fit the norm or use a norm that gives greater weight to small elements.

Consistency

The triangle inequality provides a useful way of bounding the norm of the sum of two matrices. The property of consistency does the same thing for the product of two matrices.
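The following sketch reproduces the flavor of Example 2.2 in Python with NumPy. The entries of x are an assumption on my part, chosen to match the componentwise relative errors quoted in the example.

```python
import numpy as np

x = np.array([1.0, 1e-3, 1e-5])    # assumed entries matching the stated errors
e = 1e-5 * np.ones(3)              # a perturbation of about 1e-5 per component
xt = x + e

print(np.linalg.norm(xt - x) / np.linalg.norm(x))   # normwise: about 1.7e-5
print(np.abs(xt - x) / np.abs(x))                   # componentwise: 1e-5, 1e-2, 1
```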

Definition 2.3. Let ||·||_{l,m}, ||·||_{m,n}, and ||·||_{l,n} be matrix norms defined on C^{l x m}, C^{m x n}, and C^{l x n}. These norms are CONSISTENT if for all A ∈ C^{l x m} and B ∈ C^{m x n} we have

    ||AB||_{l,n} ≤ ||A||_{l,m} ||B||_{m,n}.

Four comments on this definition.

• An important special case is a matrix norm ||·|| defined on C^{n x n}. In this case consistency reduces to

    ||AB|| ≤ ||A|| ||B||.

• The 1-, 2-, and ∞-norms as well as the Frobenius norm are consistent in the sense that

    ||AB||_p ≤ ||A||_p ||B||_p,  p = 1, 2, ∞, F,

whenever the product AB is defined.

• The Frobenius and 2-norms are consistent in the sense that

    ||AB||_F ≤ ||A||_2 ||B||_F  and  ||AB||_F ≤ ||A||_F ||B||_2.

• Matrix norms do not have to be consistent. For example, the function defined by

    ||A|| = max_{i,j} |a_ij|

is a norm. But it is not a consistent norm. For example, if

    A = B = ( 1  1 )
            ( 1  1 ),

then ||AB|| = 2 > 1 = ||A|| ||B||.

Given a consistent matrix norm ||·|| on C^{n x n}, it is natural to ask if there is a vector norm ν on C^n that is consistent with ||·|| in the sense that

    ν(Ax) ≤ ||A|| ν(x).

In fact there are many such norms. For let y ∈ C^n be nonzero. Then it is easily verified that the function ν defined by

    ν(x) = ||xy^H||

is a vector norm. But

    ν(Ax) = ||Axy^H|| ≤ ||A|| ||xy^H|| = ||A|| ν(x).

Thus we have shown that:

    Any consistent matrix norm on C^{n x n} has a consistent vector norm.    (2.4)

Another useful construction is to define a norm by multiplying by a matrix. The proof of the following theorem is left as an exercise.

Theorem 2.4. Let ||·|| be a norm on C^{n x n}. If U is nonsingular, then the function ν_sim defined by

    ν_sim(A) = ||U^{-1}AU||

is a norm. If ||·|| is consistent, then so is ν_sim.

Norms and convergence

Given a matrix norm ||·|| on C^{m x n}, we may define the distance between two matrices A and B by ||A - B||. This distance function can in turn be used to define a notion of convergence for matrices. Specifically, if {A_k} is a sequence of matrices, we write

    A_k -> B  if  ||A_k - B|| -> 0.

At first glance, this definition of convergence appears to depend on the choice of the norm ||·||. But in fact, if a sequence converges in one norm, then it converges in any norm. This is an immediate consequence of the following theorem on the equivalence of norms.

Theorem 2.5. Let μ and ν be norms on C^{m x n}. Then there are positive constants σ and τ such that for all A ∈ C^{m x n}

    σ ν(A) ≤ μ(A) ≤ τ ν(A).

For a proof see §1:4.

Thus it does not matter which norm we choose to define convergence. In particular, it is easy to see that ||A_k - B||_p -> 0 if and only if |a_ij^(k) - b_ij| -> 0 (i = 1, ..., m; j = 1, ..., n). Thus convergence in any norm is equivalent to the componentwise convergence of the elements of a matrix. Although all norms are mathematically equivalent, as a practical matter they may say very different things about a converging sequence, at least in the initial stages. For an example, see the discussion following Example 2.11.

2.2. THE SPECTRAL RADIUS

Definition

In this subsection we will explore the relation of the largest eigenvalue of a matrix to its norms. We begin with a simple but far-reaching definition.

Definition 2.6. Let A be of order n. Then the SPECTRAL RADIUS of A is the number

    ρ(A) = max{ |λ| : λ ∈ Λ(A) }.

We call any eigenpair (λ, x) for which |λ| = ρ(A) a DOMINANT EIGENPAIR. The eigenvalue λ is called a DOMINANT EIGENVALUE and its eigenvector is called a DOMINANT EIGENVECTOR.

Thus the spectral radius is the magnitude of the largest eigenvalue of A. It gets its name from the fact that it is the radius of the smallest circle centered at the origin that contains the eigenvalues of A.

Norms and the spectral radius

The spectral radius has a normlike flavor. It also tracks the powers of a matrix with greater fidelity than a norm. Specifically,

    ρ(A^k) = ρ(A)^k,

whereas the best we can say for a consistent norm is that ||A^k|| ≤ ||A||^k. We begin our investigation of norms and spectral radii with a trivial but important theorem.

Theorem 2.7. Let A be of order n, and let ||·|| be a consistent norm on C^{n x n}. Then

    ρ(A) ≤ ||A||.

Proof. Let ν be a vector norm consistent with the matrix norm ||·|| [see (2.4)]. If (λ, x) is an eigenpair of A with ν(x) = 1, then

    |λ| = ν(λx) = ν(Ax) ≤ ||A|| ν(x) = ||A||.

Since λ is an arbitrary eigenvalue of A, ρ(A) ≤ ||A||. ∎

In general, a consistent norm will exceed the spectral radius of A. If the spectral radius is large, the matrix must be large in any consistent norm; but a matrix can be large while its spectral radius is small. For example, the Jordan block J_2(0) has spectral radius zero, but ||J_2(0)||_∞ = 1. In fact, something worse is true. Since J_2(0) is nonzero, any norm of J_2(0) must be positive. In other words, there is no norm that reproduces the spectral radius of J_2(0). However, there are norms that approximate it, as the following theorem shows.

Theorem 2.8. Let A be of order n. Then for any ε > 0 there is a consistent norm ||·||_{A,ε} (depending on A and ε) such that

    ||A||_{A,ε} ≤ ρ(A) + ε.                                      (2.6)

If the dominant eigenvalues of A are nondefective, then there is a consistent norm ||·||_A (depending on A) such that

    ||A||_A = ρ(A).

Proof. Let

    A = X(Λ + N)X^H

be a Schur form of A in which Λ is diagonal and N is strictly upper triangular. For η > 0, let

    D_η = diag(1, η, η^2, ..., η^{n-1}).

Then for i < j the (i, j)-element of D_η^{-1} N D_η is ν_ij η^{j-i}. Since for i < j we have η^{j-i} -> 0 as η -> 0, there is an η such that ||D_η^{-1} N D_η||_∞ ≤ ε. It follows that

    ||D_η^{-1}(Λ + N)D_η||_∞ ≤ ||Λ||_∞ + ||D_η^{-1} N D_η||_∞ ≤ ρ(A) + ε.    (2.7)

If we now define ||·||_{A,ε} by

    ||B||_{A,ε} = ||(XD_η)^{-1} B (XD_η)||_∞,

then by Theorem 2.4 the norm ||·||_{A,ε} is consistent, and by (2.7) it satisfies (2.6).

If the dominant eigenvalues of A are nondefective, then by Theorem 1.19 we can reduce A to the form diag(B_1, B_2), where B_1 contains the dominant eigenvalues of A and B_2 contains the others. Since B_1 is not defective, we can diagonalize it to get a diagonal matrix Λ_1 containing the dominant eigenvalues of A. We can then reduce B_2 to Schur form to get the matrix Λ_2 + N. Thus we can reduce A to the form

    X^{-1}AX = diag(Λ_1, Λ_2 + N),

where Λ_1 is a diagonal matrix containing the dominant eigenvalues of A, Λ_2 is a diagonal matrix containing the remaining eigenvalues of A, and N is strictly upper triangular. Since the diagonals of Λ_2 are strictly less than ρ(A) in magnitude, we can choose a diagonal matrix D = diag(I, D_η) as above so that

    ||diag(Λ_1, Λ_2 + D_η^{-1} N D_η)||_∞ = ρ(A).

The norm ||·||_A defined by
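The construction in the first part of the proof is concrete enough to code. The following Python/NumPy sketch (the function name and the halving strategy for η are my own) builds the norm ||·||_{A,ε} for a given A and ε and confirms that it nearly reproduces the spectral radius.

```python
import numpy as np
from scipy.linalg import schur

def near_spectral_norm(A, eps):
    """Return B -> ||(X D)^{-1} B (X D)||_inf with value <= rho(A)+eps at A."""
    T, X = schur(A, output='complex')       # Schur form A = X T X^H
    n = A.shape[0]
    N = np.triu(T, 1)                       # strictly upper triangular part
    eta = 1.0
    while True:                             # shrink eta until D^{-1} N D is small
        D = np.diag(eta ** np.arange(n))
        if np.linalg.norm(np.linalg.solve(D, N @ D), np.inf) <= eps:
            break
        eta /= 2.0
    W = X @ D
    Winv = np.linalg.inv(W)
    return lambda B: np.linalg.norm(Winv @ B @ W, np.inf)

A = np.array([[0.99, 1.0], [0.0, 0.99]])    # J_2(0.99), rho(A) = 0.99
nrm = near_spectral_norm(A, 1e-3)
print(nrm(A))                               # at most 0.99 + 1e-3
print(np.linalg.norm(A, np.inf))            # by contrast, 1.99
```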

    ||B||_A = ||(XD)^{-1} B (XD)||_∞

now does the job. ∎

Theorem 2.8 shows that the spectral radius of a matrix can be approximated to arbitrary accuracy by some consistent norm. We will see that this norm may be highly skewed with respect to the usual coordinate system—see the discussion following Example 2.11. Nonetheless, it can be used to establish valuable theoretical results, as we shall show in the next subsection.

2.3. MATRIX POWERS

Many matrix processes and algorithms are driven by powers of matrices. For example, the errors in certain iterative algorithms for solving linear systems decrease as the powers of a matrix called the iteration matrix. It is therefore important to have criteria for when we observe a decrease. Equally important, we need to know the rate of decrease. For this reason, we now turn to the asymptotic behavior of the powers of a matrix.

Asymptotic bounds

The basic result is contained in the following theorem.

Theorem 2.9. Let A be of order n and let ε > 0 be given. Then for any norm ||·|| there are positive constants σ (depending on the norm) and τ_{A,ε} (depending on A, ε, and the norm) such that

    σ ρ(A)^k ≤ ||A^k|| ≤ τ_{A,ε} (ρ(A) + ε)^k,  k = 1, 2, ....   (2.8)

If the dominant eigenvalues of A are nondefective, we may take ε = 0.

Proof. We will first establish the lower bound. Let (λ, x) be a dominant eigenpair of A with ||x||_2 = 1. Then

    A^k x = λ^k x.

It follows that ||A^k||_2 ≥ ρ(A)^k. By the equivalence of norms, there is a constant σ > 0 such that ||B|| ≥ σ ||B||_2 for all B. Hence

    ||A^k|| ≥ σ ρ(A)^k.

For the upper bound, let ||·||_{A,ε} be a consistent norm such that ||A||_{A,ε} ≤ ρ(A) + ε. Then

    ||A^k||_{A,ε} ≤ ||A||_{A,ε}^k ≤ (ρ(A) + ε)^k.

Again by the equivalence of norms, there is a constant τ_{A,ε} > 0 such that ||B|| ≤ τ_{A,ε} ||B||_{A,ε} for all B. Then

    ||A^k|| ≤ τ_{A,ε} ||A^k||_{A,ε} ≤ τ_{A,ε} (ρ(A) + ε)^k.

If the dominant eigenvalues of A are nondefective, we can repeat the proof of the upper bound with a consistent norm ||·||_A (whose existence is guaranteed by Theorem 2.8) for which ||A||_A = ρ(A). ∎

The theorem shows that the powers of A grow or shrink essentially as ρ(A)^k. We have to add "essentially" here because if a dominant eigenvalue is defective, the bound must be in terms of ρ(A) + ε, where ε can be arbitrarily small. A matrix whose powers converge to zero is said to be convergent. It follows immediately from Theorem 2.9 that:

    A matrix is convergent if and only if its spectral radius is less than one.

It is instructive to look at a simple case of a defective dominant eigenvalue.

Example 2.10. For k = 1, 2, ..., the powers of the Jordan block J_2(0.5) are

    J_2(0.5)^k = ( 2^-k   k 2^-(k-1) )
                 ( 0      2^-k       ).

Hence

    ||J_2(0.5)^k||_∞ = 2^-k (2k + 1)

and

    ||J_2(0.5)^{k+1}||_∞ / ||J_2(0.5)^k||_∞ = (2k + 3) / (2(2k + 1)).    (2.9)

The ratios in (2.9) are called LOCAL CONVERGENCE RATIOS because they measure the amount by which the successor of a member of the sequence is reduced (or increased). For k = 1, the convergence ratio is 5/6. For k = 10, the ratio is about 0.55. The local convergence ratios are never quite as small as the asymptotic ratio one-half, but they quickly approach it.

Limitations of the bounds

One should not be lulled by the behavior of J_2(0.5), whose norms decrease monotonically at a rate that quickly approaches 2^-k, so that the norms decrease from the outset. Here is a more extreme example.

Example 2.11. The ∞-norms of the powers of J_2(0.99) are

    ||J_2(0.99)^k||_∞ = 0.99^{k-1} (k + 0.99).

The upper graph in Figure 2.1 (called the hump) plots these norms against the power k. Instead of decreasing, the norms increase to a maximum of about 37 when k = 99. Thereafter they decrease, but they fall below their initial value of 1.99 only when k = 563. The lower plot shows the natural logarithms of the local convergence ratios, which eventually approximate the asymptotic value of log 0.99 ≈ -0.01.

From these graphs we see that the powers of J_2(0.99) not only take a long time to assume their asymptotic behavior, but they exhibit a hellish transient in the process.
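The hump is easy to reproduce. The following Python/NumPy sketch powers J_2(0.99) and recovers the quantities quoted in Example 2.11.

```python
import numpy as np

J = np.array([[0.99, 1.0], [0.0, 0.99]])         # J_2(0.99)
norms, P = [], np.eye(2)
for k in range(1, 701):
    P = P @ J
    norms.append(np.linalg.norm(P, np.inf))      # equals 0.99**(k-1) * (k + 0.99)
norms = np.array(norms)

print(norms.argmax() + 1, norms.max())           # hump: k = 99, norm about 37
print(1 + np.nonzero(norms < norms[0])[0][0])    # first k below 1.99: k = 563
ratios = norms[1:] / norms[:-1]
print(np.log(ratios[-1]))                        # tends to log 0.99 ~ -0.01
```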

Figure 2.1: The hump and convergence ratios

Which example—2.10 or 2.11—is typical? The answer is: It depends. Many people will go through life powering matrices and never see anything as dramatic as J_2(0.99). However, such examples do occur, and sometimes their effects are rather subtle. Be on guard.

The bound (2.8) gives no hint of the ill behavior of J_2(0.99), and it is instructive to see why. If we choose ε = 0.001, so that ρ[J_2(0.99)] + ε is near ρ[J_2(0.99)], then the norm ||·||_{A,ε} given by Theorem 2.8 has the form

    ||B||_{A,ε} = ||D^{-1} B D||_∞,  where D = diag(1, 0.001).

In this norm the (1,2)-element of J_2(0.99)^k, which accounts for the hump, is damped by the factor 0.001, so that the norm remains small. Of course the norm of a general matrix would be unnaturally large, because its (2,1)-element is multiplied by 1000. But the (2,1)-element of J_2(0.99)^k is zero and hence is not magnified by this factor.

There is a lesson to be learned from this. In matrix analysis one frequently introduces a norm designed to make a construction or a proof easy. Mathematically, this

is sound practice. But unless the norm bears a reasonable relation to the coordinate system one is working in (the usual unit vectors for Example 2.11), the results may be disappointing or misleading.

2.4. NOTES AND REFERENCES

Norms

For treatments of matrix and vector norms see the general references in §1.7 or in §1:4.1 of this series. A matrix norm and a consistent vector norm are sometimes called compatible norms. The reader should be aware that some authors include consistency in the definition of a matrix norm.

Norms and the spectral radius

In his Solution of Equations and Systems of Equations Ostrowski [194, 1960] attributes Theorem 2.7 to Frobenius, stating that it "is unpublished," though he gives no reference. In the same book [§17.1] he gives essentially Theorem 2.8. Ostrowski's proof is an elegant and amusing application of the Jordan form. Let Λ be the diagonal matrix of eigenvalues of A and let X^{-1}(ε^{-1}A)X = ε^{-1}Λ + N be the Jordan form of ε^{-1}A. Since the only nonzero elements of N are ones on its superdiagonal, we have

    ||X^{-1}AX||_∞ = ||Λ + εN||_∞ ≤ ρ(A) + ε.

Matrix powers

For a detailed treatment of matrix powers—exact and computed—see [114, Ch. 17]. The existence of the hump illustrated in Figure 2.1 has led to various attempts to bound max_k ||A^k||, which are described in [114, §17.1]. There is no one, simple explanation.

An important approach to this problem is the pseudospectrum

    Λ_ε(A) = { λ : λ ∈ Λ(A + E) for some ||E|| ≤ ε }.

In many instances, the location of the pseudospectrum in the complex plane explains the behavior of the matrix powers. However, the approach has the disadvantage that one must choose a suitable value for the free parameter ε—not an easy problem.

When computed in finite precision, the powers of a matrix can exhibit erratic behavior, and a nominally convergent matrix can even diverge. The computed jth column of A^k satisfies

    c_j = (A + E_k)(A + E_{k-1}) ... (A + E_1)e_j,

where ||E_i|| / ||A|| is of the order of the rounding unit. It would take us too far afield to give details, but the gist of what is happening is this: if A has a defective eigenvalue near one, the perturbations E_i may create eigenvalues of magnitude greater than one. In the terms of the last paragraph, the defective eigenvalue near one will, for sufficiently large ε, cause the pseudospectrum to extend outside the unit circle.

Although the pseudospectrum has been introduced independently several times, the current high state of the art is largely due to L. N. Trefethen [276, 277, 279]. For more see the Pseudospectra Gateway:

    http://web.comlab.ox.ac.uk/projects/pseudospectra/

3. PERTURBATION THEORY

In this section we will be concerned with the sensitivity of eigenvalues and eigenvectors of a matrix to perturbations in the elements of the matrix. There are two reasons for studying this problem. First, many of the algorithms we will consider return approximate eigenpairs (λ', x') that satisfy

    (A + E)x' = λ'x',

where ||E|| / ||A|| is of the order of the rounding unit for the computer in question. A knowledge of the sensitivity of the eigenpair will enable us to predict the accuracy of the computed eigenvalue. Second, the results are often of use in the design and analysis of algorithms.

The perturbation theory of eigenvalues and eigenvectors is a very large subject, and we will only be able to skim the most useful results. In §3.1 we will consider the perturbation of individual eigenvalues, and in §3.2 we will consider the perturbation of eigenpairs. Most of the results in §3.2 generalize to eigenspaces. We will treat these generalizations in Chapter 4.

3.1. THE PERTURBATION OF EIGENVALUES

This subsection is concerned with the problem of assessing the effects of a perturbation of a matrix on its eigenvalues. We will consider three topics. The first is a bound that establishes the continuity of the eigenvalues of a matrix. The second topic is Gerschgorin's theorem, a useful tool for probing the sensitivity of individual eigenvalues. We conclude with results on the perturbation of the eigenvalues of Hermitian matrices.

The continuity of eigenvalues

A fundamental difficulty with the perturbation of eigenvalues is that they need not be differentiable functions of the elements of their matrix. For example, the eigenvalues of the matrix

    ( 1  1 )
    ( ε  1 )

are 1 ± √ε, and this expression is not differentiable at ε = 0. Nonetheless, the eigenvalues are continuous at ε = 0. In fact, the eigenvalues of any matrix are continuous functions of the elements of the matrix, as the following theorem shows.

Theorem 3.1 (Elsner). Let the eigenvalues of A be λ_1, ..., λ_n, and let the eigenvalues of A' = A + E be λ'_1, ..., λ'_n. Then there is a permutation j_1, ..., j_n of the integers 1, ..., n such that

    |λ'_{j_i} - λ_i| ≤ 4 (||A||_2 + ||A'||_2)^{1-1/n} ||E||_2^{1/n}.    (3.1)

There are three points to be made about this theorem.

• The exponent 1/n in the factor ||E||_2^{1/n} is necessary. For example, the eigenvalues of the n x n matrix

    ( 0  1          )
    (    0  .       )
    (       .  1    )
    ( ε        0    )

are ε^{1/n} ω^k (k = 0, ..., n-1), where ω is the primitive nth root of unity. Thus a perturbation of order ε has the potential of introducing a perturbation of order ε^{1/n} in the eigenvalues of a matrix. However, it is important to stress that this example is extreme—the eigenvalue in question is totally defective. Garden variety eigenvalues are more likely to suffer a perturbation that is linear in the error, though perhaps magnified by a large order constant. For this reason, the bound (3.1) is used more in theory than in practice.

• The theorem establishes the continuity of eigenvalues in a very strong sense: it pairs up the eigenvalues of A and A' in such a way that each pair satisfies the bound (3.1). In particular, to an eigenvalue λ of multiplicity m of A there correspond m eigenvalues of A', each satisfying the bound (3.1). However, this does not preclude the possibility of two eigenvalues of A' attaching themselves to a simple eigenvalue of A, and vice versa. It is fairly easy to show that for every eigenvalue λ' of A' there must be an eigenvalue λ of A satisfying the bound (3.1), and vice versa.

• The theorem has a useful implication for the behavior of the simple real eigenvalues of a real matrix under real perturbations. Let λ be such an eigenvalue and let E be small enough so that the right-hand side of (3.1) is less than half the distance between λ and its neighbors. Then there is exactly one eigenvalue λ' of A' satisfying the bound (for otherwise there would be too few eigenvalues of A' to pair up with the other eigenvalues of A). Moreover, λ' cannot be complex, since if it were its complex conjugate would also be an eigenvalue—a second eigenvalue that satisfies the bound. Hence:

    The simple real eigenvalues of a real matrix remain real under
    sufficiently small real perturbations.
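The ε^{1/n} effect in the first comment can be seen numerically. The following Python/NumPy sketch perturbs the corner of a nilpotent Jordan block; the choices n = 8 and ε = 10^-8 are mine, picked so that ε^{1/n} = 0.1.

```python
import numpy as np

n, eps = 8, 1e-8
A = np.diag(np.ones(n - 1), 1)          # J_n(0): all eigenvalues zero
E = np.zeros((n, n))
E[-1, 0] = eps                          # perturbation of 2-norm eps
w = np.linalg.eigvals(A + E)

print(np.abs(w))                        # all have magnitude eps**(1/n) = 0.1
print(eps ** (1.0 / n))
```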

. then that union contains exactly k eigenvalues of A. Now suppose that. Theorem 3. x) be an eigenpair of A. As t proceeds from zero to one.-.2 (Gerschgorin). A cure for this problem is to derive bounds for individual eigenvalues. (Jt=i Q^ is disjoint from the other disks. Initially the union VolII: Eigensystems . Let A be of order n and let the GERSCHGORIN DISKS of A be defined by Then Moreover.1 is that it bounds the perturbation of all the eigenvalues of any matrix. Let Qi(t) be the Gerschgorin disks for A(t) = D + tB.-. we find that which says that A 6 (/. Let & be a component of x with maximum modulus. Let A = D + B. The statement about the disjoint union can be established by a continuity argument.. the radii of the disks Qi(i) increase monotonically to the radii of the Gerschorin disks of A.. From the ith row of the equation Ax = Xx we have or Taking absolute values and remembering that |£j| < 1. Let (A.SEC. which makes it too large for everyday work. and they lie in the the (degenerate) disks ft(0). Proof. where D — diag(an. ann). 3. and assume without loss of generality that £t = 1. Gerschgorin's theorem provides a method for doing this. Then for 0 < t < 1. say. The eigenvalues of A(Q) are the diagonal elements a. if the union ofk of the setsQi are disjoint from the others. . U(i) = \Ji=i Gji(t) is also disjoint from the other disks Qi(i). It must therefore be large enough to accommodate the worst possible case.-. PERTURBATION THEORY Gerschgorin theory 39 The problem with the bound in Theorem 3.

Example 3.3. Consider a matrix A of order four whose diagonal elements are 1, 2, 2, 2 and whose off-diagonal elements are of order 10^-3, so that the radii of its Gerschgorin disks are also of order 10^-3. The first disk is disjoint from the others, from which we conclude that A has an eigenvalue λ_1 that satisfies

    |λ_1 - 1| ≤ 2.0·10^-3.

The remaining three disks are centered at 2 and are not disjoint. Hence, A has three eigenvalues λ_2, λ_3, and λ_4 that satisfy

    |λ_i - 2| ≤ 4.4·10^-3.                                       (3.3)

At first glance this would appear to be a perfectly satisfactory result. If A is regarded as a perturbation of diag(1, 2, 2, 2), then the above bounds show that the eigenvalues of A lie near its diagonal elements. Moreover, the actual deviations of the eigenvalues near two are about 1.4·10^-3—approximately the size of the error. Thus the Gerschgorin disks locate the eigenvalues near two fairly well. But they badly overestimate the deviation of the eigenvalue near one, which is actually about 2.3·10^-6. We can remedy this defect by the method of diagonal similarities, which is illustrated in the following example.

Example 3.4. Let A be the matrix of Example 3.3, and let P = diag(10^-3, 1, 1, 1).
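Since the entries of the matrix in Example 3.3 are not reproduced here, the following Python/NumPy sketch uses a randomly generated stand-in with the same structure—diagonal (1, 2, 2, 2) plus off-diagonal elements of order 10^-3—and a milder scaling factor, to show how a diagonal similarity shrinks the first Gerschgorin disk.

```python
import numpy as np

def gerschgorin(A):
    """Return the centers and radii of the Gerschgorin disks of A."""
    c = np.diag(A).copy()
    r = np.abs(A).sum(axis=1) - np.abs(c)
    return c, r

rng = np.random.default_rng(3)
A = 1e-3 * rng.standard_normal((4, 4))
np.fill_diagonal(A, [1.0, 2.0, 2.0, 2.0])

c, r = gerschgorin(A)
print(r)                                    # all radii of order 1e-3

D = np.diag([1e-2, 1.0, 1.0, 1.0])          # diagonal similarity D A D^{-1}
c2, r2 = gerschgorin(D @ A @ np.linalg.inv(D))
print(r2)                                   # first radius ~1e-5, others ~1e-1

lam = np.linalg.eigvals(A)
print(np.min(np.abs(lam - 1.0)))            # actual deviation near 1: ~1e-6
```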

Then PAP^{-1} has the same eigenvalues as A, and the radii of its Gerschgorin disks are 2.4·10^-6 for the first disk and of order 10^-2 to 10^-1 for the others. The similarity transformation has reduced the radius of the first disk while increasing the radii of the others. Nonetheless, the first disk remains disjoint from the others. Hence we know that A, whose eigenvalues are the same as those of PAP^{-1}, has an eigenvalue within 2.4·10^-6 of one. On the other hand, from (3.3) we know that the remaining eigenvalues are within 4.4·10^-3 of two.

There are two comments to be made about this example.

• Although we have proceeded informally, choosing the matrix P by inspection, it is possible to formalize the method of diagonal similarities. For more, see the notes and references.

• The matrix in Example 3.3 is an off-diagonal perturbation of a diagonal matrix. What we have shown—at least for this matrix—is that if a diagonal element is well separated from the other diagonal elements, then an off-diagonal perturbation moves the corresponding eigenvalue by a small amount that is generally proportional to the square of the perturbation.

Hermitian matrices

In this subsection we will state without proof some useful results on the eigenvalues of Hermitian matrices. The first result is a far-reaching characterization of the eigenvalues of a Hermitian matrix.

Theorem 3.5 (Fischer). Let the Hermitian matrix A have eigenvalues

    λ_1 ≥ λ_2 ≥ ... ≥ λ_n.

Then

    λ_i = min_{dim(X)=n-i+1} max_{0≠x∈X} (x^H A x)/(x^H x)       (3.4)

and

    λ_i = max_{dim(X)=i} min_{0≠x∈X} (x^H A x)/(x^H x).          (3.5)

The formulas (3.4) and (3.5) are called respectively the min-max and the max-min characterizations. They have far-reaching consequences. A trivial but useful consequence is that

    λ_1 = max_{x≠0} (x^H A x)/(x^H x)  and  λ_n = min_{x≠0} (x^H A x)/(x^H x).

Another important consequence is the following interlacing theorem.

Theorem 3.6. Let A be Hermitian of order n with eigenvalues

    λ_1 ≥ ... ≥ λ_n.

Let U ∈ C^{n x m} be orthonormal and let the eigenvalues of U^H A U be

    μ_1 ≥ ... ≥ μ_m.

Then

    λ_i ≥ μ_i ≥ λ_{i+n-m},  i = 1, ..., m.

The theorem is called the interlacing theorem because when m = n-1, the eigenvalues λ_i and μ_i satisfy

    λ_1 ≥ μ_1 ≥ λ_2 ≥ μ_2 ≥ ... ≥ μ_{n-1} ≥ λ_n;

i.e., they interlace. If U consists of m columns of the identity matrix, then U^H A U is a principal submatrix of A of order m. Hence we have the following corollary.

Corollary 3.7. If B is a principal submatrix of A of order m, then the eigenvalues μ_i of B satisfy

    λ_i ≥ μ_i ≥ λ_{i+n-m}.

The final consequence of Fischer's theorem is the following theorem on the perturbation of eigenvalues.

Theorem 3.8. Let A and A' = A + E be Hermitian. Let the eigenvalues of A, A', and E be

    λ_1 ≥ ... ≥ λ_n,  λ'_1 ≥ ... ≥ λ'_n,  and  ε_1 ≥ ... ≥ ε_n.

Then

    λ_i + ε_n ≤ λ'_i ≤ λ_i + ε_1,  i = 1, ..., n.                (3.7)

Consequently,

    |λ'_i - λ_i| ≤ ||E||_2.                                      (3.8)

It is noteworthy that like Theorem 3.1 this theorem is about pairs of eigenvalues. However, there is a difference. In Theorem 3.1 the pairing is not explicitly given. In Theorem 3.8 the pairing is explicitly determined by the order of the eigenvalues. The inequality (3.7) gives an interval in which the ith eigenvalue must lie in terms of the smallest and largest eigenvalues of E. In the terminology of perturbation theory, the inequality (3.8) says that the eigenvalues of a Hermitian matrix are perfectly conditioned—that is, a perturbation in the matrix makes a

perturbation that is no larger in the eigenvalues.

The perfect condition of the eigenvalues of a Hermitian matrix does not, unfortunately, imply that all the eigenvalues are determined to high relative accuracy, as the following example shows.

Example 3.9. Consider a Hermitian matrix A whose eigenvalues are

    1, 10^-1, 10^-2, and 10^-3.

If we perturb A by a matrix E of 2-norm 10^-4, we get a matrix whose eigenvalues deviate from those of A by amounts that satisfy the bound (3.8)—in fact, the absolute errors are in general smaller than the bound would lead one to expect. But the relative errors become progressively larger as the eigenvalues get smaller, ranging from about 10^-7 for the eigenvalue 1 up to about 10^-2 for the smallest eigenvalues.

In the notation of Theorem 3.8, the relative error in λ'_i satisfies

    |λ'_i - λ_i| / |λ_i| ≤ ||E||_2 / |λ_i|.                      (3.9)

A rule of thumb says that if the relative error in a quantity is about 10^-k, then the quantity is accurate to about k decimal digits. Thus (3.9) implies that the smaller the eigenvalue, the fewer the accurate digits. The example not only illustrates the relative sensitivity of the smaller eigenvalues, but it shows the limitations of bounds based on norms: the absolute errors in the eigenvalues are in general smaller than the bound would lead one to expect.

Theorem 3.8 holds for the 2-norm. Since the Frobenius norm of a matrix is greater than or equal to its 2-norm, it also holds for the Frobenius norm. However, a stronger result is true.

Theorem 3.10 (Hoffman-Wielandt). In the notation of Theorem 3.8,

    ( Σ_i (λ'_i - λ_i)^2 )^{1/2} ≤ ||E||_F.
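The following Python/NumPy sketch mimics Example 3.9 with a randomly generated symmetric matrix having the same eigenvalues; the matrix itself is an assumption of mine, since the original entries are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([1.0, 1e-1, 1e-2, 1e-3]) @ Q.T     # symmetric, known eigenvalues

E = rng.standard_normal((4, 4))
E = (E + E.T) / 2
E *= 1e-4 / np.linalg.norm(E, 2)                   # ||E||_2 = 1e-4

lam  = np.sort(np.linalg.eigvalsh(A))[::-1]
lamt = np.sort(np.linalg.eigvalsh(A + E))[::-1]
print(np.abs(lamt - lam))          # absolute errors all at most 1e-4
print(np.abs(lamt - lam) / lam)    # relative errors grow as the eigenvalues shrink
```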

3.2. SIMPLE EIGENPAIRS

Up to now we have been concerned with eigenvalues. In this section we will examine the sensitivity of eigenvectors to perturbations in the matrix. It turns out that the perturbation theory for an eigenvector goes hand in hand with that of its eigenvalue, and we will actually treat the perturbation of eigenpairs. We will restrict ourselves to a simple eigenpair (λ, x). This insures that (up to a scalar factor) we have a unique eigenvector to study.

Left and right eigenvectors

Let (λ, x) be a simple eigenpair of A. By Theorem 1.18 there is a nonsingular matrix (x X) with inverse (y Y)^H such that

    (y Y)^H A (x X) = diag(λ, M).                                (3.10)

It is easily seen by writing out the relation

    (y Y)^H A = diag(λ, M)(y Y)^H

that y^H A = λ y^H and Y^H A = M Y^H. In particular, y is the left eigenvector of A corresponding to λ. Since in (3.10) above (x X)^{-1} = (y Y)^H, we have y^H x = 1. It follows that:

    If (λ, x) is a simple eigenpair of A, then there is a left eigenvector y
    of A such that y^H x = 1.

A corollary of this assertion is that if x is a simple eigenvector then its left eigenvector y is not orthogonal to x.

The angle between x and y will play an important role in the perturbation theory to follow. From the Cauchy inequality we know that

    |y^H x| = γ ||x||_2 ||y||_2,

where γ ≤ 1 with equality if and only if x and y are linearly dependent. In R^2 and R^3 it is easy to see that the constant γ is the cosine of the angle between the lines subtended by x and y. In general we define the angle between nonzero vectors x and y by

    angle(x, y) = cos^{-1}( |y^H x| / (||x||_2 ||y||_2) ).

Since in (3.10) we have y^H x = 1, in this case

    cos angle(x, y) = 1 / (||x||_2 ||y||_2).                     (3.12)

First-order perturbation expansions

We now turn to the perturbation theory of simple eigenpairs. Specifically, we are given a simple eigenpair (λ, x) of A and a small perturbation A' = A + E. It can be shown (Theorem 3.11 below) that if E is small enough, then there is a simple eigenpair (λ', x') of A' that approaches (λ, x) as E approaches zero. Before establishing that fact, we will first address the problem of approximating the perturbed eigenpair.

The first order of business is to find suitable representations of λ' and x'. We will suppose that A has been block-diagonalized as in (3.10). Since λ' is a scalar, we can write

    λ' = λ + φ,

where φ is to be determined. The problem of representing x' is complicated by the fact that x' is determined only up to a scalar factor. Thus to get a unique representation we must impose a normalization on x'. Here we will require that y^H x' = 1. This is easily seen to be equivalent to representing x' in the form

    x' = x + Xp,

where p is to be determined.

We must now set up an equation from which φ and p may be determined. The natural equation is A'x' = λ'x', or

    (A + E)(x + Xp) = (λ + φ)(x + Xp).

Since Ax = λx, we have

    AXp + Ex + EXp = φx + λXp + φXp.

Unfortunately this equation is nonlinear in the parameters φ and p. The key to a first-order analysis is to note that since φ and p approach zero along with E, products of these terms will become negligible when E is small enough. Ignoring such products, we get the linearized equation

    AXp + Ex = φx + λXp.                                         (3.13)

We must now solve the linearized equation for φ and p. To solve for φ, multiply (3.13) by y^H, and use the facts that y^H x = 1, y^H Xp = 0, and y^H A Xp = λ y^H Xp = 0 to get

    φ = y^H E x.

To solve for p, multiply (3.13) by Y^H, and use the facts that Y^H x = 0, Y^H X = I, and Y^H A X = M

to get

    Mp + Y^H E x = λp.

Since λ is a simple eigenvalue, the eigenvalues of M are not equal to λ. Hence λI - M is nonsingular and

    p = (λI - M)^{-1} Y^H E x.

In summary:

    The first-order expansion of the eigenpair (λ, x) under the
    perturbation A + E of A is

        λ' ≈ λ + y^H E x,
        x' ≈ x + X(λI - M)^{-1} Y^H E x.                         (3.14)

Rigorous bounds

We have derived approximations to the perturbed eigenpair. However, we have not established that a perturbed pair exists to be approximated, nor have we shown how good the approximations actually are. The following theorem, which we give without proof, fills in these gaps. For more, see the notes and references.

Theorem 3.11. Let (λ, x) be a simple eigenpair of A, and let (x X) be a nonsingular matrix such that

    (x X)^{-1} A (x X) = diag(λ, M),                             (3.15)

where

    (y Y)^H = (x X)^{-1}.

Let ||·|| be a consistent norm, and define

    sep(λ, M) = ||(λI - M)^{-1}||^{-1}.                          (3.16)

If

    ||E|| is sufficiently small compared with sep(λ, M),         (3.17)

then there is a scalar φ and a vector p such that

    (λ + φ, x + Xp)

is a simple eigenpair of A + E. Moreover,

    |φ| and ||p|| are O(||E||),                                  (3.18)

and

    φ = y^H E x + O(||E||^2),
    p = (λI - M)^{-1} Y^H E x + O(||E||^2).                      (3.19)

There are many things to be said about this theorem—enough to give a subsubsection to each. But to orient ourselves, we will first describe the qualitative behavior.

Qualitative statements

The bounds in Theorem 3.11 say very specific things about how a simple eigenpair behaves under perturbation; in particular, |φ| and ||p|| are all O(||E||). Turning first to the eigenvalues, since φ = λ' - λ, it follows that

    λ' = λ + y^H E x + O(||E||^2).                               (3.20)

In other words, the simple eigenvalues are not only continuous, but they do not exhibit the O(||E||^{1/n}) behavior that we have observed in defective matrices [see (3.2)]. Moreover, from (3.20) our first-order approximation y^H E x to φ is accurate up to terms of O(||E||^2).

Equation (3.20) implies that λ is a differentiable function of the elements of its matrix. For if we take E = ε e_i e_j^T, so that E represents a perturbation of size ε in the (i, j)-element of A, then

    λ' = λ + ε y̅_i x_j + O(ε^2).                                 (3.22)

From this it follows that ∂λ/∂a_ij is y̅_i x_j.

In matrix terms: if (λ, x) is a simple eigenpair of A and y is the corresponding left eigenvector, normalized so that y^H x = 1, then, since λ + y^H E x = y^H A' x, we can write

    λ' = y^H A' x + O(||E||^2).                                  (3.21)

The quantity y^H A' x in (3.21) is called a Rayleigh quotient. Generalizations of this scalar quantity will play an important role in algorithms for large eigenproblems.

Turning now to the eigenvector x', we have x' - x = Xp, and from (3.18) it follows that ||p|| is also O(||E||). Hence

    ||x' - x|| = O(||E||),

so that the eigenvector is also continuous. Moreover, from (3.19) it follows that we can expect

    x + X(λI - M)^{-1} Y^H E x

to be a very good approximation to x'.

The condition of an eigenvalue

Turning now to more informative bounds, we begin with the eigenvalue λ. Since λ' = λ + y^H E x + (φ - y^H E x), we have from (3.20)

    |λ' - λ| ≤ ||x||_2 ||y||_2 ||E||_2 + O(||E||^2).

By (3.12), we know that ||x||_2 ||y||_2 = sec angle(x, y). Hence

    |λ' - λ| ≤ sec angle(x, y) ||E||_2 + O(||E||^2).             (3.23)

Ignoring the O(||E||^2) term in (3.23), we see that the secant of the angle between x and y is a constant that mediates the effect of the error E on the eigenvalue λ. Such numbers are called condition numbers. If a condition number for a quantity is large, the quantity is said to be ill conditioned. If it is small, the quantity is said to be well conditioned. (For a general discussion of condition numbers see §1:2.4.) Thus:

    The condition number of a simple eigenvalue is the secant of the
    angle between its left and right eigenvectors.                (3.24)

Two caveats apply to condition numbers in general. First, the condition number changes with the norm. In our case, if we had not chosen the 2-norm, we would not have ended up with as elegant a condition number as the secant of the angle between the left and the right eigenvectors. However, for most norms in common use the secant gives a reasonable idea of the sensitivity.

Second, condition numbers are useful only to the extent that the bound that generates them is sharp. In our case, the bound on λ' was derived by taking norms in a sharp approximation to the perturbed eigenvalue. Hence the bound will generally be in the ballpark, and the condition number will be an accurate reflection of the sensitivity of λ. Keep in mind, however, that it is easy to contrive large perturbations that leave λ unchanged.

The condition of an eigenvector

Theorem 3.11 is useful in establishing the existence of a perturbed eigenpair. But it does not directly provide a condition number for the perturbed eigenvector. The reason is that the vector p can be large even when the perturbation in the direction between x and x' is small.
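Statement (3.24) can be checked directly. The following sketch, in Python with SciPy (the test matrix is my own), computes the secant condition number of each eigenvalue from the left and right eigenvectors returned by scipy.linalg.eig.

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[1.0, 1e3, 0.0],
              [0.0, 2.0, 1e3],
              [0.0, 0.0, 3.0]])         # graded off-diagonals: skewed eigenvectors

w, Y, X = eig(A, left=True, right=True) # columns of Y, X: left/right eigenvectors
for i in range(3):
    x, y = X[:, i], Y[:, i]
    sec = np.linalg.norm(x) * np.linalg.norm(y) / np.abs(y.conj() @ x)
    print(w[i].real, sec)               # condition number = sec angle(x, y)
```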

To see this, let S be nonsingular, and in the block diagonal reduction (3.15) replace X by XS and Y by YS^{-H}. Then the block diagonal form becomes diag(λ, S^{-1}MS), and the approximation p of (3.14) becomes S^{-1}p. By choosing S appropriately, we can make p as large as we like compared to the perturbation in the direction of x. The cure for this problem is to bound the angle between x and x'.

We begin by choosing a distinguished basis for the reduction (3.15) to block diagonal form. Specifically, we will, as above, replace X by XS and Y by YS^{-H}, with S chosen so that the new Y is orthonormal. For the rest of this subsection we will assume that this change has been made; that is, in Theorem 3.11 we assume that Y is orthonormal and that the condition (3.17) is satisfied in the 2-norm.

Although we defined the angle between two vectors in terms of the cosine, we will actually calculate the sine of the angle. The following lemma shows how to do this.

Lemma 3.12. Let x and x' be nonzero with ||x||_2 = ||x'||_2 = 1 (which we may assume without loss of generality), and let Y be an orthonormal basis for R(x)^⊥. Then

    sin angle(x, x') = ||Y^H x'||_2.

Proof. The matrix (x Y) is unitary. From the unitary invariance of ||·||_2, we have

    1 = ||x'||_2^2 = ||(x Y)^H x'||_2^2 = |x^H x'|^2 + ||Y^H x'||_2^2.

But |x^H x'| = cos angle(x, x'). Hence

    ||Y^H x'||_2^2 = 1 - cos^2 angle(x, x') = sin^2 angle(x, x'). ∎

We are now in a position to bound the sine of the angle between x and x'.

Theorem 3.13. Let (λ, x) be a simple eigenpair of A, let the reduction (3.15) be normalized so that Y is orthonormal, and let (λ', x') be the corresponding eigenpair of A + E with ||x'||_2 = 1. Then

    sin angle(x, x') ≤ ||E||_2 / sep(λ', M).                     (3.25)

Proof. We have (A + E)x' = λ'x', or

    Ax' - λ'x' = -Ex'.

Multiplying this relation by Y^H and recalling that Y^H A = M Y^H, we have

    (M - λ'I)Y^H x' = -Y^H E x'.

Since λ' is not an eigenvalue of M, the matrix λ'I - M is nonsingular, and

    Y^H x' = (λ'I - M)^{-1} Y^H E x'.

Hence

    ||Y^H x'||_2 ≤ ||(λ'I - M)^{-1}||_2 ||E||_2 = ||E||_2 / sep(λ', M),

and the theorem follows from Lemma 3.12. ∎

From Theorem 3.11 it follows that λ' -> λ as E -> 0. Hence we may rewrite (3.25) in the form

    sin angle(x, x') ≤ ||E||_2 / sep(λ, M) + O(||E||_2^2).       (3.26)

Consequently, sep^{-1}(λ, M) is a condition number for the eigenvector x.

Properties of sep

We will now consider some of the properties of the function sep. The first and most natural question to ask of sep is how it got its name. The answer is that it is a lower bound on the separation of λ from the spectrum of M. Specifically, we have the following theorem.

Theorem 3.14. In any consistent norm,

    sep(λ, M) ≤ min_{μ∈Λ(M)} |λ - μ|.

Proof. By Theorem 2.7,

    sep(λ, M)^{-1} = ||(λI - M)^{-1}|| ≥ ρ[(λI - M)^{-1}] = max_{μ∈Λ(M)} |λ - μ|^{-1}.

The result now follows on inverting this relation. ∎

Thus sep(λ, M) is a lower bound on the separation of λ and the eigenvalues of M. Actually, this is just common sense. For if an eigenvalue of M is near λ, then a small perturbation of A could make them equal and turn the one-dimensional space spanned by x into a two-dimensional space. Otherwise put, if an eigenvalue of M is near λ we can expect the eigenvector x to be sensitive to perturbations in A.

Unfortunately, sep(λ, M) can be far smaller than the separation of λ from the spectrum of M, as the following example shows.

Example 3.15. Let λ = 0 and let M = W_m be the m x m upper triangular matrix with ones on its diagonal and minus ones above the diagonal:

    W_m = ( 1  -1  ...  -1 )
          (     1  ...  -1 )
          (         .    . )
          (              1 ).
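The collapse of sep in Example 3.15 is easy to observe numerically. In the following Python/NumPy sketch, W_m is the upper triangular matrix with ones on the diagonal and minus ones above it, which gives ||W_m^{-1}||_∞ = 2^{m-1}; the 2-norm version of sep, computed as a smallest singular value, behaves similarly.

```python
import numpy as np

for m in [2, 4, 8, 16, 24]:
    W = np.eye(m) - np.triu(np.ones((m, m)), 1)   # ones on diagonal, -1 above
    sep_inf = 1.0 / np.linalg.norm(np.linalg.inv(W), np.inf)  # = 2**-(m-1)
    sep_2 = np.linalg.svd(W, compute_uv=False).min()          # 2-norm sep(0, W)
    print(m, sep_inf, sep_2)   # both collapse, though every eigenvalue equals 1
```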

Since all the eigenvalues of W_m are one, the distance between λ = 0 and the spectrum of W_m is one. On the other hand, in the ∞-norm

    sep(0, W_m) = ||W_m^{-1}||_∞^{-1}.

But ||W_m^{-1}||_∞ = 2^{m-1}, so that

    sep(0, W_m) = 2^{-(m-1)},

which, with increasing m, goes quickly to zero in spite of the excellent separation of λ = 0 from the eigenvalues of W_m.

This example suggests the possibility that an eigenvector whose eigenvalue is well separated from its companions can still be sensitive to perturbations in its matrix. It is an instructive exercise to use W_m to construct an example of this phenomenon. [Hint: See (3.8).]

Hermitian matrices

If A is Hermitian then the right eigenvector x corresponding to the eigenvalue λ is also the left eigenvector y. It follows that the angle between x and y is zero, and sec angle(x, y) = 1—the smallest possible value. Thus the simple eigenvalues of a Hermitian matrix are optimally conditioned. This result, of course, is weaker than (3.8), which shows that any eigenvalue of a Hermitian matrix, whatever its multiplicity, is perfectly conditioned.

The simple eigenvectors of a Hermitian matrix also behave nicely, as a consequence of the following theorem, whose proof is left as an exercise.

Theorem 3.16. Let λ be real and M Hermitian. Then in the 2-norm,

    sep(λ, M) = min_{μ∈Λ(M)} |λ - μ|.

Now in the decomposition (3.15) of Theorem 3.11, if A is Hermitian we may take X = Y, so that M is Hermitian. It follows that if x is an eigenvector of A and x' is the corresponding eigenvector of A + E, then

    sin angle(x, x') ≤ ||E||_2 / min_{μ∈Λ(M)} |λ - μ| + O(||E||_2^2).

Thus for Hermitian matrices the physical separation of eigenvalues—as distinguished from the lower bound sep—indicates the condition of its eigenvectors.

3.3. NOTES AND REFERENCES

General references

Most texts on numerical linear algebra contain at least some material on the perturbation of eigenvalues and eigenvectors (especially [121] and [300]). There are several books devoted more or less exclusively to perturbation theory. Kato's Perturbation Theory for Linear Operators [146] presents the theory as seen by a functional analyst (although it includes a valuable treatment of the finite-dimensional case). The approach taken in this section is along the line of Stewart and Sun's Matrix Perturbation Theory [269]. Bhatia [22] treats the perturbation of eigenvalues with special emphasis on Hermitian and normal matrices; in his subsequent Matrix Analysis [23] he treats eigenspaces of Hermitian matrices. Both books are a valuable source of references. Books whose titles include the words "matrix analysis" or "matrix inequalities" usually contain material on perturbation theory—e.g., [18], [20], [118], [119], [168], [177].

Continuity of eigenvalues

For an extensive survey of general results on the perturbation of eigenvalues of nonnormal matrices, see Bhatia's Matrix Analysis [23, Ch. VIII]. Theorem 3.1 is due to Elsner [69, 70] as improved by Bhatia, Elsner, and Krause [24]. Also see [23, 269] for further references and historical notes.

Gerschgorin theory

Gerschgorin's theorem [82, 1931] is a consequence of the fact that a strictly diagonally dominant matrix is nonsingular. This fact has been repeatedly rediscovered, leading Olga Taussky to write a survey entitled "A Recurring Theorem on Determinants" [274]. Gerschgorin also showed that an isolated union of k disks contains exactly k eigenvalues and introduced the use of diagonal similarities to reduce the radii of the disks. The technique is quite flexible, as Wilkinson has demonstrated in his Algebraic Eigenvalue Problem [300]. For a formalization of the procedure see [175, 285].

Eigenvalues of Hermitian matrices

Much of the material presented here can be found in §1:1.4 of this series.

Simple eigenpairs

The results in this subsection were, for the most part, developed for eigenspaces, which will be treated in §2 of Chapter 4.

Left and right eigenvectors

The fact that left and right eigenvectors corresponding to a simple eigenvalue cannot be orthogonal is well known and can be easily deduced from a Schur decomposition.

The converse is not true—a multiple eigenvalue can fail to have nonorthogonal left and right eigenvectors, although for this to happen the eigenvalue must be defective. An analytic version of the result, also deducible from a Schur decomposition, is that if the left and right eigenvectors of a matrix are nearly orthogonal, then the matrix must be near one with a multiple eigenvalue [303].

First-order perturbation expansions

First-order perturbation expansions have long been used in the physical sciences to linearize nonlinear differential equations. A classroom example is the behavior of the pendulum under small oscillations. They have also been used to generate iterations for solving nonlinear equations, Newton's method being the prototypical example. In matrix analysis first-order expansions are useful in three ways. First, in perturbation theory a first-order expansion can yield very sharp asymptotic bounds. Second, formulas for derivatives can often be read off from a perturbation expansion [the assertion (3.22) is an example]. Third, the approximations they furnish are often used in deriving and analyzing matrix algorithms. In fact, a first-order expansion is often the best way to get a handle on the perturbation theory of a new object.

Rigorous bounds

Theorem 3.11 is a consequence of more general theorems due to the author [250, 252]. We will return to these bounds when we treat eigenspaces in Chapter 4. Theorem 3.13 is a generalization to non-Hermitian matrices of the sin θ theorem of Davis and Kahan [54]. If Y is not orthonormal, the bound becomes correspondingly weaker.

Condition numbers

Wilkinson [300, pp. 68–69] introduces the quantity s = y^H x, where x and y are normalized right and left eigenvectors corresponding to a simple eigenvalue λ, and on the basis of a first-order perturbation analysis notes that the sensitivity of λ is determined by 1/|s|. The quantity 1/|s| is, of course, the secant of the angle between x and y. If x and y are normalized so that y^H x = 1, the matrix xy^H is the spectral projector corresponding to λ. It is so called because (xy^H)z is the projection of z on R(x) along N(y^H). Since ||xy^H||_2 = ||x||_2 ||y||_2 = sec∠(x, y), the condition of a simple eigenvalue is the norm of its spectral projector. This result generalizes to representations of a matrix on an eigenspace [see §2, Chapter 4].

Sep

The justification for the introduction of sep to replace the actual separation of eigenvalues in the complex plane is the simplicity of the bounds it produces. Its drawback is

that sep is by no means a simple quantity. On the other hand it has the nice property that

    |sep(A + E, B + F) − sep(A, B)| ≤ ||E|| + ||F||,

where ||·|| is the norm used to define sep. In other words, a small perturbation in the arguments of sep makes an equally small perturbation in its value.

Relative and structured perturbation theory

Example 3.9 suggests that the small eigenvalues of a symmetric matrix are not well conditioned in a relative sense under perturbations of the matrix. This is generally true of normwise perturbations. But if the perturbations are structured, small eigenvalues may be well conditioned. Consider, for example, a symmetric matrix whose smallest eigenvalue is (to five figures) 2.0000·10^{-10}. If we replace A by a small componentwise relative perturbation of itself, the eigenvalues are reproduced to nearly their original figures, even though the perturbation in the (1,1)-element is larger than the smallest eigenvalue itself. Thus the smallest eigenvalue is insensitive to small relative perturbations in the elements of A. This phenomenon is unexplained by our perturbation theory.

Over the last two decades researchers have begun to investigate these and related problems. The subject is still in flux, and it is impossible to survey it here. It is also impossible to present a complete bibliography. The following references emphasize surveys or papers with extensive bibliographies: [56, 58, 112, 113, 125, 165, 166, 173, 266]. But we can discern some trends.

1. Relative perturbation of eigenvalues: Determine the relative error in eigenvalues resulting from perturbations in the matrix. Generally, the matrix and its perturbations must have some special structure.

2. Componentwise perturbation of eigenvectors: Determine the error (preferably relative) in the individual components of eigenvectors resulting from perturbations in the matrix.

3. Multiplicative perturbations: i.e., perturbations of the form (I + E)A(I + F), where E and F are small.

4. Perturbation theory for matrices with structured nonzero elements: e.g., tridiagonal matrices, bidiagonal matrices, etc.

5. Perturbation theory for graded matrices.

The above categories are not disjoint, but overlap—often considerably. A small experiment illustrating the first trend appears after this list.
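The following sketch (ours, in Python with numpy; the graded matrix is hypothetical, chosen only for illustration) contrasts a componentwise relative perturbation with a normwise perturbation of the same size. Under the relative perturbation the tiny eigenvalue keeps nearly full relative accuracy; under the normwise perturbation it is destroyed.

    import numpy as np

    rng = np.random.default_rng(1)
    D = np.diag([1.0, 1e-3, 1e-6])                 # grading
    B = rng.standard_normal((3, 3))
    B = (B + B.T) / 2 + 3.0 * np.eye(3)            # well-conditioned symmetric core
    A = D @ B @ D                                  # graded symmetric matrix

    eps = 1e-8
    R = rng.standard_normal((3, 3)); R = (R + R.T) / 2
    A_rel = A * (1.0 + eps * R)                    # componentwise relative perturbation
    A_nrm = A + eps * np.linalg.norm(A, 2) * R / np.linalg.norm(R, 2)  # normwise

    for M in (A, A_rel, A_nrm):
        print(np.linalg.eigvalsh(M))
    # The smallest eigenvalue (of order 1e-12) survives A_rel but not A_nrm.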

2. THE QR ALGORITHM

All general-purpose eigenvalue algorithms are necessarily iterative. This statement is a consequence of Abel's famous proof that there is no algebraic formula for the roots of a general polynomial of degree greater than four. Specifically, to any polynomial

    p(x) = x^n − a_{n−1}x^{n−1} − ··· − a_1 x − a_0

there corresponds a companion matrix whose characteristic polynomial is p. Thus if we had a noniterative general-purpose eigenvalue algorithm, we could apply it to the companion matrix of an arbitrary polynomial to find its zeros in a finite number of steps—in effect producing a formula for the roots.

Thus the history of eigenvalue computations is the story of iterative methods for approximating eigenvalues and eigenvectors. An important theme in this story concerns methods based on powers of a matrix. The climax of that theme is the QR algorithm—the great success story of modern matrix computations. This algorithm, with proper attention to detail, can compute a Schur form of a general matrix in O(n^3) operations. It can be adapted to reduce a real matrix in real arithmetic to a real variant of the Schur form. Other forms of the algorithm compute the eigenvalues and eigenvectors of a Hermitian matrix, the singular value decomposition of a general matrix, and the generalized Schur form of a matrix pencil. And the underlying theory of the method continues to suggest new algorithms.

The purpose of this chapter is to describe the QR algorithm and its variants. The algorithm is closely related to two algorithms that are important in their own right—the power method and the inverse power method—and we will begin with an exposition of those methods. In §2 we will present the general algorithm in the form generally used for complex matrices, and in the following section we will present the implicitly

shifted form for real matrices. Finally, in §4 we treat the generalized eigenvalue problem. The symmetric eigenvalue problem and the singular value decomposition will be treated separately in the next chapter.

A word on the algorithms. The QR algorithm is a complicated device that generates a lot of code. For this reason we will lavish attention on one variant—the explicitly shifted QR algorithm for complex matrices—and sketch the other variants in greater or lesser detail as is appropriate. Nonetheless, its various manifestations all have a family resemblance, so that what we learn from one variant carries over to the others.

1. THE POWER AND INVERSE POWER METHODS

The power and inverse power methods are extremely simple methods for approximating an eigenvector of a matrix. In spite of their simplicity, they have a lot to teach us about the art of finding eigenvectors. For this reason and because of the connection of the methods to the QR algorithm, we devote this subsection to a detailed treatment of the algorithms.

1.1. THE POWER METHOD

The basic idea behind the power method is simple. Let A be a nondefective matrix, and let (λ_i, x_i) (i = 1, ..., n) be a complete set of eigenpairs of A. Since the x_i are linearly independent, any nonzero vector u_0 can be written in the form

    u_0 = γ_1 x_1 + γ_2 x_2 + ··· + γ_n x_n.

Now A^k x_i = λ_i^k x_i, so that

    A^k u_0 = γ_1 λ_1^k x_1 + γ_2 λ_2^k x_2 + ··· + γ_n λ_n^k x_n.   (1.1)

If |λ_1| > |λ_i| (i ≥ 2) and γ_1 ≠ 0, then the first term in (1.1) dominates, so that A^k u_0 becomes an increasingly accurate approximation to a multiple of the dominant eigenvector x_1.

Although the basic idea of the power method is simple, its implementation raises problems that will recur throughout this volume. We will treat these problems in the following order.

1. Convergence of the method
2. Computation and normalization
3. Choice of starting vector
4. Convergence criteria and the residual
5. The Rayleigh quotient
6. Shifts of origin
7. Implementation
8. The effects of rounding error

Convergence

In motivating the power method, we made the simplifying assumption that A is nondefective. In fact, all that is needed is that A have a single dominant eigenpair (λ_1, x_1) in the sense of Definition 2.6, Chapter 1. Since this eigenvalue is necessarily simple, by Theorem 1.19, Chapter 1, we can find a nonsingular matrix X = (x_1 X_2) such that

    X^{-1} A X = diag(λ_1, M).   (1.2)

Without loss of generality we may assume that

    ||x_1||_2 = 1.   (1.3)

Since x_1 and the columns of X_2 form a basis for C^n, we can write any vector u_0 in the form

    u_0 = γ_1 x_1 + X_2 c_2.   (1.4)

In this notation we have the following theorem.

Theorem 1.1. Let A have a unique dominant eigenpair (λ_1, x_1), and let A be decomposed in the form (1.2), where X = (x_1 X_2) satisfies (1.3). Let u_0 be a nonzero starting vector decomposed in the form (1.4), with γ_1 ≠ 0. Then

    sin∠(A^k u_0, x_1) ≤ (||M^k||_2 ||c_2||_2) / (|γ_1||λ_1|^k − ||M^k||_2 ||c_2||_2).   (1.5)

In particular, for any ε > 0 there is a constant σ such that

    sin∠(A^k u_0, x_1) ≤ σ [(ρ(M) + ε)/|λ_1|]^k ||c_2||_2/|γ_1|,   (1.6)

where ρ(M) is the spectral radius of M.

Proof. It is easy to verify that

    A^k u_0 = γ_1 λ_1^k x_1 + X_2 M^k c_2.

Let the columns of Y form an orthonormal basis for the subspace orthogonal to x_1. Then by Lemma 3.6, Chapter 1,

    sin∠(A^k u_0, x_1) = ||Y^H A^k u_0||_2 / ||A^k u_0||_2.   (1.7)

But

    ||Y^H A^k u_0||_2 = ||Y^H X_2 M^k c_2||_2 ≤ ||M^k||_2 ||c_2||_2,

and

    ||γ_1 λ_1^k x_1 + X_2 M^k c_2||_2 ≥ |γ_1||λ_1|^k − ||M^k||_2 ||c_2||_2.

Substituting these inequalities in (1.7) and dividing by |γ_1||λ_1|^k, we get (1.5). The bound (1.6) then follows from Theorem 2.9, Chapter 1. •

Three comments on this theorem.

• By Theorem 2.9, Chapter 1, the error in the eigenvector approximation produced by the power method behaves like ||M^k||_2/|λ_1|^k, and hence converges to zero at an asymptotic rate of [ρ(M)/|λ_1|]^k. In particular, if A has a complete system of eigenvectors with its eigenvalues ordered so that |λ_1| > |λ_2| ≥ ··· ≥ |λ_n|, then the convergence is as |λ_2/λ_1|^k. Moreover, if M is nondefective, then we may take ε = 0 in (1.6). However, if the powers of M are badly behaved, then the convergence will be retarded, although its asymptotic rate will be unchanged—a statement made precise by (1.5); see Example 2.11, Chapter 1.

• From (1.5), we see that if γ_1 is small in comparison to ||c_2||_2, we may have to wait a long time to observe this asymptotic convergence. In the extreme case γ_1 = 0 the theorem does not apply at all; in this case we say that u_0 is deficient in x_1. We will return to this point later.

Computation and normalization

We have written the kth iterate of the power method in the form u_k = A^k u_0. If we were to naively implement this formula, we would end up first computing B_k = A^k and then computing u_k = B_k u_0. This procedure requires about kn^3 flam to compute u_k—a grotesquely large operation count. An alternative is to observe that u_{k+1} = A u_k. This suggests the following algorithm.

    1. k = 0
    2. while (not converged)
    3.    u_{k+1} = A u_k                                   (1.8)
    4.    k = k+1
    5. end while

For a dense matrix, each iteration of this program requires about n^2 flam, where a flam denotes a pair of floating-point additions and multiplications.

However, this operation count does not tell the entire story. In many applications the matrix A will be large and sparse—that is, it will have many zero elements. If the matrix is represented by an appropriate data structure, it may take far fewer than n^2 flam to compute its product with a vector. Let us consider the example of a tridiagonal matrix.

Example 1.2. A tridiagonal matrix of order n has the form illustrated below for n =

6:

    a_1 b_1
    c_1 a_2 b_2
        c_2 a_3 b_3
            c_3 a_4 b_4
                c_4 a_5 b_5
                    c_5 a_6

If the nonzero diagonals of T are stored as indicated above in arrays a, b, and c, then the following program computes y = Tx.

    1. y[1] = a[1]*x[1] + b[1]*x[2]
    2. for i = 2 to n−1
    3.    y[i] = c[i−1]*x[i−1] + a[i]*x[i] + b[i]*x[i+1]
    4. end for i
    5. y[n] = c[n−1]*x[n−1] + a[n]*x[n]

This program has an operation count of 2n fladd + 3n flmlt. Thus we have reduced the work to form the product Tx from O(n^2) to O(n).

We shall not always get as dramatic an improvement in operation counts as we did in the above example. In most cases the savings will be small compared to the work in forming Au_k. But the example serves to show that algorithms based on matrix-vector multiplies can often be economized.

As it stands, the program (1.8) will tend to overflow or underflow. For example, suppose that λ_1 = 10. Then asymptotically u_k ≈ 10^k γ_1 x_1, so that the magnitude of u_k increases by a factor of ten with each iteration. Such a process cannot continue long without overflowing the machine on which it is run. Similarly, if λ_1 = 0.1, the vectors u_k will quickly underflow.

The cure for this problem is to note that only the direction of the vector u_k matters: its size has no relevance to its quality as an approximate eigenvector. We can therefore scale u_k so that it remains within the bounds of the machine. Usually, u_k is scaled so that ||u_k|| = 1 in some relevant norm. This can be done in the course of the iteration, as the following program shows.

    1. k = 0
    2. while (not converged)
    3.    u_{k+1} = A u_k
    4.    k = k+1
    5.    σ = 1/||u_k||
    6.    u_k = σ u_k
    7. end while

Since floating-point divides are expensive compared to multiplies, we do not divide u_k by ||u_k||; instead we multiply u_k by 1/||u_k||. In the sequel we will dispense with such minor economies and write, e.g., u_k = u_k/||u_k||.
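As an illustration, here is a sketch in Python (ours; numpy assumed) of the normalized iteration, written so that the matrix-vector product is supplied as a function. The helper trid_mult implements the O(n) tridiagonal product of Example 1.2, with the diagonals stored in arrays a, b, and c as above.

    import numpy as np

    def trid_mult(a, b, c, x):
        # y = T x for tridiagonal T: a = diagonal, b = superdiagonal, c = subdiagonal
        n = len(a)
        y = np.empty(n)
        y[0] = a[0]*x[0] + b[0]*x[1]
        for i in range(1, n-1):
            y[i] = c[i-1]*x[i-1] + a[i]*x[i] + b[i]*x[i+1]
        y[n-1] = c[n-2]*x[n-2] + a[n-1]*x[n-1]
        return y

    def power_method(matvec, u0, k):
        u = u0 / np.linalg.norm(u0)
        for _ in range(k):
            u = matvec(u)
            u = u / np.linalg.norm(u)      # scale to stay within machine bounds
        return u

    # example of use on a tridiagonal matrix of order 100
    n = 100
    a = np.full(n, 4.0); b = np.ones(n-1); c = np.ones(n-1)
    u = power_method(lambda x: trid_mult(a, b, c, x), np.random.rand(n), 200)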

For example. It it not surprising that a small value of 71. then the dominant eigenvector of Ak will furnish a good starting approximation to the dominant eigenvector of Ak+i. the vector e consisting of all ones is often a good choice for a matrix whose elements are positive.5).Y^UQ. -.g. Thus we cannot simply quit when the norm of error ek = Uk — x\ is sufficiently small. In some cases one will have at hand an approximation to the eigenvector x\. it is customary to take UQ to be a random vector—usually a vector of random normal deviates. then 71 will be zero. The problem is that a highly structured problem may have such a patterned eigenvector. a natural criterion is to stop when \\Uk — Uk-i\\ is sufficiently small. which makes the ratio large. For example. if UQ is random. if we are trying to find the dominant eigenvectors of a sequence of nearby matrices A\. Convergence criteria and the residual The chief problem with determining the convergence of a sequence of vectors u^ is that the limit—in our case x\ —is unknown. a vector whose elements alternate between plus and minus one. The reasoning is as follows..then or . A 2 . we may expect the ratio ||c2/7i ||2 to vary about the quantity Lacking special knowledge about x\. If the sequence converges swiftly enough. For example. if ||eJ| < alle^-ill. Such approximations are natural starting values for the power method. can retard convergence. the structure of the problem may actually dictate such a choice. As a rule.. THE QR ALGORITHM The ratio ||c2||/|7i| from the expansion UQ = jiXi + X^c? is a measure of how far the starting vector UQ is from the dominant eigenvector x\. one should avoid starting vectors consisting of small integers in a pattern —e. this is about as good as we can expect. then it can be shown that -I TT Since 7j = y^UQ and c^ ..60 Choice of a starting vector CHAPTER 2. where 0 < a < l. It is therefore important that the starting vector UQ should have a substantial value of 71. If that eigenvector is not the dominant eigenvector. On the other hand. If we write (x i X2) = (y\ YI) . When no approximation to x i is available. In fact this ratio appears explicitly in the convergence bound (1.

When α is, say, 0.5, the error can be no larger than twice ||u_k − u_{k−1}||. But if α is near one, the error can be large even when ||u_k − u_{k−1}|| is small. Only if we have an estimate of α—which is tricky to obtain on the fly—can we relate the distance between successive approximations to the error.

An alternative is to compute the residual

    r_k = A u_k − μ_k u_k,

where μ_k is a suitable approximation to λ_1, and stop when r_k is sufficiently small. Although the residual does not give us a direct estimate of the error, it gives us something almost as good, as the following theorem shows (for generality we drop the iteration subscripts).

Theorem 1.3. Let u ≠ 0 and let r = Au − μu. Then the matrix

    E = −r u^H / u^H u   (1.10)

satisfies

    (A + E)u = μu   (1.11)

and

    ||E||_2 = ||E||_F = ||r||_2 / ||u||_2.   (1.12)

If A is Hermitian and

    μ = u^H A u / u^H u,   (1.13)

then there is a Hermitian matrix

    E = −(r u^H + u r^H) / u^H u

satisfying (1.11) with

    ||E||_2 = ||r||_2/||u||_2  and  ||E||_F = √2 ||r||_2/||u||_2.   (1.14)

Proof. The proof in the non-Hermitian case is a matter of direct verification of the equalities (1.11) and (1.12). For the Hermitian case we use the fact that the choice (1.13) of μ makes r orthogonal to u. For the norms of E, note that

    Eu = −r  and  Er = −(||r||_2^2/||u||_2^2) u.

From this it follows that r and u are orthogonal eigenvectors of E^H E with eigenvalues ||r||_2^2/||u||_2^2. Since the remaining eigenvalues of E^H E are zero, the equalities (1.14) follow from the fact that ||E||_2^2 is the largest eigenvalue of E^H E and ||E||_F^2 is the sum of the eigenvalues of E^H E. •

The theorem says that if the residual is small, then the pair (μ, u) is an exact eigenpair of a nearby matrix A + E. It suggests that we stop iterating when the residual is sufficiently small. In the power method, we naturally take the vector u to be the kth iterate u_k.

The justification for this criterion is the following. In most problems we will not be given a pristine matrix A; instead A will be contaminated with errors. For example, A may be measured, in which case A will have errors that are, in general, much larger than the rounding unit. Or A may be computed, in which case A will be contaminated with rounding error—errors of the order of the rounding unit. Now if the residual norm is smaller than the error level, then any errors in the approximate eigenpair can be accounted for by a perturbation of A that is smaller than the errors already in A. In other words, if your results are inaccurate, the inaccuracy is no greater than that induced by the error already present.

This is not the whole story. In many applications the matrix A will be sparse, and the zeros in a sparse matrix usually arise from the underlying problem and are not subject to perturbation. But the matrix E in (1.10) will not generally share the sparsity pattern of A—in fact it is likely to be dense. Thus Theorem 1.3 gives a backward perturbation E that is unsuitable for sparse matrices. Fortunately, there is more than one E that will make (μ, u) an exact eigenpair of A + E. For example, we may take

    E = −diag(r_1/u_1, ..., r_n/u_n),

so that E alters only the diagonal elements of A. For general results on such structured backward perturbations see the notes and references.

Optimal residuals and the Rayleigh quotient

In computing the residual r = Au − μu to determine the convergence of the power method, we have not specified how to choose μ. A natural choice is the value of μ that minimizes the residual in some norm. As is often the case, the problem is most easily solved when the norm in question is the 2-norm.

Theorem 1.4. Let u ≠ 0, and for any μ set r_μ = Au − μu. Then ||r_μ||_2 is minimized when

    μ = u^H A u / u^H u,   (1.15)

in which case r_μ ⊥ u.

Proof. Without loss of generality we may assume that ||u||_2 = 1. Let (u U) be unitary and set

    (u U)^H r_μ = [ u^H A u − μ ; U^H A u ].   (1.16)

Since ||·||_2 is unitarily invariant,

    ||r_μ||_2^2 = |u^H A u − μ|^2 + ||U^H A u||_2^2.

This quantity is clearly minimized when μ = u^H A u. The orthogonality of r_μ and u is established by direct verification. •

Two comments.

• The proof characterizes the norm of the optimal residual as the norm of g = U^H A u in the partition (1.16). This characterization of the optimal residual norm will prove useful in the sequel.

• The quantity u^H A u/u^H u is an example of a Rayleigh quotient. We have already seen another Rayleigh quotient in (3.21), Chapter 1, where it had the form y^H(A + E)x, with x and y (normalized so that y^H x = 1) left and right eigenvectors corresponding to a simple eigenvalue λ of A. That Rayleigh quotient provides a highly accurate approximation to the corresponding eigenvalue of A + E. Generalizing from these two Rayleigh quotients, we make the following definition.

Definition 1.5. Let u and v be vectors with v^H u ≠ 0. Then the quantity

    v^H A u / v^H u

is called a RAYLEIGH QUOTIENT.

A brief comment. If either u or v is an eigenvector corresponding to an eigenvalue λ of A, then the Rayleigh quotient reproduces that eigenvalue. By continuity, when u or v is near an eigenvector, the Rayleigh quotient will approximate the corresponding eigenvalue. Thus in the power method, the Rayleigh quotients u_k^H A u_k / u_k^H u_k provide a sequence of increasingly accurate approximations to λ_1. We will return to Rayleigh quotients in §2, Chapter 4, where we will treat the approximation of eigenspaces and their eigenblocks.
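A small numerical check (ours, in Python with numpy) of the last two theorems: the Rayleigh quotient minimizes the 2-norm of the residual and makes r orthogonal to u, and the backward perturbation (1.10) of Theorem 1.3 makes (μ, u) an exact eigenpair.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    A = rng.standard_normal((n, n))
    u = rng.standard_normal(n)

    mu = (u @ A @ u) / (u @ u)            # Rayleigh quotient (1.15)
    r = A @ u - mu * u
    print(abs(u @ r))                     # ~ 0: optimal residual is orthogonal to u

    E = -np.outer(r, u) / (u @ u)         # backward perturbation (1.10)
    print(np.linalg.norm((A + E) @ u - mu * u))                 # ~ 0: exact eigenpair
    print(np.linalg.norm(E, 2), np.linalg.norm(r) / np.linalg.norm(u))   # equal

    # any other shift gives a larger residual
    for d in (0.1, -0.1):
        print(np.linalg.norm(A @ u - (mu + d) * u) >= np.linalg.norm(r))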

Anorm = \\A\\p 2.5. /j.99. while (true) 4. the convergence will be slow. For example. r = y — /j.0. If this ratio is near one. end while Algorithm 1. 1. this algorithm iterates the shifted power method to convergence. In the next subsubsection we will consider a very powerful spectral enhancement—the shift and invert operation. . we pay for this power in the coin of increased computational costs. and -0.64 CHAPTER 2.5. But it can only improve performance by so much.1: The shifted power method o Shifts and spectral enhancement If A is nondefective and the eigenvalues of A satisfy then the power method almost always converges to the dominant eigenvector x\ with rate |A 2 /Ai|. There are three things to say about this straightforward program. 1/3 is nearly the best convergence ratio we can obtain by shifting. If we shift A by AC = —0. Implementation Algorithm 1. .x 7. x = y — KX 8. and the convergence ratio becomes | —a dramatic improve This shifting of the matrix to improve the disposition of its eigenvalues is an example of spectral enhancement.K!)U in the form Au — KU at little additional cost. = x^y 6.. and —0. a shift AC..5. THE QR ALGORITHM Given matrix A. Sometimes we can improve the rate of convergence by working with the matrix A — AC/. if the eigenvalues of A are 1. However. whose eigenvalues \\ — AC. and a nonzero starting vector x. then the powermethod converges with ratio 0. x}. \n — K are shifted by the constant K.99.. In the above example. the eigenvalues become 1. x = x/\\x\\2 3. y — Ax 5. Shifting has the advantage that it is cheap: we can calculate (A . a convergence criterion e. ar = ar/||ar||2 9.1 implements the shifted power method. • Both the residual and the next iterate are computed from the intermediate vector . if (11 r 112 /Anorm < e) leave while fi 10.49. returning the approximate eigenpair (/z.

This parsimonious style of coding may seem natural, but it is easy to forget when one is writing in a high-level language like MATLAB. Similarly, the norm of A is computed outside the loop, so that it is not recomputed at each iteration.

• On termination the algorithm returns the approximate eigenpair (μ, x). Strictly speaking, our convergence test is based on the pair (μ, x), where x is the previous iterate; however, we return the new vector x because it is almost certainly more accurate. From Theorem 1.3 we know that (μ, x) is an exact eigenpair of A + E, where the relative error ||E||_F/||A||_F is less than or equal to the convergence criterion ε.

• In a working implementation one would specify a maximum number of iterations, and the code would give an error return when that number was exceeded.

The effects of rounding error

We must now consider the effects of rounding error on the power method. We will suppose that the computations are performed in standard floating-point arithmetic with rounding unit ε_M. For definiteness we will assume that ||u_k||_2 = 1 and ||A||_F = 1 in the following discussion.

The major source of error in the method is the computation of the product Au_k. An elementary rounding-error analysis shows that for a dense matrix

    fl(Au_k) = (A + E_k)u_k,   (1.17)

where

    ||E_k||_F ≤ n ε_M ||A||_F.   (1.18)

This is only n times as large as the error we could expect from simply rounding the matrix [see §2.1, Chapter 1].

Equation (1.17) and the bound (1.18) imply that at each stage the power method is trying to compute the dominant eigenvector of a slightly perturbed matrix. This perturbation induces an error in the eigenvector x_1, and we cannot expect the power method to compute x_1 to an accuracy greater than that error. In particular, by Theorem 3.13, Chapter 1, we can expect no greater accuracy in the angle between the computed vector and x_1 than roughly n ε_M ||A||_F/sep(λ_1, M). The limiting accuracy of the power method is therefore quite satisfactory.

Thus we can expect to see a steady decrease in the residual until the limiting accuracy is attained. But can the limit actually be attained? The answer is not easy—surprisingly, in view of the simplicity of the convergence proof for the power method. The difficulty is that rounding error not only changes the vector Ax to (A + E)x, but it also changes the residuals corresponding to these vectors, so that our convergence test is applied to a computed residual rather than the true one. However, by Theorem 3.13, Chapter 1, the change is very small—of the order of the rounding unit. Moreover, if the residual corresponding to Ax is reduced far enough (and that need not be very far), then the computed residual corresponding to (A + E)x will also be reduced. Thus the convergence test remains reliable in the presence of rounding.
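For concreteness, here is a sketch of Algorithm 1.1 in Python (ours; numpy assumed). It incorporates the points just discussed: a single multiplication by A per iteration, the norm of A computed outside the loop, and an iteration cap with an error return.

    import numpy as np

    def shifted_power(A, kappa, x, eps=1e-8, maxiter=500):
        Anorm = np.linalg.norm(A, 'fro')
        x = x / np.linalg.norm(x)
        for _ in range(maxiter):
            y = A @ x                      # one multiplication serves both uses
            mu = np.vdot(x, y)             # Rayleigh quotient (x has unit norm)
            r = y - mu * x                 # optimal residual for the current x
            x = y - kappa * x              # shifted iterate (A - kappa*I)x
            x = x / np.linalg.norm(x)
            if np.linalg.norm(r) / Anorm <= eps:
                return mu, x
        raise RuntimeError("shifted power method did not converge")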

Assessment of the power method

The advantages of the power method are that it requires only matrix-vector multiplications and that it is essentially globally convergent, provided the matrix has a single, simple dominant eigenvalue. It has the disadvantages that it computes only the dominant eigenvector, and the convergence can be very slow if the dominant eigenvalue is near in magnitude to the subdominant one. Moreover, if a real matrix has a complex dominant eigenvalue, then its conjugate is also an eigenvalue with the same absolute value, and the power method cannot converge.

For these reasons, the power method in its natural form is little used these days. However, it plays an implicit role in the QR algorithm, and a variant—the inverse power method—is widely used. We now turn to that variant.

1.2. THE INVERSE POWER METHOD

The difficulties with the power method can be overcome provided we can find a spectral enhancement that makes an eigenvalue of interest the dominant eigenvalue of the matrix. The traditional enhancement for doing this is called the shift-and-invert enhancement. We begin with the enhancement.

Shift-and-invert enhancement

Let A have eigenvalues λ_1, ..., λ_n in no particular order, and suppose that λ_1 is simple. If we have an approximation κ to λ_1, then the eigenvalues of the matrix (A − κI)^{-1} are

    μ_i = 1/(λ_i − κ),   i = 1, ..., n.

Now as κ → λ_1, we have μ_1 → ∞, while for i > 1 the μ_i approach 1/(λ_i − λ_1), which are finite quantities. Thus by choosing κ sufficiently near λ_1 we can transform λ_1 into a dominant eigenvalue of the enhanced matrix, and that dominance can be made as large as we like. This spectral enhancement is called the shift-and-invert enhancement, and its combination with the power method is called the inverse power method.

The inverse power method

The inverse power method is simply the power method applied to the matrix (A − κI)^{-1}. Specifically, given a vector x and a shift κ, we generate a new iterate in the form

    y = (A − κI)^{-1} x.

In principle, we could use Algorithm 1.1 to carry out the inverse power method. However, owing to the nature of the shift-and-invert enhancement, we can compute a residual almost for free.

on the other hand. let Since (A — Kl)y = x. we have If we set then the optimal residual for x is (see Theorem 1. while (true) Solve the system (A-Kl)y = x w = x/\\y\\2 p — x^w 3- % = y/\\y\\2 p. 2. 7. the decomposition is usually an LU decomposition computed by Gaussian elimination with partial pivoting. the Rayleigh quotient fi = K + p = xHAx converges to the corresponding eigenvalue of A.4) Note that as x converges to an eigenvector. Most linear system solvers first compute a decomposition and then use the decomposition to solve the system.2: The inverse power method o Specifically. a convergence criterion e. = K + p 6. 10. THE POWER AND INVERSE POWER METHODS 67 Given a matrix A.SEC.2 implements this scheme.K. this algorithm iterates the inverse power method to convergence. We will say that this process requires a factorization. a shift K. r = w — px x=x if (||r||2/||A||F < c) leave while fi end while Algorithm 1. 1. returning an approximate eigenpair (//. The subsequent solution of the linear system. requires only ra2 flam. Algorithm 1. 8. 9. It is evident that the major work in the algorithm consists in solving the system (A . or). 5. which costs about ^n3 flam. and a nonzero starting vector x.I)y — x in statement 2. 1. It is therefore important that the computation of the decomposition be placed outside the while Vol II: Eigensystems . With dense systems. and some care must be taken not to inadvertently increase this work. 4.

When A is sparse, the operation counts for computing a decomposition or factorization may be less than O(n^3). Nonetheless, factorizations are usually an important, if not dominant, part of any calculation involving shift-and-invert enhancement. In particular, one should avoid software that solves a linear system by invisibly computing a decomposition at each call. We will return to this point repeatedly in the second half of this volume, where the concern is with large eigenproblems.

An example

The convergence of the inverse power method can be stunningly fast when the shift is near the eigenvalue in question. In the following experiment we applied Algorithm 1.2, with κ = 1.01 and x a normalized random vector, to a matrix of order four (shown only to four figures) having an eigenvalue equal to 1. The errors in the Rayleigh quotients and the residual norms decreased steadily, each iteration adding about two significant figures of accuracy to the eigenvalue until the eigenvalue was calculated to working accuracy (ε_M = 10^{-16}). This behavior is typical of the inverse power method applied to a well-conditioned eigenpair.

On the other hand, if we take κ = 1 + 10^{-8}, the iteration converges to working accuracy in two iterations; and if κ approximates λ_1 to working accuracy, the iteration converges in one iteration.

The effects of rounding error

The nearer the shift κ is to an eigenvalue, the faster the inverse power method converges. But if κ is near an eigenvalue, then A − κI is nearly singular, and the system

(A − κI)y = x in statement 2 will be solved inaccurately. In fact, if κ approximates an eigenvalue to working accuracy, the solution may be completely inaccurate. Yet the method still converges. Why?

The answer to this question is furnished by the rounding-error analysis of the solution of linear systems (see §1:3.4). If a stable algorithm (e.g., Gaussian elimination with partial pivoting) is used to solve the system (A − κI)y = x, then the computed solution will satisfy

    (A + E − κI)ỹ = x,

where ||E||_F/||A||_F is of the order of the rounding unit. Since we are only concerned with the direction of the eigenvector, what matters is the perturbation E, and when κ is near an eigenvalue this perturbation in A − κI changes the approximate eigenvector only insignificantly. The vector y is indeed computed inaccurately, but most of the inaccuracy is in the direction of the eigenvector being approximated. To put the matter in another way, the inverse power method moves toward an approximate eigenpair that is for practical purposes indistinguishable from the exact eigenpair. If the eigenpair we are looking for is well conditioned, this error in y does not introduce any significant errors in the approximate eigenvector.

The Rayleigh quotient method

We have seen that as we proceed with the inverse power method, it produces Rayleigh quotients μ that are generally more accurate than the shift κ. Since the rate of convergence depends on the accuracy of κ, we can improve convergence by replacing κ with μ. This amounts to adding the statement

    κ = μ

after statement 8 in Algorithm 1.2. The result is called the Rayleigh quotient iteration. It can be shown that once it starts converging to a simple eigenvalue, it converges at least quadratically.

The main difficulty with the Rayleigh quotient method is that each time the shift changes, one must decompose the matrix A − κI anew to solve the system (A − κI)y = x. Thus for full, dense matrices the Rayleigh quotient method takes O(n^3) operations for each iteration, which accounts for the fact that the Rayleigh quotient iteration in its natural form is not much used. However, in a disguised form it reappears in the QR algorithm, which is the subject of the rest of this chapter.

1.3. NOTES AND REFERENCES

The power method

As an explicit method for computing a dominant eigenpair, the power method goes back at least to 1913 [182]. Hotelling in his famous paper on variance components [120, 1933] used it to compute his numerical results, and Aitken [2, 1937] gave an extensive analysis of the method. Many people who use the power method do so unknowingly; for example, a simulation of a discrete-time, discrete-state Markov chain amounts to applying the power method to the matrix of transition probabilities.

For a time, before the advent of the QR algorithm, the power method was one of the standard methods for finding eigenvalues and eigenvectors on a digital computer. Wilkinson, working at the National Physical Laboratory in England, honed this tool to as fine an edge as its limitations allow. A particularly important problem was to deflate eigenvalues as they are found so that they do not get in the way of finding smaller eigenvalues. For more see The Algebraic Eigenvalue Problem [300, pp. 570–599] and [295].

Residuals and backward error

The use of a residual to determine convergence is inevitable in most problems, since seldom will we have the wherewithal to determine the accuracy of the current approximation to an eigenvalue or eigenvector. The justification of a convergence criterion by appeal to a backward error result is part of the paradigm of backward rounding-error analysis. Theorem 1.3, which says that a small residual implies a small backward error, provides a justification for tests based on the size of a suitably scaled residual. The first part is due to Wilkinson—e.g., see [299, p. 141]. The result for Hermitian matrices in a more general form is due to Powell [214].

The failure of the backward perturbation in Theorem 1.3 to respect the zero structure of a sparse matrix is a drawback only if the eigenpair in question is especially sensitive to perturbations in the zero elements. Even in this case one can apply a very general theorem of Oettli and Prager [185], which allows one to specify the relative sizes of the elements in the backward error. Whether this result can be generalized to Hermitian matrices is an open question.

The Rayleigh quotient

The Rayleigh quotient is due to Lord Rayleigh (J. W. Strutt), who defined it for symmetric matrices. Some of its many generalizations will appear throughout this volume. For further discussion see any general textbook on numerical linear algebra—e.g., [62, 95, 262, 278, 300]. The fact that the Rayleigh quotient gives an optimal residual is due to Wilkinson [300, p. 172].

Shift of origin

Wilkinson used shifting to good effect at the National Physical Laboratory. The shift was actually entered by machine operators in binary by switches at the console before each iteration [300, p. 577].

The inverse power and Rayleigh quotient methods

The inverse power method is due to Wielandt [294, 1944]. Ipsen [124] gives a survey of the method and its properties, along with an extensive list of references. The

method is chiefly used to compute eigenvectors of matrices whose eigenvalues can be approximated by other means (e.g., see Chapter 3). The fact that A − κI is ill conditioned when κ is near an eigenvalue of A makes it certain that the vector y in the inverse power method will be inaccurately calculated. The analysis given here, which shows that the inaccuracies do no harm, is due to Wilkinson (e.g., see [300, pp. 619–622]). For additional results on the symmetric case, see [206, §4.6–4.9].

The Rayleigh quotient iteration—an early version of which arose in optimization—has been analyzed extensively in a series of papers by Ostrowski [188, 189, 190, 191, 192].

It is important to use the Rayleigh quotient to compute the optimal residual in Algorithm 1.2. An obvious reason is that if we use a constant shift κ, the residual can never converge to zero. A more subtle reason is the behavior of the residual when the matrix is strongly nonnormal—that is, when the off-diagonal part of its Schur form dominates the diagonal part [111]. In that case, the first residual will generally be extremely small and then will increase in size before decreasing again. The use of the Rayleigh quotient to compute the optimal residual mitigates this problem. For more see [96] and [124].

2. THE EXPLICITLY SHIFTED QR ALGORITHM

The QR algorithm is an iterative method for reducing a matrix A to triangular form by unitary similarity transformations. Its shifted form can be written simply—and cryptically—as follows.

    1. A_0 = A
    2. for k = 0, 1, 2, ...
    3.    Choose a shift κ_k                                            (2.1)
    4.    Factor A_k − κ_k I = Q_k R_k, where Q_k is unitary and R_k is
          upper triangular
    5.    A_{k+1} = R_k Q_k + κ_k I
    6. end for k

In other words, to get from one iterate to the next—i.e., to perform one QR step—subtract the shift from the diagonal, compute a QR factorization, multiply the factors in reverse order, and add back the shift to the diagonal. The method is called the explicitly shifted QR algorithm because the shift is explicitly subtracted from the diagonal of A and added back at the end of the process.
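In code the QR step is as terse as the description. The following Python sketch (ours; numpy assumed) performs one explicitly shifted QR step; numpy's qr computes the factorization by Householder triangularization.

    import numpy as np

    def qr_step(A, kappa):
        n = A.shape[0]
        Q, R = np.linalg.qr(A - kappa * np.eye(n))   # factor the shifted matrix
        return R @ Q + kappa * np.eye(n)             # unitarily similar to A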

The matrices produced by the algorithm (2.1) satisfy

    A_{k+1} = Q_k^H A_k Q_k.

Thus A_{k+1} is unitarily similar to A_k. However, this fact alone is not sufficient reason for performing this particular sequence of operations. Why, then, is the algorithm worth considering?

The answer is that the algorithm is an elegant way of implementing the inverse power method with shift κ_k. At the same time it is a variant of the power method. These relationships go far toward explaining the remarkable effectiveness of the algorithm, and the next two subsections are devoted to laying them out in detail. The remaining subsections are devoted to the practical implementation of the algorithm.

2.1. THE QR ALGORITHM AND THE INVERSE POWER METHOD

In this subsection we will show how the QR algorithm is related to the inverse power method. This relation allows us to investigate the convergence of the algorithm. For brevity we will drop the iteration subscripts and denote the current matrix by A and the next QR iterate by Â.

Schur from the bottom up

In establishing the existence of Schur decompositions (Theorem 1.12, Chapter 1) we showed that if we have knowledge of an eigenvector we can deflate the corresponding eigenvalue of the matrix by a unitary similarity transformation. A step of the QR algorithm can be regarded as an approximation to such a deflation, with the difference that the deflation occurs at the bottom of the matrix. Specifically, let Q = (Q_* q) be unitary and write

    Q^H A Q = [ B  h ; g^H  μ ].   (2.2)

If (λ, q) is a left eigenpair of A, then

    g^H = q^H A Q_* = λ q^H Q_* = 0.

It follows that

    Q^H A Q = [ B  h ; 0  λ ].

Thus we have deflated the problem. If we repeat the procedure inductively on B, we obtain an alternate proof of Schur's theorem—one in which the reduction is accomplished from the bottom up by left eigenvectors.

If, however, we choose q to be only an approximate eigenvector of A, then g in (2.2) will be small, because ||g||_2 is the norm of the optimal residual q^H A − μq^H (see Theorem 1.4).

The QR choice

The above reduction is not an effective computational procedure, because it requires that one be given eigenvectors of A. The QR algorithm implicitly chooses q to be a vector produced by the inverse power method with shift κ.

To see this, let us rewrite the QR factorization of the shifted matrix A − κI in the form

    Q^H (A − κI) = R,   (2.3)

where, because of the triangularity of R, the last row of (2.3) is

    q^H (A − κI) = r_nn e_n^H.   (2.4)

Hence

    q^H = r_nn e_n^H (A − κI)^{-1}.   (2.5)

Equation (2.5) shows that the last column of the matrix Q generated by the QR algorithm is the result of the inverse power method with shift κ applied to the vector e_n. If κ is close to an eigenvalue of A, then we can expect q to be a good approximation to a left eigenvector of A, and hence g^H in (2.2) can be expected to be small.

The above analysis also suggests a choice of shift. Partition A in the form

    A = [ B  h ; g^H  μ ].

Then (μ, e_n) is an approximate left eigenpair of A whose residual norm is ||g||_2. Hence if g is small, μ should approximate an eigenvalue of A and is a natural choice for a shift. This shift is called the Rayleigh quotient shift, because μ is the Rayleigh quotient e_n^H A e_n.

Error bounds

To show that the QR algorithm converges to a deflated matrix, we must show that the vector ĝ in the corresponding partition

    Â = [ B̂  ĥ ; ĝ^H  μ̂ ]   (2.6)

of the next iterate converges to zero. We begin by giving a bound that relates the norms of g and ĝ in successive iterates. The key to the derivation is to work with a partitioned form of the relation A − κI = QR. Specifically, let us write

    Q = [ P  f ; e^H  π ]  and  R = [ S  s ; 0  ρ ],

so that the relation A − κI = QR becomes

    [ B − κI  h ; g^H  μ − κ ] = [ P  f ; e^H  π ][ S  s ; 0  ρ ],   (2.7)

and also write the relation Â − κI = RQ in the form

    [ B̂ − κI  ĥ ; ĝ^H  μ̂ − κ ] = [ S  s ; 0  ρ ][ P  f ; e^H  π ].   (2.8)

From these partitions we proceed in stages. Since Q is unitary, the norms of its last row and last column must be one. Hence

    ||e||_2^2 + |π|^2 = ||f||_2^2 + |π|^2 = 1,

and

in particular ||f||_2 = ||e||_2.

Next we want to show that e is small when g is small. Computing the (2,1)-block of (2.7), we get g^H = e^H S. If we assume that S is nonsingular and set

    σ = ||S^{-1}||_2,

then

    ||e||_2 ≤ σ ||g||_2.   (2.9)

We now need a bound on ρ. If we use the fact that Q is unitary to rewrite (2.7) in the form Q^H(A − κI) = R, then the last row of this relation gives

    ρ = f^H h + π̄(μ − κ),

so that

    |ρ| ≤ ||e||_2 ||h||_2 + |μ − κ| ≤ σ||h||_2 ||g||_2 + |μ − κ|.   (2.10)

Finally, computing the (2,1)-block of (2.8), we find that ĝ^H = ρe^H. Hence from (2.9) and (2.10) we get

    ||ĝ||_2 ≤ σ^2 ||h||_2 ||g||_2^2 + σ|μ − κ| ||g||_2.   (2.11)

This is our final bound. In using it we will assume that we have constants η and σ such that

    ||h_k||_2 ≤ η  and  ||S_k^{-1}||_2 ≤ σ  for all k.

Since the A_k are unitarily similar, we can simply take η = ||A||_2. The fact that the σ_k can be bounded is a nontrivial corollary of the formal proof of convergence. However, the proof is tedious and not particularly informative. We will therefore assume that the iteration converges and determine the rate of convergence.

Rates of convergence

Let us restore our iteration subscripts and write the bound (2.11) in the form

    ||g_{k+1}||_2 ≤ σ^2 η ||g_k||_2^2 + σ|μ_k − κ_k| ||g_k||_2.   (2.12)

This bound can be used to show that if g_0 is sufficiently small and μ_0 is sufficiently near a simple eigenvalue λ of A, then the iteration with the Rayleigh quotient shift converges in the sense that g_k → 0 and μ_k → λ. Since the Rayleigh quotient shift takes κ_k = μ_k, we may now replace (2.12) by

    ||g_{k+1}||_2 ≤ σ^2 η ||g_k||_2^2

(the second term vanishes because the Rayleigh quotient shift takes κ_k = μ_k). Thus the norm of g_{k+1} is bounded by a quantity proportional to the square of the norm of g_k. We say that ||g_k||_2 converges at least quadratically to zero. In particular, any shift for which |μ_k − κ_k| = O(||g_k||_2) will give us quadratic convergence. In practice, this is true of the Wilkinson shift defined by Algorithm 2.7 below.

If A is Hermitian, then so are the iterates A_k. In that case we have h_k = g_k, so that we may replace η by ||g_k||_2, and our bound becomes

    ||g_{k+1}||_2 ≤ σ^2 ||g_k||_2^3.

This type of convergence is called cubic.

Quadratic convergence, once it sets in, is very fast. For example, if σ^2 η = 1 and ||g_0||_2 = 10^{-1}, then

    ||g_1||_2 ≤ 10^{-2},  ||g_2||_2 ≤ 10^{-4},  ||g_3||_2 ≤ 10^{-8},  ||g_4||_2 ≤ 10^{-16}.

Thus four iterations are sufficient to reduce the error by a factor corresponding to the IEEE double-precision rounding unit.

General comments on convergence

Our convergence analysis seems to depend on the hypothesis that μ_k is converging to a simple eigenvalue λ. If λ is multiple, the matrices S_k^{-1} can be unbounded, so that our uniform upper bound σ no longer exists. (It is a good exercise to see why this happens.) Nonetheless, if A is nondefective, the convergence of the QR algorithm to a multiple eigenvalue remains quadratic; in particular, if A is Hermitian, then A is necessarily nondefective, and the convergence remains cubic, whatever the multiplicity. If A is defective, on the other hand, the convergence may be only linear.

The complete convergence analysis is local; that is, it assumes that g_0 is sufficiently small and that μ_0 is sufficiently near a simple eigenvalue of A. There are no theorems for the global convergence of the QR algorithm applied to a general matrix. The typical behavior seems to be that the algorithm spends its first few iterations floundering and then homes in on an eigenvalue. Subsequent eigenvalues are found with fewer and fewer iterations. Why this happens is partly explained by the relation of the QR algorithm to the power method, which we turn to now. For more on these points see the notes and references.

2.2. THE UNSHIFTED QR ALGORITHM

The relation of the QR algorithm to the inverse power method explains its ability to converge swiftly to a particular eigenvalue. But the algorithm also

produces approximations to the other eigenvalues, albeit more slowly. The reason for this is that the QR algorithm is also related to the power method. The explication of this relation is the subject of this subsection. We begin with a far-reaching characterization of the kth QR iterate.

The QR decomposition of a matrix polynomial

Each step of the QR algorithm consists of a unitary similarity transformation A_{k+1} = Q_k^H A_k Q_k. When the algorithm has finished its work, we will need a single transformation that moves us from the initial matrix A = A_0 to the final matrix A_{k+1}. This can be accomplished by multiplying the transformations together as we go along, a process called accumulating the transformations. The result is a matrix

    Q̂_k = Q_0 Q_1 ··· Q_k

satisfying

    A_{k+1} = Q̂_k^H A Q̂_k.   (2.14)

The matrix Q̂_k has another useful characterization.

Theorem 2.1. Let Q_0, ..., Q_k and R_0, ..., R_k be the unitary and triangular matrices generated by the QR algorithm with shifts κ_0, ..., κ_k, starting with the matrix A. Let

    Q̂_k = Q_0 ··· Q_k  and  R̂_k = R_k ··· R_0.

Then

    Q̂_k R̂_k = (A − κ_k I)(A − κ_{k−1} I) ··· (A − κ_0 I).   (2.15)

Proof. We have

    Q̂_k R̂_k = Q̂_{k−1}(Q_k R_k)R̂_{k−1} = Q̂_{k−1}(A_k − κ_k I)R̂_{k−1}.

Since A_k = Q̂_{k−1}^H A Q̂_{k−1}, we have Q̂_{k−1}(A_k − κ_k I) = (A − κ_k I)Q̂_{k−1}. Postmultiplying this relation by R̂_{k−1}, we get

    Q̂_k R̂_k = (A − κ_k I) Q̂_{k−1} R̂_{k−1}.

An obvious induction on the product completes the proof. •
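Theorem 2.1 is easy to check numerically. The following Python sketch (ours; numpy assumed) performs two shifted QR steps and verifies (2.15) for k = 1.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5
    A0 = rng.standard_normal((n, n))
    I = np.eye(n)
    k0, k1 = 0.3, -0.7                         # two arbitrary shifts

    Q0, R0 = np.linalg.qr(A0 - k0*I); A1 = R0 @ Q0 + k0*I
    Q1, R1 = np.linalg.qr(A1 - k1*I)

    lhs = Q0 @ Q1 @ R1 @ R0                    # accumulated Q-hat times R-hat
    rhs = (A0 - k1*I) @ (A0 - k0*I)            # the shifted matrix polynomial
    print(np.max(np.abs(lhs - rhs)))           # ~ 1e-15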

Theorem 2.1 can be interpreted as follows. Let p be the polynomial of degree k+1 whose zeros are κ_0, ..., κ_k. We can then define p(A) by either of the formulas

    p(A) = (A − κ_k I) ··· (A − κ_0 I)  or  p(A) = (A − κ_0 I) ··· (A − κ_k I).

(It is a good exercise to verify that these two forms are equivalent.) These formulas give us two ways of evaluating the QR factorization of p(A). If we use the first formula, we simply form p(A) and calculate the QR decomposition of the result, say by Householder triangularization. However, if we know the zeros κ_i of p, then we can instead apply the shifted QR algorithm with shifts κ_0, ..., κ_k, accumulating the transformations Q_j and the product of the triangular matrices R_j. The result is the QR factorization Q̂_k R̂_k of p(A).

Theorem 2.1 has more than one application. For the present, however, we are interested in what it says about the relation of the QR algorithm to the power method.

The QR algorithm and the power method

Let us say that a sequence of QR steps is unshifted if the shifts κ_k are all zero. By (2.15), the matrices Q̂_k and R̂_k from the unshifted QR algorithm satisfy

    Q̂_k R̂_k = A^{k+1}.

Multiplying this relation by e_1 and recalling that R̂_k is upper triangular, we have

    r̂_11 q̂_1^{(k)} = A^{k+1} e_1,

where q̂_1^{(k)} is the first column of Q̂_k and r̂_11 is the (1,1)-element of R̂_k. Thus the first column of Q̂_k is the normalized result of applying k+1 iterations of the power method to e_1. Now if the conditions for the convergence of the power method are met, the vectors q̂_1^{(k)} approach the dominant eigenvector of A. It follows that if we partition

    A_{k+1} = [ ν_k  h_k^H ; g_k  B_k ],

then g_k → 0 and ν_k → λ_1, where λ_1 is the dominant eigenvalue of A.
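The power-method connection can also be checked numerically. In the following Python sketch (ours), the first column of the accumulated Q̂ from k unshifted QR steps agrees, up to sign, with the normalized power iterate A^k e_1.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 5
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # encourage a dominant eigenvalue
    I = np.eye(n)

    Ak, Qhat = A.copy(), I.copy()
    k = 20
    for _ in range(k):
        Q, R = np.linalg.qr(Ak)        # unshifted QR steps
        Ak = R @ Q
        Qhat = Qhat @ Q                # accumulate the transformations

    p = np.linalg.matrix_power(A, k) @ I[:, 0]
    p = p / np.linalg.norm(p)          # normalized power iterate A^k e1
    q1 = Qhat[:, 0]
    print(min(np.linalg.norm(q1 - p), np.linalg.norm(q1 + p)))   # ~ 1e-14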

Convergence of the unshifted QR algorithm

The relation of the QR algorithm to the power method is a special case of a more general theorem—under appropriate conditions, the unshifted QR algorithm actually triangularizes A.

Theorem 2.2. Let

    A = X Λ X^{-1},  where  Λ = diag(λ_1, λ_2, ..., λ_n)

and

    |λ_1| > |λ_2| > ··· > |λ_n| > 0.   (2.16)

Suppose that X^{-1} has an LU decomposition X^{-1} = LU, where L is unit lower triangular, and let X = QR be the QR decomposition of X. If A^k has the QR decomposition A^k = Q̂_k R̂_k, then there are diagonal matrices D_k with |D_k| = I such that Q̂_k D_k → Q.

Proof. We have

    A^k = X Λ^k X^{-1} = X (Λ^k L Λ^{-k}) Λ^k U.

Now for i > j the (i, j)-element of Λ^k L Λ^{-k} is l_ij (λ_i/λ_j)^k, which by (2.16) approaches zero—i.e., Λ^k L Λ^{-k} approaches the identity matrix. Write Λ^k L Λ^{-k} = I + E_k, where E_k → 0. Then

    A^k = X(I + E_k)Λ^k U = QR(I + E_k)Λ^k U = Q(I + R E_k R^{-1}) R Λ^k U.

Let Q̃_k R̃_k be the QR decomposition of I + R E_k R^{-1}. Since I + R E_k R^{-1} → I, by the continuity of the QR factorization Q̃_k → I. Let the diagonal elements of R̃_k R Λ^k U be δ_1, ..., δ_n, and set

    D_k = diag(|δ_1|/δ_1, ..., |δ_n|/δ_n).

Then the triangular factor in the decomposition

    A^k = (Q Q̃_k D_k^{-1})(D_k R̃_k R Λ^k U)

has positive diagonal elements. It follows by the uniqueness of the QR decomposition that Q̂_k = Q Q̃_k D_k^{-1}, and hence Q̂_k D_k = Q Q̃_k → Q, which is what we set out to show. •

We know [see (1.9), Chapter 1] that the columns of the Q-factor of X are the Schur vectors of A corresponding to the eigenvalues in the order (2.16).

Four comments on this theorem.

• With

its columns suitably scaled, the matrix Q̂_k converges to the orthogonal part of the Schur decomposition of A. It follows that the QR iterates A_k = Q̂_{k−1}^H A Q̂_{k−1} must converge to the triangular factor of the Schur decomposition of A. Thus under the conditions of Theorem 2.2, the unshifted QR algorithm produces a sequence of matrices tending to a triangular matrix.

• We have assumed that the matrix X of right eigenvectors has a QR decomposition and that the matrix X^{-1} of left eigenvectors has an LU decomposition. The former decomposition always exists, but the latter need not. For example, if A = diag(0.5, 1), then the matrix X that orders the eigenvalues in descending order is the exchange matrix

    X = [ 0  1 ; 1  0 ],

and X^{-1} = X has no LU decomposition. It is easily verified that the unshifted QR algorithm converges—actually it is stationary in this example—to a matrix whose eigenvalues are not in descending order. This phenomenon is called disorder in the eigenvalues. It should be kept in mind that disorder is a creature of exact computations. Rounding error will generally cause the eigenvalues to be found in descending order. But the QR algorithm can be very slow in extricating itself from disorder.

• We can get further information about the behavior of the unshifted algorithm by partitioning the matrices involved in the proof of Theorem 2.2. Specifically, write

    Λ^k L Λ^{-k} = [ L_11^{(k)}  0 ; L_21^{(k)}  L_22^{(k)} ],   (2.17)

where L_11^{(k)} is of order i. If the elements of the block L_21^{(k)} in (2.17) were exactly zero, the product R(Λ^k L Λ^{-k})(Λ^k U) would have the form

    [ T_11  T_12 ; 0  T_22 ].   (2.18)

This product is block upper triangular, and it follows that A_k decouples at its ith diagonal element. The block triangularization in (2.18) occurs only when the elements of L_21^{(k)} are exactly zero, which is rarely the case. In general, however, these elements converge to zero, and by inspection we see that the most slowly converging ones are l_{i,i−1}^{(k)} and l_{i+1,i}^{(k)}, which converge to zero as |λ_i/λ_{i−1}|^k and |λ_{i+1}/λ_i|^k respectively. Now

    A_k = Q̂_k^H A Q̂_k = Q̂_k^H Q T Q^H Q̂_k,

where T is the upper triangular factor of the Schur decomposition of A, and by continuity A_k will tend toward a matrix of the form (2.18) at the same rate as these elements approach zero. It follows that:

    Under the hypotheses of Theorem 2.2, the (i,i)-element of A_k converges to λ_i, while the other elements of the submatrix A_k[i:n, 1:i] converge to zero. The rate of convergence is at least as fast as the approach of max{|λ_i/λ_{i−1}|, |λ_{i+1}/λ_i|}^k to zero.   (2.19)

• Theorem 2.2 explicitly assumes that the eigenvalues of A are distinct and have distinct absolute values. However, the proof of the theorem can be adapted, by considering a suitable partitioning of the matrices in question, to show that if there are groups of eigenvalues of equal magnitude, the QR iterates A_k tend to a block triangular form, whose diagonal blocks—which themselves may not converge—contain the equimodular eigenvalues.

The foregoing does not make an impressive case for the unshifted QR algorithm. Its convergence is unreliable, and even when the conditions of Theorem 2.2 are satisfied, the convergence can be slow. Nonetheless, the results on the unshifted algorithm give us insight into how the shifted algorithm behaves. Specifically, suppose that the shifted algorithm is converging to an eigenvalue λ, so that the last row is going swiftly to zero. As the shifts κ_k converge, we are in effect applying the unshifted algorithm to the matrix A − κ_k I. Hence we may expect some reduction in the subdiagonal elements of A_k other than those in the last row. This, in fact, is what happens in practice. This reduction may not be much, but for a large matrix, on which we must perform many iterations, its cumulative effect may be significant.

2.3. HESSENBERG FORM

We now turn to the practical implementation of the QR algorithm. The first point to note is that one step of the algorithm requires the computation of a QR decomposition

of A_k. The computation of this decomposition requires O(n^3) floating-point operations. Since at least one QR step is needed to compute a single eigenvalue, the operation count to find all the eigenvalues is O(n^4). (This is true even if we take advantage of the fact that as the matrix deflates, we work with ever smaller submatrices.) Such an operation count is far too high for the algorithm to be practical.

The cure for this problem is to first reduce the matrix A to a form in which the algorithm is cheaper to apply. The natural form, it turns out, is upper Hessenberg form; that is, a matrix of the form illustrated below for n = 6:

    X X X X X X
    X X X X X X
    0 X X X X X
    0 0 X X X X
    0 0 0 X X X
    0 0 0 0 X X

(This representation, in which a 0 stands for a zero element and an X stands for an element that may be nonzero, is called a Wilkinson diagram.) As we shall see later, a QR step preserves Hessenberg form. Moreover, a QR step for a Hessenberg matrix requires only O(n^2) operations. This brings the total operation count down to an acceptable O(n^3).

Householder transformations

The reduction to Hessenberg form is traditionally effected by a class of transformations called Householder transformations. Real Householder transformations have been widely treated in the literature and in textbooks (in particular, in §1:4.1 of this series). Here we will give a brief treatment of the basics, emphasizing the complex case.

A Householder transformation or elementary reflector is a matrix of the form

    H = I − u u^H,  where  ||u||_2 = √2.

It is easy to see that a Householder transformation is Hermitian and unitary.

The product of a Householder transformation and a matrix A can be cheaply computed. Specifically,

    HA = (I − uu^H)A = A − u(u^H A).

This equation suggests the following algorithm, which overwrites A with HA.

    1. v^H = u^H A
    2. A = A − u v^H

If A is m×n, this algorithm requires about 2mn flam. A similar algorithm will compute the product AH.

Householder transformations are useful because they can introduce zeros into a matrix. We begin with the special case of a vector.

Theorem 2.3. Let a be a vector such that ||a||_2 = 1 and a_1 is real and nonnegative. Let

    u = (a + e_1)/√(1 + a_1).

Then u generates a Householder transformation H = I − uu^H with

    Ha = −e_1.

The proof of this theorem is purely computational and is left as an exercise.

The condition that the vector a have norm one and that its first component be real and nonnegative is rather restrictive. However, to reduce a general vector a to a multiple of e_1 we can replace a by

    ρ a / ||a||_2,

where ρ is a scalar of absolute value one that makes the first component of a nonnegative. This observation leads to Algorithm 2.1, which generates a Householder transformation that reduces a vector a to a multiple of e_1.

This algorithm takes a vector a and produces a vector u that generates a Householder transformation H = I − uu^H such that Ha = ν e_1.

    1. housegen(a, u, ν)
    2.    u = a
    3.    ν = ||a||_2
    4.    if (ν = 0) u[1] = sqrt(2); return fi
    5.    if (u[1] ≠ 0)
    6.       ρ = conj(u[1])/|u[1]|
    7.    else
    8.       ρ = 1
    9.    end if
    10.   u = (ρ/ν)*u
    11.   u[1] = 1 + u[1]
    12.   u = u/sqrt(u[1])
    13.   ν = −conj(ρ)*ν
    14. end housegen

Algorithm 2.1: Generation of Householder transformations

• When a is real, so is u. In this case such constructions as ρ = conj(u[1])/|u[1]| are unnecessarily elaborate, and the generation of Householder transformations for real vectors is usually done somewhat differently (see Algorithm 1:4.1.1). Nonetheless, in this volume we will regard housegen as a generic program, applicable to both real and complex vectors.
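As a check on these formulas, here is a direct NumPy transcription of Algorithm 2.1 (a sketch; real and complex vectors are handled alike):

    import numpy as np

    def housegen(a):
        # Return (u, nu) with (I - u u^H) a = nu*e_1 and ||u||_2 = sqrt(2).
        u = np.asarray(a, dtype=complex).copy()
        nu = np.linalg.norm(u)
        if nu == 0.0:
            u[0] = np.sqrt(2.0)
            return u, 0.0
        rho = np.conj(u[0])/abs(u[0]) if u[0] != 0 else 1.0
        u *= rho/nu               # now u[0] = |a[0]|/nu >= 0
        u[0] += 1.0               # no cancellation: u[0] is nonnegative
        u /= np.sqrt(u[0])        # scale so that ||u||_2 = sqrt(2)
        return u, -np.conj(rho)*nu

For example, housegen(np.array([3.0, 4.0])) returns u ≈ (1.2649, 0.6325) and ν = −5, and one can verify that a − u(u^H a) = ν e_1.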

Reduction to Hessenberg form

We have already mentioned that the principal use of Householder transformations is to introduce zeros into a matrix. We will illustrate the technique with the reduction by unitary similarities of a matrix to Hessenberg form.

Let A be of order n and partition A in the form

    A = ( α_11  a_12^H )
        ( a_21  A_22   ).

Let Ĥ_1 be a Householder transformation such that Ĥ_1 a_21 = ν_1 e_1. If we set H_1 = diag(1, Ĥ_1), then

    H_1 A H_1 = ( α_11     a_12^H Ĥ_1  )
                ( ν_1 e_1  Ĥ_1 A_22 Ĥ_1 ).

Thus we have annihilated all the elements in the first column below the first subdiagonal.

For the general step, suppose that we have generated Householder transformations H_1, ..., H_{k−1} such that

    (H_1···H_{k−1})^H A (H_1···H_{k−1}) = ( A_11  a_12  A_13   )
                                          (  0    α_22  a_23^H )
                                          (  0    a_32  A_33   ),

where A_11 is a Hessenberg matrix of order k−1. Let Ĥ_k be a Householder transformation such that Ĥ_k a_32 = ν_k e_1. If we set H_k = diag(I_k, Ĥ_k), then

    H_k (H_1···H_{k−1})^H A (H_1···H_{k−1}) H_k = ( A_11   a_12   A_13 Ĥ_k   )
                                                  (  0     α_22   a_23^H Ĥ_k )
                                                  (  0    ν_k e_1 Ĥ_k A_33 Ĥ_k ),

which reduces the elements below the subdiagonal of the kth column to zero.

[Figure 2.1: Reduction to upper Hessenberg form]

Figure 2.1 illustrates the course of the reduction on a matrix of order five. The elements with hats are the ones used to generate the Householder transformation that produces the next matrix.

This algorithm reduces a matrix A of order n to Hessenberg form by Householder transformations. The reduced matrix is returned in the matrix H, and the transformations are accumulated in the matrix Q.

    1. hessreduce(A, H, Q)
    2.    Reduce A
    3.    H = A
    4.    for k = 1 to n−2
    5.       Generate the transformation
    6.       housegen(H[k+1:n, k], u, H[k+1, k])
    7.       Q[k+1:n, k] = u
    8.       Premultiply the transformation
    9.       v^H = u^H*H[k+1:n, k+1:n]
    10.      H[k+1:n, k+1:n] = H[k+1:n, k+1:n] − u*v^H
    11.      H[k+2:n, k] = 0
    12.      Postmultiply the transformation
    13.      v = H[1:n, k+1:n]*u
    14.      H[1:n, k+1:n] = H[1:n, k+1:n] − v*u^H
    15.   end for k
    16.   Accumulate the transformations
    17.   Q[:, n] = e_n;  Q[:, n−1] = e_{n−1}
    18.   for k = n−2 to 1 by −1
    19.      u = Q[k+1:n, k]
    20.      v^H = u^H*Q[k+1:n, k+1:n]
    21.      Q[k+1:n, k+1:n] = Q[k+1:n, k+1:n] − u*v^H
    22.      Q[:, k] = e_k
    23.   end for k
    24. end hessreduce

Algorithm 2.2: Reduction to upper Hessenberg form
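For readers who want to experiment, here is a NumPy sketch of the reduction, built on the housegen sketch above (storage is simplified: the generating vectors are kept in a Python list rather than in Q itself):

    import numpy as np

    def hessreduce(A):
        # Return (H, Q) with H = Q^H A Q upper Hessenberg.
        n = A.shape[0]
        H = np.asarray(A, dtype=complex).copy()
        Q = np.eye(n, dtype=complex)
        W = []                                   # generating vectors
        for k in range(n - 2):
            u, nu = housegen(H[k+1:, k])
            W.append(u)
            v = u.conj() @ H[k+1:, k+1:]         # premultiply rows k+1:n
            H[k+1:, k+1:] -= np.outer(u, v)
            H[k+1, k] = nu
            H[k+2:, k] = 0.0
            v = H[:, k+1:] @ u                   # postmultiply columns k+1:n
            H[:, k+1:] -= np.outer(v, u.conj())
        for k in range(n - 3, -1, -1):           # accumulate from the southeast
            u = W[k]
            v = u.conj() @ Q[k+1:, k+1:]
            Q[k+1:, k+1:] -= np.outer(u, v)
        return H, Q

A quick sanity check is that np.allclose(Q.conj().T @ A @ Q, H) and that Q.conj().T @ Q is the identity.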

There are several comments to be made about Algorithm 2.2.

• The matrix Q is the product H_1 H_2 ··· H_{n−2}. The natural way to accumulate the transformations is to set Q = I and compute the product in the order (···(((Q H_1)H_2)H_3)···)H_{n−2}. This has the advantage that the reduction and the accumulation can be done in a single loop. However, since H_1 is almost full, we would then have to apply each subsequent H_k to the full array Q[1:n, k+1:n]. An alternative is to save the vectors generating the Householder transformations and accumulate them after the reduction in the order H_1(···(H_{n−4}(H_{n−3}(H_{n−2}Q)))···). It is easy to see that this order fills up Q from the southeast to the northwest, so that the application of H_k affects only Q[k+1:n, k+1:n]. In Algorithm 2.2 we have saved the generating vectors in Q itself and initialize Q to the identity column by column as the loop regresses.

• An operation count for the algorithm can be obtained as follows. At the kth stage the premultiplication requires about (n−k)^2 floating-point additions and multiplications (flams) to compute v and a like number to update H. Thus the count for the premultiplication is about

    2 Σ_{k=1}^{n−2} (n−k)^2 ≅ (2/3)n^3 flam.

The postmultiplication at the kth stage requires about 2n(n−k) flam, which brings the total for the reduction to n^3 flam above the premultiplication count. The generation of the transformations is negligible, and the operation count for the accumulation of Q is the same as for the premultiplication. Hence:

    The operation count for Algorithm 2.2 is (5/3)n^3 flam for the reduction of A and (2/3)n^3 flam for the accumulation of Q.

• As mentioned above, the term flam is an abbreviation for "floating-point addition and multiplication." When A is real, the flam represents an actual floating-point addition and multiplication on the machine in question. When A is complex, however, additions and multiplications are composite operations: a complex addition requires two ordinary floating-point additions, and a complex multiplication requires two additions and four multiplications. Thus in complex arithmetic the flam is

equivalent to four ordinary flams, and operation counts must be adjusted accordingly. In the sequel we will report operation counts in terms of the arithmetic operations, real or complex, of the algorithm and leave the conversion to machine operations to the reader.

• The algorithm generates H in situ. If we replace H by A, then the algorithm overwrites A with its Hessenberg form. Most packages choose this option to save storage.

• An extremely important observation is the following:

    The first column of the orthogonal matrix Q generated by Algorithm 2.2 is e_1.    (2.21)

This fact can be verified directly from the algorithm, since the accumulation of the transformations never touches the first column of Q. We will use it later in deriving the implicit shift version of the QR algorithm.

• The algorithm is backward stable. Specifically, let Ĥ_k denote the exact Householder transformation generated from the kth computed iterate, and let Q = Ĥ_1···Ĥ_{n−2}. Then there is a matrix E satisfying

    ||E||_F ≤ γ_n ε_M ||A||_F

such that the computed Hessenberg matrix satisfies

    H = Q^H (A + E) Q.

Here γ_n is a slowly growing function of n and ε_M is the rounding unit. In other words, the computed Hessenberg form is the exact Hessenberg form of A + E, where ||E|| is of the order of the rounding unit compared to ||A||. For the implications of this result see the discussion following Theorem 1.3.

The stability result just cited is typical of many algorithms that proceed by a sequence of orthogonal transformations. For later reference, we make the following definition.

Definition 2.4. Let the sequence A_1, A_2, ..., A_m be generated by the recursion

    A_{k+1} = fl(R_k^H A_k S_k),    (2.22)

where R_k and S_k are unitary matrices generated from A_k. Here fl denotes floating-point computation with rounding unit ε_M and includes any errors made in generating and applying R_k and S_k. Set P = R_{m−1}···R_1 and Q = S_1···S_{m−1}. Then we say that the sequence (2.22) is STABLE IN THE USUAL SENSE if there is a matrix E and a slowly growing function γ_m such that

    ||E||_F ≤ γ_m ε_M ||A_1||_F

and

    A_m = P(A_1 + E)Q.

Plane rotations

We have mentioned that a QR step leaves a Hessenberg matrix invariant. The proof is constructive and involves the use of 2x2 unitary transformations. We could easily use Householder transformations for this purpose, but it is more efficient to use another class of transformation: the plane rotations.

A plane rotation (also called a Givens rotation) is a matrix of the form

    P = (    c      s )
        ( −conj(s)  c ),    (2.23)

where c is real and c^2 + |s|^2 = 1. The transformation is called a rotation because in the real case its action on a 2-vector x is to rotate x clockwise through the angle θ = cos^{−1} c.

Plane rotations can be used to introduce zeros into a 2-vector. Specifically, let a and b be given with a ≠ 0. If we set

    ν = sqrt(|a|^2 + |b|^2),   c = |a|/ν,   s = (a/|a|)*(conj(b)/ν),    (2.24)

then (note that c is real)

    (    c      s ) ( a )   ( (a/|a|)ν )
    ( −conj(s)  c ) ( b ) = (    0     ).

We can apply plane rotations to a matrix to introduce zeros into the matrix. Specifically, a plane rotation in the (i,j)-plane is an identity matrix in which the elements at the intersections of rows and columns i and j are replaced as follows:

    P_{ij}[i,i] = c,   P_{ij}[i,j] = s,   P_{ij}[j,i] = −conj(s),   P_{ij}[j,j] = c.

In other words, a rotation in the (i,j)-plane is an identity matrix in which a plane rotation has been embedded in the submatrix corresponding to rows and columns i and j.

To see the effect of a rotation in the (i,j)-plane on a matrix, let X be a matrix and let Y = P_{ij}X. If X and Y are partitioned by rows, then

    y_i^T = c*x_i^T + s*x_j^T,
    y_j^T = −conj(s)*x_i^T + c*x_j^T,

and the other rows are unchanged. Thus premultiplication by a rotation in the (i,j)-plane combines rows i and j and leaves the others undisturbed, and by an appropriate choice of c and s we can introduce a zero anywhere we want in the ith or jth row. Postmultiplication by P_{ij}^H affects the columns in a similar way: we can introduce a zero anywhere in the ith or jth column.

Algorithm 2.3 generates a plane rotation satisfying (2.24). Four comments on this algorithm.

• For the sake of exposition we have stayed close to the formulas (2.24). These formulas can be considerably economized (for example, we could replace τ by |Re(a)| + |Im(a)| + |Re(b)| + |Im(b)|, thus avoiding some multiplications and a square root in statement 9). They can also be simplified when a and b are real.

• The constant τ defined in statement 9 is a scaling factor designed to avoid overflows in statement 10 while rendering underflows harmless.

• A simpler and in many ways more natural definition of c and s is c = conj(a)/ν and s = conj(b)/ν, which makes the final value of a equal to ν. The definition we have chosen makes c real, which saves work when we apply the rotation to a matrix.

• If b is zero, the routine returns the identity rotation; if a is zero, it returns the rotation that exchanges the two components of a vector. The final values of the application of the rotation to a and b are returned in a and b. Because our pseudocode can alter the arguments of routines, this means that if a and b are elements of a matrix, those elements are reset to the values they would assume if the rotation were applied.

Since the application of a rotation to a matrix alters only two rows (or columns) of the matrix, a simple routine to combine two vectors will serve to implement the general application

The following algorithm generates a plane rotation satisfying (2.24) from the quantities a and b. The cosine of the rotation is real. The program overwrites a with its final value and b with zero.

    1. rotgen(a, b, c, s)
    2.    if (b = 0)
    3.       c = 1; s = 0; return
    4.    end if
    5.    if (a = 0)
    6.       c = 0; s = 1; a = b; b = 0; return
    7.    end if
    8.    μ = a/|a|
    9.    τ = |a| + |b|
    10.   ν = τ*sqrt(|a/τ|^2 + |b/τ|^2)
    11.   c = |a|/ν
    12.   s = μ*conj(b)/ν
    13.   a = ν*μ
    14.   b = 0
    15. end rotgen

Algorithm 2.3: Generation of a plane rotation

This algorithm takes a plane rotation P specified by c and s and two vectors x and y and overwrites them with the rotated pair:

    1. rotapp(c, s, x, y)
    2.    t = c*x + s*y
    3.    y = c*y − conj(s)*x
    4.    x = t
    5. end rotapp

Algorithm 2.4: Application of a plane rotation
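In executable form the pair looks as follows (a NumPy sketch; rotgen returns the rotated value of a instead of overwriting it, since Python cannot alter scalar arguments in place):

    import numpy as np

    def rotgen(a, b):
        # Return (c, s, a_new) with c real and the pair (a, b) rotated to (a_new, 0).
        if b == 0:
            return 1.0, 0.0, a
        if a == 0:
            return 0.0, 1.0, b
        mu = a/abs(a)
        tau = abs(a) + abs(b)                        # scaling against overflow
        nu = tau*np.sqrt(abs(a/tau)**2 + abs(b/tau)**2)
        return abs(a)/nu, mu*np.conj(b)/nu, nu*mu

    def rotapp(c, s, x, y):
        # Overwrite the vector pair (x, y) with the rotated pair; x and y
        # are NumPy views, so matrix rows or columns can be passed directly.
        t = c*x + s*y
        y[...] = c*y - np.conj(s)*x
        x[...] = t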

Algorithm 2.4 is such a routine. Here are some comments on the routine.

• Because the cosine is real, the cost of combining two components of x and y is 12 flmlt + 8 fladd. In the real case the count becomes 4 flmlt + 2 fladd. In some algorithms we must multiply real rotations by complex vectors, in which case the count for each pair of elements is 8 flmlt + 4 fladd. We will often refer to these counts collectively as a flrot, leaving context to decide the actual count.

• The routine rotapp uses a vector t to contain intermediate values. A practical implementation would do something (e.g., write out the operations in scalar form) to insure that a storage allocator is not invoked every time rotapp is invoked.

• The routine rotapp has many applications. Suppose that P is a rotation in the (i,j)-plane defined by c and s. Then the following table gives the calling sequences for four distinct applications of P to a matrix X.

    P X:     rotapp(c, s, X[i,:], X[j,:])
    P^H X:   rotapp(c, −s, X[i,:], X[j,:])
    X P:     rotapp(c, −conj(s), X[:,i], X[:,j])
    X P^H:   rotapp(c, conj(s), X[:,i], X[:,j])

• Transformations that can introduce a zero into a matrix also have the power to destroy zeros that are already there. In designing algorithms like this, it is useful to think of the transformations as a game played on a Wilkinson diagram. Begin by selecting two rows of the matrix (or two columns):

    1 2 3 4 5
    X X 0 X 0
    X X 0 0 X

Here are the rules for one step of the game.

    1. A pair of X's remains a pair of X's (column 1).
    2. The element of your choice becomes zero (the X in column 2).
    3. A pair of 0's remains a pair of 0's (column 3).
    4. A mixed pair becomes a pair of X's (columns 4 and 5).

Thus our transformed pair of rows is

    1 2 3 4 5
    X 0 0 X X
    X X 0 X X

In this example we have actually lost a zero element. In some algorithms, plane rotations are used to move a nonzero element around in a matrix by successively annihilating it in one position and letting it pop up elsewhere, a process called bulge chasing.

Invariance of Hessenberg form

The reason for reducing a matrix to Hessenberg form before applying the QR algorithm is that the form is invariant under a QR step and the operation count for performing a QR step is O(n^2). To establish these facts, we will use plane rotations to implement the basic QR step. Since shifting the step consists only of altering the diagonal elements at the beginning and end of the step, we consider only the unshifted step.

Let H be the upper Hessenberg matrix in question. The QR step is divided into two parts. In the first part we determine rotations P_{12}, ..., P_{n−1,n}, with P_{k,k+1} acting in the (k,k+1)-plane, so that

    T = P_{n−1,n}···P_{12} H

is upper triangular. This step is illustrated by the Wilkinson diagrams in Figure 2.2. Note that if Q^H = P_{n−1,n}···P_{12}, then Q is the Q-factor of H, i.e., the unitary Q of the QR algorithm.

[Figure 2.2: Reduction to triangular form in QR step]

The second part is to form

    Ĥ = TQ = T P_{12}^H···P_{n−1,n}^H,

so that Ĥ is the result of the single, unshifted QR step. Figure 2.3 illustrates this postmultiplication step. The point to observe is that the postmultiplication by P_{i,i+1}^H introduces a nonzero in the (i+1,i)-element but leaves the other zero elements undisturbed. Thus the final matrix Ĥ is upper Hessenberg, and we have shown:

    If a QR step is performed on an upper Hessenberg matrix, the result is upper Hessenberg.

Algorithm 2.5 implements these two parts. Three comments are in order.

• The application of plane rotations dominates the calculation. It is easy to see that the algorithm requires about n^2 flrot. Whether the matrix is real or complex, this translates into O(n^2) operations. Thus a QR step preserves upper Hessenberg form and requires only O(n^2) operations.

• The pre- and postmultiplications can be interleaved. After the second rotation (P_{23} in Figure 2.2) has been premultiplied, we can postmultiply by P_{12}^H; thereafter the generation and application of P_{k,k+1} is followed by the application of P_{k−1,k}^H.

• The method is stable in the usual sense (Definition 2.4).

[Figure 2.3: Back multiplication in QR step]

This algorithm applies one unshifted QR step to an upper Hessenberg matrix H.

    1. for k = 1 to n−1
    2.    rotgen(H[k,k], H[k+1,k], c_k, s_k)
    3.    rotapp(c_k, s_k, H[k, k+1:n], H[k+1, k+1:n])
    4. end for k
    5. for k = 1 to n−1
    6.    rotapp(c_k, conj(s_k), H[1:k+1, k], H[1:k+1, k+1])
    7. end for k

Algorithm 2.5: Basic QR step for a Hessenberg matrix
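The following NumPy sketch, built on the rotgen and rotapp sketches above, carries out one unshifted step and can be used to observe the preservation of Hessenberg form (0-based indices; H must be a complex array):

    import numpy as np

    def qr_step_hess(H):
        # One unshifted QR step H <- RQ for upper Hessenberg H (Algorithm 2.5).
        n = H.shape[0]
        rots = []
        for k in range(n - 1):                    # reduce H to triangular form
            c, s, a = rotgen(H[k, k], H[k+1, k])
            H[k, k], H[k+1, k] = a, 0.0
            rotapp(c, s, H[k, k+1:], H[k+1, k+1:])
            rots.append((c, s))
        for k, (c, s) in enumerate(rots):         # postmultiply by the P^H's
            rotapp(c, np.conj(s), H[:k+2, k], H[:k+2, k+1])
        return H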

2.4. THE EXPLICITLY SHIFTED ALGORITHM

Algorithm 2.5 exists primarily to show the invariance of upper Hessenberg form under a QR step. In this subsection we will work our way to a practical algorithm for reducing an upper Hessenberg matrix H to Schur form.

Detecting negligible subdiagonals

From the convergence theory in §2.1 we have reason to expect that when the shifted QR algorithm is applied to an upper Hessenberg matrix H, the element h_{n,n−1} will tend rapidly to zero. At the same time, the results in §2.2 suggest that other subdiagonal elements may also tend to zero, albeit slowly. If any subdiagonal element can be regarded as negligible, the problem deflates, and we can save computations. To claim this bonus, however, we must first decide what it means for a subdiagonal element to be negligible.

A natural criterion is to choose a small number ε and declare h_{i+1,i} negligible provided

    |h_{i+1,i}| ≤ ε ||A||_F,    (2.26)

where A is the matrix we started with. To see why, let us examine the consequences of setting h_{i+1,i} to zero. To simplify the discussion, we will assume that the current iterate H has been computed without error, so that there is a unitary matrix Q with H = Q^H A Q. Setting h_{i+1,i} to zero amounts to replacing H by H − h_{i+1,i} e_{i+1} e_i^T. If we set

    E = −h_{i+1,i} Q e_{i+1} e_i^T Q^H,    (2.27)

then

    Q^H (A + E) Q = H − h_{i+1,i} e_{i+1} e_i^T,

and ||E||_F = |h_{i+1,i}|. In other words, if h_{i+1,i} satisfies (2.26), then setting it to zero amounts to making a normwise relative perturbation in A of size ε. When ε equals the rounding unit ε_M, the perturbation E is of a size with the perturbation due to rounding the elements of A.

If the elements of A are roughly the same size, then this is as small a perturbation as we can reasonably expect. When the elements of A are not balanced, say when the matrix shows a systematic grading from larger to smaller elements as we pass from the northwest corner to the southeast, the backward error argument for the criterion (2.26) becomes less compelling. In practice, such a matrix frequently has small eigenvalues that are insensitive to small relative perturbations in its elements. In such cases it is important that we set h_{i+1,i} to zero only if it is small compared to the nearby elements of the matrix, something the criterion (2.26) does not insure. The usual cure for this problem is to adopt the criterion

    |h_{i+1,i}| ≤ ε (|h_{i,i}| + |h_{i+1,i+1}|).    (2.28)

If by chance |h_{i,i}| + |h_{i+1,i+1}| is zero, one can revert to (2.26).

It is worth observing that we used the Frobenius norm in (2.26) only because its unitary invariance made the bound (2.27) easy to derive. In practice, one can replace ||A||_F by any reasonably balanced norm, say the 1- or ∞-norm. For more on this point see the discussion following Theorem 1.3.

Deflation

Once a subdiagonal element has been deemed insignificant, we can deflate the problem. As the algorithm progresses, subdiagonal elements in other positions may also deflate. A typical situation is illustrated by the following Wilkinson diagram:

    X X X d d d d X X X
    X X X d d d d X X X
    0 X X d d d d X X X
    0 0 0 b b b b c c c
    0 0 0 b b b b c c c    (2.29)
    0 0 0 0 b b b c c c
    0 0 0 0 0 b b c c c
    0 0 0 0 0 0 0 X X X
    0 0 0 0 0 0 0 0 X X
    0 0 0 0 0 0 0 0 0 X

Here the seventh, eighth, and ninth subdiagonal elements have become negligible, revealing three eigenvalues at the eighth, ninth, and tenth diagonal elements. The third subdiagonal element has also become negligible, so that the remaining eigenvalues are contained in the leading principal submatrix of order three and in the principal submatrix of order four whose elements are labeled b. If we decide to treat the latter first, then we need only perform QR steps between rows and columns four and seven.

In the shifted QR algorithm, subdiagonal elements will usually deflate beginning with the (n,n−1)-element and proceeding backward one element at a time. In applying the rotations of such a restricted QR step, we must distinguish two cases. The first is when we only wish to find the eigenvalues of A. In that case we need only transform

the elements labeled b in (2.29). On the other hand, if we wish to compute a Schur decomposition of A, we must apply the transformations to the entire matrix. In particular, the transformations must be premultiplied into the elements labeled b and c and postmultiplied into the elements labeled b and d. In this case the transformations must also be accumulated into the matrix Q from the initial reduction to Hessenberg form.

Back searching

In (2.29) we can determine by inspection the rows, call them i1 and i2, that define where the QR step is to be applied. In practice, we must determine them on the fly by applying whatever criterion we choose to decide if a subdiagonal element is zero. This is usually done by searching back along the subdiagonal from the current value of i2. Algorithm 2.6 implements such a loop. Negligible subdiagonal elements are set to zero as they are detected. The routine either returns distinct indices i1 and i2, in which case the QR step is to be taken between these indices, or it returns i1 = i2 = 1, in which case the matrix is completely deflated and the QR algorithm is finished.

This algorithm takes a Hessenberg matrix H of order n and an index l (1 ≤ l ≤ n) and determines indices i1, i2 ≤ l such that one of the following two conditions holds:

    1. 1 ≤ i1 < i2 ≤ l, in which case the matrix deflates at rows i1 and i2;
    2. 1 = i1 = i2, in which case all the subdiagonal elements of H in rows 1 through l are zero.

    1. backsearch(H, l, i1, i2)
    2.    i1 = i2 = l
    3.    while (i1 > 1)
    4.       if (H[i1, i1−1] is negligible)
    5.          H[i1, i1−1] = 0
    6.          if (i1 = i2)
    7.             i2 = i1 = i1−1
    8.          else
    9.             leave while
    10.         end if
    11.      else
    12.         i1 = i1−1
    13.      end if
    14.   end while
    15. end backsearch

Algorithm 2.6: Finding deflation rows in an upper Hessenberg matrix
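A NumPy sketch of the search, using the relative criterion (2.28), might read as follows (0-based indices, so full deflation is signaled by i1 = i2 = 0; the tolerance is one reasonable choice):

    import numpy as np

    def backsearch(H, ell, eps=None):
        # Find (i1, i2) per Algorithm 2.6; negligible subdiagonals are zeroed.
        if eps is None:
            eps = np.finfo(float).eps
        i1 = i2 = ell
        while i1 > 0:
            if abs(H[i1, i1-1]) <= eps*(abs(H[i1-1, i1-1]) + abs(H[i1, i1])):
                H[i1, i1-1] = 0.0
                if i1 == i2:
                    i1 = i2 = i1 - 1
                else:
                    break
            else:
                i1 -= 1
        return i1, i2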

The Wilkinson shift

Once we have found a range [i1, i2] within which to perform a QR step, we must choose a shift. The natural candidate is the Rayleigh-quotient shift κ = H[i2,i2], since it gives local quadratic convergence to simple eigenvalues. However, if H is real, the Rayleigh-quotient shift is also real and cannot approximate a complex eigenvalue. The cure is to use the Wilkinson shift, which is defined as the eigenvalue of the matrix

    ( H[i2−1,i2−1]  H[i2−1,i2] )
    ( H[i2,i2−1]    H[i2,i2]   )

that is nearest H[i2,i2]. This shift can move the QR iteration into the complex plane. Moreover, as the method converges, that is, as H[i2,i2−1] approaches zero, the shift approaches the Rayleigh-quotient shift, so that we retain the quadratic convergence of the latter.

The computation of the shift requires the solution of a 2x2 eigenvalue problem. The procedure is to compute the eigenvalues from the characteristic polynomial using the usual formula for the roots of a quadratic equation. However, some care must be taken to insure the stability of the resulting algorithm.

To simplify our notation, we will consider the matrix

    B = ( a  b )
        ( c  d ).

Thus we want the eigenvalue of B nearest to d. The problem simplifies if we work with the shifted matrix B − dI: the Wilkinson shift is then κ = λ + d, where λ is the smallest eigenvalue of B − dI. The characteristic equation of B − dI is

    λ^2 − (a − d)λ − bc = 0,

and its roots are

    λ = p ± sqrt(p^2 + q),   where p = (a − d)/2 and q = bc.

To avoid cancellation we should not compute the smallest root directly from this formula. Instead we should compute the largest root λ_max first and then compute the smallest root λ_min from the relation λ_min λ_max = −q. To determine the largest root, note that

    |p ± r|^2 = |p|^2 ± 2[Re(p)Re(r) + Im(p)Im(r)] + |r|^2,   where r = sqrt(p^2 + q).    (2.30)

Consequently we want to choose the sign ± so that the middle term in (2.30) is positive. Algorithm 2.7 implements the computation described above. Given the elements of the matrix

    B = ( a  b )
        ( c  d ),

wilkshift returns the eigenvalue of B that is nearest d.

    1. wilkshift(a, b, c, d, κ)
    2.    κ = d
    3.    s = |a| + |b| + |c| + |d|
    4.    if (s = 0) return fi
    5.    q = (b/s)*(c/s)
    6.    if (q ≠ 0)
    7.       p = 0.5*((a/s) − (d/s))
    8.       r = sqrt(p^2 + q)
    9.       if (Re(p)*Re(r) + Im(p)*Im(r) < 0) r = −r fi
    10.      κ = κ − s*(q/(p + r))
    11.   end if
    12. end wilkshift

Algorithm 2.7: Computation of the Wilkinson shift

Here are some comments on the algorithm.

• To avoid overflow, the algorithm scales B so that its largest element is no greater than one in magnitude. The scaling factor s can be replaced by the sum of the absolute values of the real and imaginary parts of the elements of B, which is cheaper to compute.
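A direct Python transcription (a sketch; cmath.sqrt is used so that a real matrix with complex eigenvalues yields a complex shift):

    import cmath

    def wilkshift(a, b, c, d):
        # Return the eigenvalue of [[a, b], [c, d]] nearest d (Algorithm 2.7).
        kappa = d
        s = abs(a) + abs(b) + abs(c) + abs(d)
        if s == 0:
            return kappa
        q = (b/s)*(c/s)
        if q != 0:
            p = 0.5*(a/s - d/s)
            r = cmath.sqrt(p*p + q)
            if (p.conjugate()*r).real < 0:   # choose the sign making |p + r| large
                r = -r
            kappa -= s*(q/(p + r))
        return kappa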

Reduction to Schur form

We can now assemble the algorithmic fragments developed previously into Algorithm 2.8, which reduces an upper Hessenberg matrix to Schur form by the QR algorithm. Given an upper Hessenberg matrix H, hqr overwrites it with a unitarily similar triangular matrix whose diagonals are the eigenvalues of H. The similarity transformation is accumulated in Q. The routine gives an error return if more than maxiter iterations are required to deflate the matrix at any eigenvalue.

    1. hqr(H, Q, maxiter)
    2.    i1 = 1; i2 = n; iter = 0
    3.    while (true)
    4.       iter = iter+1
    5.       if (iter > maxiter) error return fi
    6.       oldi2 = i2
    7.       backsearch(H, i2, i1, i2)
    8.       if (i2 = 1) leave hqr fi
    9.       if (i2 ≠ oldi2) iter = 0 fi
    10.      wilkshift(H[i2−1,i2−1], H[i2−1,i2], H[i2,i2−1], H[i2,i2], κ)
    11.      H[i1,i1] = H[i1,i1] − κ
    12.      for i = i1 to i2−1
    13.         rotgen(H[i,i], H[i+1,i], c_i, s_i)
    14.         H[i+1,i+1] = H[i+1,i+1] − κ
    15.         rotapp(c_i, s_i, H[i,i+1:n], H[i+1,i+1:n])
    16.      end for i
    17.      for i = i1 to i2−1
    18.         rotapp(c_i, conj(s_i), H[1:i+1,i], H[1:i+1,i+1])
    19.         rotapp(c_i, conj(s_i), Q[1:n,i], Q[1:n,i+1])
    20.         H[i,i] = H[i,i] + κ
    21.      end for i
    22.      H[i2,i2] = H[i2,i2] + κ
    23.   end while
    24. end hqr

Algorithm 2.8: Schur form of an upper Hessenberg matrix

Note that the shift is not applied to the matrix all at once; it is subtracted from the diagonal elements as the rotations are generated and restored as they are postmultiplied.
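Assembled from the sketches above (backsearch, wilkshift, rotgen, rotapp), the whole iteration is quite short. In this sketch indices are 0-based, H and Q are complex NumPy arrays (Q typically the accumulated transformations from hessreduce), and maxiter counts iterations per eigenvalue:

    import numpy as np

    def hqr(H, Q, maxiter=30):
        # Explicitly shifted QR on an upper Hessenberg H (Algorithm 2.8).
        i2 = H.shape[0] - 1
        it = 0
        while True:
            it += 1
            if it > maxiter:
                raise RuntimeError("hqr: too many iterations")
            old_i2 = i2
            i1, i2 = backsearch(H, i2)
            if i2 == 0:
                return
            if i2 != old_i2:
                it = 0
            kappa = wilkshift(H[i2-1, i2-1], H[i2-1, i2],
                              H[i2, i2-1], H[i2, i2])
            rots = []
            H[i1, i1] -= kappa
            for i in range(i1, i2):
                c, s, a = rotgen(H[i, i], H[i+1, i])
                H[i, i], H[i+1, i] = a, 0.0
                H[i+1, i+1] -= kappa
                rotapp(c, s, H[i, i+1:], H[i+1, i+1:])
                rots.append((c, s))
            for i in range(i1, i2):
                c, s = rots[i - i1]
                rotapp(c, np.conj(s), H[:i+2, i], H[:i+2, i+1])
                rotapp(c, np.conj(s), Q[:, i], Q[:, i+1])
                H[i, i] += kappa
            H[i2, i2] += kappa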

Here are some comments on the algorithm.

• The routine hqr combined with hessreduce (Algorithm 2.2) gives an algorithm for computing the Schur form of a general matrix.

• The convergence theory developed in §2.1 says that once hqr starts converging to a simple eigenvalue, it will converge quadratically. Unfortunately, there is no theory that says that the algorithm is globally convergent. Typically, hqr takes several iterations to deflate the first eigenvalue, somewhat less to compute the second, and so on. Eventually the subdiagonals of H become small enough so that quadratic convergence is seen from the outset.

• To guard against the possibility of nonconvergence, hqr gives an error return when, in the course of computing any one eigenvalue, a maximum iteration count is exceeded. Most real-life implementations do not return immediately but instead try an ad hoc shift to get the algorithm back on track. For more see §2.5.

• The routine hqr is stable in the usual sense (Definition 2.4), provided an appropriate deflation strategy is used.

• The operation count is dominated by the application of plane rotations. Given i1 < i2, the pre- and postmultiplication of H requires about n(i2−i1) flrot. The accumulation of the transformations in Q requires the same amount of work, which brings the total for a QR step to 2n(i2−i1) flrot. An overall operation count depends on the number of iterations to convergence and the behavior of the ranges [i1, i2], neither of which is known a priori. We can derive an upper bound by assuming that i1 = 1 and that it takes k iterations to find each eigenvalue. The bound is approximately

    k n^3 flrot.    (2.31)

• If we are only interested in computing the eigenvalues of H, we no longer need to accumulate the transformations, which halves the work. Moreover, the matrix may also deflate at the upper end, so that i1 > 1, and we may restrict the range of our transformations to the elements labeled b in the Wilkinson diagram (2.29). This reduces the operation count for a QR step in the range [i1, i2] to 2(i2−i1)^2 flrot, and the upper bound (2.31) becomes (2/3)k n^3 flrot, which gives further savings.

Eigenvectors

We have now seen how to compute the Schur decomposition of a general matrix A. In some applications a Schur decomposition is all we need. In most applications, however, what is wanted is not the Schur vectors but the eigenvectors of A; the demand for eigenvectors is so great that all packages have an option for their computation.

If A = QTQ^H is the Schur decomposition of A and X is the matrix of right eigenvectors of T, then the matrix of right eigenvectors of A is QX. Thus we can compute the eigenvectors of A by computing the eigenvectors of T and transforming them by Q. It should be stressed that the eigenvalue-eigenvector decomposition is not a stable decomposition; in fact, if A is defective it does not exist.

We have already derived a formula for the eigenvectors of a triangular matrix (see p. 3). Specifically, partition T in the form

    T = ( T_11  t_1k  T_13  )
        (  0    τ_kk  t_k^H )
        (  0     0    T_33  ),

where τ_kk is a simple eigenvalue of T. Then

    x_k = ( −(T_11 − τ_kk I)^{−1} t_1k )
          (             1             )
          (             0             )

is an eigenvector of T corresponding to τ_kk. It follows that if we write the kth eigenvector in the form

    x^(k) = ( x_1^(k) )
            (    1    )
            (    0    ),

then we can obtain x_1^(k) by solving the upper triangular system

    (T_11 − τ_kk I) x_1^(k) = −t_1k.    (2.32)

To derive an algorithm for solving (2.32), let us drop subscripts and superscripts and consider a general algorithm for solving the equation (T − μI)x = b. If we partition this equation in the form

    ( T_* − μI    t   ) ( x_* )   ( b_* )
    (    0      τ − μ ) (  ξ  ) = (  β  ),    (2.33)

then the second row of the partition gives the equation (τ − μ)ξ = β, from which we get

    ξ = β/(τ − μ).

From the first row of (2.33) we have (T_* − μI)x_* + ξt = b_*, whence

    (T_* − μI) x_* = b_* − ξt.

This is a system of order one less than the original, which we further reduce by a recursive application of the same procedure. All this gives the following algorithm:

    1. x = b
    2. for j = n to 1 by −1
    3.    x[j] = x[j]/(T[j,j] − μ)
    4.    x[1:j−1] = x[1:j−1] − x[j]*T[1:j−1, j]
    5. end for j

This is essentially the algorithm we will use to compute eigenvectors. However, it has two drawbacks. First, in our final algorithm the shift μ will be a diagonal element of our original matrix T. If T has multiple eigenvalues, then some diagonal of T_11 may be equal to τ_kk, in which case our algorithm will terminate attempting to divide by zero. The cure

for this problem is to not allow T[j,j] − μ to become too small. In Algorithm 2.9 below we restrict it to be greater than ε_M|τ_kk| unless that number is itself too small. Many people are somewhat nervous about replacing an essentially zero quantity by an arbitrary quantity of an appropriate size. However, we can justify the replacement by projecting the error made back to the original matrix A [see the discussion leading to (2.27)]. In particular, we see that the replacement corresponds to a small perturbation in A. This means that for the procedure to introduce great inaccuracies in the computed eigenvector, the eigenvector itself must be sensitive to small perturbations in A, i.e., it must be ill conditioned.

The second problem is that in some applications eigenvectors can have elements of widely varying sizes, and the attempt to compute the eigenvectors can result in overflow. Since simple eigenvectors are determined only up to a scalar factor, we are free to scale the solution. In particular, if a component of the solution is in danger of overflowing, we may scale the vector so that the offending component has modulus one. This may result in the underflow of very small components; however, if underflows are set to zero, the results will generally be satisfactory.

Algorithm 2.9 computes the right eigenvectors of a triangular matrix along these lines. The strategy is a variant of one found in LAPACK, in which smallnum is taken to be a small number safely above the underflow point. Here are some comments.

• If there is no rescaling, the algorithm has an operation count of (1/6)n^3 flam.

• The cost of rescaling can be comparatively large: if scaling were done at every step in the computation of every eigenvector, it would cost about (1/6)n^3 flmlt, comparable to the cost of the unscaled algorithm itself. For this reason the algorithm rescales only when there is a serious danger of overflow, specifically, when the component to be computed would be bigger than a number near the overflow point of the computer.

• The algorithm is column oriented in the sense that the matrix T is traversed along its columns. This orientation is to be preferred for programming languages like FORTRAN that store arrays by columns. There is a row-oriented version of the basic algorithm for solving triangular systems, which is to be preferred in languages like C that store their arrays by rows. For a further discussion of these matters see §1:2.3.

• The algorithm is stable in the following sense. Each computed eigenvector x_i satisfies

    (T + E_i) x_i = t_ii x_i,   where ||E_i||/||T|| ≤ γ ε_M    (2.34)

for some modest constant γ. This implies that if the eigenvector is not ill conditioned, then it will be accurately computed. It is worth remembering, however, that E_i is different for each eigenvector x_i,

Given an upper triangular matrix T of order n, this routine returns an upper triangular matrix X of the right eigenvectors of T. The columns of X are normalized to have 2-norm one.

    1. righteigvec(T, X)
    2.    smallnum = a small number safely above the underflow point
    3.    bignum = a number near the overflow point
    4.    for k = n to 1 by −1
    5.       X[1:k−1, k] = −T[1:k−1, k]
    6.       X[k,k] = 1
    7.       X[k+1:n, k] = 0
    8.       dmin = max{ε_M*|T[k,k]|, smallnum}
    9.       for j = k−1 to 1 by −1
    10.         d = T[j,j] − T[k,k]
    11.         if (|d| ≤ dmin) d = dmin fi
    12.         if (|X[j,k]|/bignum ≥ |d|)
    13.            s = |d|/|X[j,k]|
    14.            X[1:k, k] = s*X[1:k, k]
    15.         end if
    16.         X[j,k] = X[j,k]/d
    17.         X[1:j−1, k] = X[1:j−1, k] − X[j,k]*T[1:j−1, j]
    18.      end for j
    19.      X[1:k, k] = X[1:k, k]/||X[1:k, k]||_2
    20.   end for k
    21. end righteigvec

Algorithm 2.9: Right eigenvectors of an upper triangular matrix
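A NumPy sketch of the routine (the particular value of smallnum is one reasonable choice, not prescribed by the text):

    import numpy as np

    def righteigvec(T):
        # Right eigenvectors of upper triangular T (Algorithm 2.9, 0-based).
        n = T.shape[0]
        X = np.zeros((n, n), dtype=complex)
        smallnum = np.finfo(float).tiny/np.finfo(float).eps   # assumed choice
        bignum = 1.0/smallnum
        for k in range(n - 1, -1, -1):
            X[:k, k] = -T[:k, k]
            X[k, k] = 1.0
            dmin = max(np.finfo(float).eps*abs(T[k, k]), smallnum)
            for j in range(k - 1, -1, -1):
                d = T[j, j] - T[k, k]
                if abs(d) <= dmin:
                    d = dmin                           # guard the divisor
                if abs(X[j, k])/bignum >= abs(d):
                    X[:k+1, k] *= abs(d)/abs(X[j, k])  # rescale against overflow
                X[j, k] /= d
                X[:j, k] -= X[j, k]*T[:j, j]
            X[:k+1, k] /= np.linalg.norm(X[:k+1, k])
        return X

The scaled residual ||T x_k − t_kk x_k||/||T|| of each column should be of the order of the rounding unit, which is exactly the diagnostic suggested below.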

so that as a whole the eigenvalue-eigenvector decomposition is not necessarily stable.

• It follows from (2.34) that the residual r_i = T x_i − t_ii x_i satisfies

    ||r_i||/||T|| ≤ γ ε_M.

Thus the norm of the scaled residual for each computed eigenvector is near the rounding unit. This fact, which continues to hold for the eigenvectors of the original matrix, provides an easy diagnostic for routines that purport to compute eigenvectors: compare the scaled residual against the rounding unit.

• Although we have chosen to compute the entire upper triangular array of eigenvectors, essentially the same algorithm could be used to compute a few selected eigenvectors.

A similar algorithm can be derived to compute left eigenvectors. Specifically, if we partition T as usual in the form

    T = ( T_11  t_1k  T_13  )
        (  0    τ_kk  t_k^H )
        (  0     0    T_33  )

and assume that τ_kk is a simple eigenvalue of T, then

    y_k^H = ( 0   1   −t_k^H (T_33 − τ_kk I)^{−1} )

is a left eigenvector of T corresponding to τ_kk. Thus we can also find left eigenvectors by solving triangular systems.

However, there is an alternative. If X is the matrix of right eigenvectors of a diagonalizable matrix, then Y^H = X^{−1} is the matrix of left eigenvectors. Thus we can also obtain the left eigenvectors of a matrix by inverting the matrix of right eigenvectors. Unfortunately, the two methods are not numerically equivalent. If we compute the left eigenvectors separately, the resulting matrix may not be the inverse of the matrix of right eigenvectors; we say that the right and left eigenvectors have lost biorthogonality. If biorthogonality is required, then directly inverting the matrix of right eigenvectors will give better results, though this will not work in cases where we want only a few left and right eigenvectors.

2.5. BITS AND PIECES

The algorithm sketched in the last subsection is an effective method for finding the eigenvalues (and eigenvectors, if desired) of a general complex matrix. However, there are modifications and extensions that can improve the performance of the algorithm. A detailed treatment of these topics would lengthen this chapter beyond reasonable

bounds. Instead, we will give a brief survey of the following topics: the ad hoc shift, deflation at consecutive small subdiagonal elements, cheap eigenvalues, and balancing. The subsection concludes with a discussion of graded matrices.

Ad hoc shifts

Convergence of the Hessenberg QR algorithm is not guaranteed, even with the Wilkinson shift. The problem usually manifests itself by the algorithm exceeding a preset limit on the number of iterations (see statement 5 in Algorithm 2.8). In such an event the natural action is to give an error return. An alternative is to restart the iteration with a new shift, commonly referred to as an ad hoc shift. There is little general theory to guide the choice of ad hoc shifts (which is why they are called ad hoc). The LAPACK routine CGEEV takes the shift to be |Re(h_{i,i−1})| + |Re(h_{i−1,i−2})|, where i is the row where the problem is supposed to deflate. If that fails one can either try again or give an error return; it does not make much sense to try more than one or two ad hoc shifts.

Aggressive deflation

If two consecutive subdiagonal elements of H are small, but neither is small enough to permit deflation, we can sometimes start the QR step at the diagonal to the right of the first small element, if it exists. Specifically, suppose we begin the QR step at row i, and consider the Wilkinson diagram

    x    x    x  x
    e_1  d    x  x
    0    e_2  x  x
    0    0    x  x

in which the upper-left-hand x represents the (i−1,i−1)-element of H and e_1 = h_{i,i−1} and e_2 = h_{i+1,i} are the two small subdiagonal elements. For purposes of illustration we will assume that the elements in this diagram are real and that d is positive. The first rotation of the step, acting in the (i,i+1)-plane, is generated from d and e_2 (we ignore the shift, which only alters d). After this beginning of the reduction to triangular form, the pair (e_1, 0) in the first column of the diagram has become (c*e_1, −s*e_1), where

    c = d/sqrt(d^2 + e_2^2)   and   s = e_2/sqrt(d^2 + e_2^2).

The postmultiplication by the rotations does not change elements in this column.

Now suppose that e_2 is so small that fl(d^2 + e_2^2) = d^2. Then the computed value of c will be one, and the computed value of the element c*e_1 will simply be e_1, unchanged from the original. Moreover, the computed value of the element s*e_1 is e_1*e_2/d. Since the product of two small numbers is much smaller than either, if the product e_1*e_2 is so small that it is negligible by whatever deflation criterion is in force, we can ignore it. Thus for e_1 and e_2 sufficiently small, we can assume that the matrix is deflated at the element e_1, at least for the purpose of starting a QR step. Note that it is the square of e_2 and the product e_1*e_2 that say whether we can start a QR step; consequently, we may be able to start the QR step even when e_1 or e_2 is not small enough to deflate the matrix.

Cheap eigenvalues

If the original matrix A is highly structured, it is possible that some eigenvalues are available almost for free. For example, if A has the form

    A = ( x  x  x  x )
        ( 0  μ  0  0 )
        ( x  x  x  x )
        ( x  x  x  x ),

then the single nonzero element μ in the second row is an eigenvalue. It takes at most n^2 comparisons to isolate such a row, and at an additional cost of O(n^2) operations the matrix can be permuted so that it has the form

    ( B  h )
    ( 0  μ ).

We can then repeat the process on the leading principal matrix B of order n−1, and so on, until it fails to isolate an eigenvalue. The result is a permutation of A that has the form

    ( B  H )
    ( 0  T ),

where T is upper triangular.

Turning now to the columns of A: if A has, for example, the form

    A = ( x  0  x  x )
        ( x  ν  x  x )
        ( x  0  x  x )
        ( x  0  x  x ),

then it can be permuted to the form

    ( ν  h^T )
    ( 0  B   ).

Proceeding as above, we can permute A into the form

    ( S  U  V )
    ( 0  B  W )
    ( 0  0  T ),

where S is also upper triangular. Thus we have deflated the problem at both ends, and only the middle block need be processed by the QR algorithm. Properly implemented, the process requires only O(n^2) operations per eigenvalue, the same as for a single QR step. Because of its cheapness, the process is usually made an option in matrix packages.

Balancing

Many applications generate matrices whose elements differ greatly in size. In some cases these disparities are a consequence of the structure of the underlying problem. In others, however, they are accidental, caused, say, by a poor choice of units of measurement. In this case balance can often be restored by the following procedure.

We proceed by balancing the elements of the matrix one row and column at a time. Let us consider the case of the first row and column. Let ρ ≠ 0, and let

    D = diag(ρ, 1, ..., 1).

Then (for n = 4)

    DAD^{−1} = ( a_11    ρa_12  ρa_13  ρa_14 )
               ( a_21/ρ  a_22   a_23   a_24  )
               ( a_31/ρ  a_32   a_33   a_34  )
               ( a_41/ρ  a_42   a_43   a_44  ).

If we now require that ρ be chosen so that the first row and column are balanced in the sense that

    ρ ||r_1|| = ρ^{−1} ||c_1||,

where r_1 and c_1 are the first row and the first column of A with the diagonal element deleted, then we find that

    ρ = sqrt(||c_1||/||r_1||).
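A NumPy sketch of the iteration described next (the 1-norm, the tolerance, and the pass limit are our own illustrative choices; production codes typically also restrict ρ to powers of the machine base so that scaling is exact):

    import numpy as np

    def balance(A, tol=0.95, maxpass=10):
        # Iteratively scale A <- D A D^{-1} until each off-diagonal
        # row and column pair has comparable norms.
        A = A.copy()
        n = A.shape[0]
        D = np.ones(n)
        for _ in range(maxpass):
            done = True
            for i in range(n):
                r = np.sum(np.abs(A[i, :])) - abs(A[i, i])
                c = np.sum(np.abs(A[:, i])) - abs(A[i, i])
                if r == 0 or c == 0:
                    continue
                rho = np.sqrt(c/r)
                if rho < tol or rho > 1/tol:    # a loose criterion suffices
                    done = False
                    A[i, :] *= rho              # row i of D A D^{-1}
                    A[:, i] /= rho              # column i
                    D[i] *= rho
            if done:
                break
        return A, D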

Having balanced the first row and column, we can go on to balance the second, the third, and so on, cycling back to the first if necessary. Although the balancing of one row and column undoes the balancing of the others, the method can be shown to converge to a matrix in which all the rows and columns are balanced. Each step of the iteration requires only O(n) operations. Moreover, a very loose convergence criterion will be sufficient to effectively balance the matrix, so the procedure is relatively inexpensive.

Balancing often improves the accuracy of computed eigenvalues and eigenvectors. However, at times it can make things worse. For this reason, any package that incorporates balancing should include an option to turn it off.

Graded matrices

Loosely speaking, a matrix is graded if it shows a systematic decrease or increase in its elements as one proceeds from one end of the matrix to the other. For example, a matrix A whose elements fall off by several orders of magnitude as one passes along each diagonal from northwest to southeast exhibits a symmetric grading from large to small; we say that such a matrix is graded downward by diagonals. If D is a diagonal matrix whose elements decrease at the rate of the grading, then DAD^{−1} is graded downward by rows, while D^{−1}AD is graded downward by columns.

It turns out that the QR algorithm usually works well with matrices that are graded downward by diagonals. For one such matrix A of order three, the computed eigenvalues are

    5.1034150572761617e-33
    1.0515156210790460e-16
    1.5940000000000001e+00

which are almost fully accurate. However, the algorithm does not do as well with other forms of grading. For example, reverse the rows and columns of A to get a matrix B, which is graded upward by diagonals. If we compute the eigenvalues of B, we get the following results:

   -1.1189999999999992e-17
    1.0515156534681920e-16
    1.5940000000000005e+00

from which it is seen that the smallest eigenvalue has no accuracy at all, the second smallest is accurate to only about eight decimal digits (a relative error of 1.1e-08), and the largest is correct to working precision.

There is no formal analysis of why the algorithm succeeds or fails on a graded matrix. However, inspection of special cases will often show that the algorithm causes a loss of important information by combining large and small elements. For example, consider the first column of B and the corresponding first column of the Hessenberg form of B:

    6.6859999999999994e-33      6.6859999999999994e-33
    1.2900000000000000e-24     -7.1190000000000004e-17
    7.1190000000000004e-17      0.0000000000000000e+00

This display shows that the first column of the Hessenberg form is the same as the one obtained by first setting the element 1.290e-24 to zero. This is a small error compared to the norm of B, but the smallest eigenvalue of B is quite sensitive to it. In particular, the relative errors in the eigenvalues of the matrix B altered in this way are essentially those reported above.

Row and column grading can also cause the algorithm to fail. The computed eigenvalues of D^{−1}AD and of DAD^{−1} have relative errors of order 1e-16 for the largest eigenvalue, but the two smaller eigenvalues of both matrices have no accuracy.

Balancing can help in this case. Balancing B has no effect, since balancing cannot change the direction of the grading. On the other hand, balancing DAD^{−1} and D^{−1}AD restores the lost accuracy. This suggests the following rule of thumb:

    When dealing with a graded matrix, arrange the grading downward by rows, columns, or diagonals, and balance the matrix.    (2.36)

We do not claim that this strategy is foolproof, and anyone employing it should take a hard look at the results to see if they make sense.

2.6. NOTES AND REFERENCES

Historical

The lineage of the QR algorithm can be traced back to a theorem of König [151, 1884], which states that if the function defined by the power series

    f(z) = a_0 + a_1 z + a_2 z^2 + ···    (2.37)

has a single simple pole r at its radius of convergence, then r = lim_{k→∞} a_k/a_{k+1}. Later Hadamard [106, 1892] gave determinantal expressions in terms of the coefficients of (2.37) for poles beyond the radius of convergence. Aitken [1, 1926] independently derived these expressions and gave recurrences for their computation. Rutishauser consolidated and extended the recurrences in his qd-algorithm [222, 223, 1954]. (For a systematic treatment of this development see [122].)

Rutishauser [224, 1955] then showed that if the numbers from one stage of the qd-algorithm were arranged in a tridiagonal matrix T, then the numbers for the next stage could be found by computing the LU factorization of T and multiplying the factors in reverse order. Thus the algorithm preceded the decomposition. He generalized the process to a full matrix and showed that under certain conditions the iterates converged to an upper triangular matrix whose diagonals were the eigenvalues of the original matrix. The result was what is known as the LR algorithm. Later Rutishauser introduced a shift to speed convergence [225, 1958].

Except for special cases, like positive definite matrices, the LR algorithm is numerically unstable. Kublanovskaya [154, 1961] and Francis [76, 1961] independently proposed substituting the stable QR decomposition for the LU decomposition. However, Francis went beyond a simple substitution and produced the algorithm that we use today, complete with a preliminary reduction to Hessenberg form and the implicit double shift for real matrices.

The name of the QR algorithm comes from the fact that in Francis's notation the basic step was to decompose the matrix A into the product QR of an orthogonal matrix and an upper triangular matrix. Thus the algorithm preceded the decomposition. I have heard that the Q was originally O for "orthogonal" and was changed to Q to avoid confusion with zero.

For more historical material see Parlett's survey [204] and the paper [73] by Fernando and Parlett.

Variants of the QR algorithm

There are four factorizations of a matrix into the product of a unitary and a triangular matrix, which we may represent symbolically by QR, RQ, QL, and LQ, where R and L stand for upper and lower triangular. Each gives rise to a variant of the QR algorithm, and the theory of the QR algorithm applies, mutatis mutandis, to the other three. The shifted variants of the QR and QL algorithms exhibit fast convergence in the last and first rows, respectively; those of the RQ and the LQ, in the first and last columns. For more details on the QL algorithm, see §1, Chapter 3.

The QR algorithm, the inverse power method, and the power method

The relation of the QR algorithm to the inverse power method was known to J. H. Wilkinson and W. Kahan (see [206, p. 173]), although the first explicit statement of the relation is in a textbook by Stewart [253, p. 352]. The relation to the power method, which was known from the first, is due to Francis [76], as was the convergence result in Theorem 2.1.

Local convergence

The local convergence results are adapted from [264]. For the special case of the Rayleigh-quotient shift, this result is implied by work of Ostrowski on the Rayleigh-quotient method [193]. Watkins and Elsner [292] have given a very general convergence theory that holds for both the QR and LR algorithms; in particular, they show that the shifted iteration converges quadratically to a nondefective multiple eigenvalue. Theorem 2.2 was originally established using determinants; the proof given here is due to Wilkinson [301].

Householder transformations and plane rotations

See §1:4.1.5 for notes and references on Householder transformations and plane rotations. The treatment here differs from that in many textbooks in its emphasis on complex transformations. For more on complex Householder transformations see [159].

Hessenberg form

The reduction of a general matrix to Hessenberg form is due to Householder [123] and Wilkinson [297], who worked out the numerical details. The stability analysis is due to Wilkinson [300, §§29-30]. His general approach to the backward error analysis of orthogonal transformations is given in his Algebraic Eigenvalue Problem [300] and is the basis for most of the rounding-error results cited in this and the next chapter.

It is possible to use nonorthogonal transformations to reduce a symmetric matrix to tridiagonal form, a process that bears the same relation to Givens' and Householder's method as Gaussian elimination bears to orthogonal triangularization. However, no general analysis of its stability seems to have appeared in the literature, and the case for nonorthogonal reduction is not as compelling as the case for Gaussian

elimination. For details and an error analysis see [300, §§8-19, Ch. 6]. The method is not used in the major packages.

The Hessenberg QR algorithm

As this subsection makes clear, the implementation of the QR algorithm is a large task, and our final result, Algorithm 2.8, cannot be regarded as a polished implementation. For further details, there is little to do but go to the better packages and look at the code. The Handbook ALGOL codes [305] are well documented but show their age, as do their EISPACK translations into FORTRAN [81]. The LAPACK [3] codes are state of the art, at least for large matrices, but they are undocumented, sparsely commented, and difficult to read.

The observation that the QR step preserves Hessenberg form is due to Francis [76]. Its use in the general QR algorithm is traditional.

Deflation criteria

A personal anecdote may illustrate the difficulties in finding satisfactory criteria for declaring a subdiagonal element zero. In 1965 I attended a series of lectures on numerical analysis at the University of Michigan, where Jim Wilkinson presented the QR algorithm. His talk was punctuated by phrases like, "When a subdiagonal element becomes sufficiently small ...". After one of the lectures I asked him, "How small is small?" He began, "Here is what I do," and described the relative criterion (2.28). He ended by saying, "But if you want to argue about it, I would rather be on your side." So it remains to this day.

The Wilkinson shift

The Wilkinson shift was proposed by Wilkinson [302] for symmetric tridiagonal matrices, where it insures the global convergence of the QR algorithm. Its use in the general QR algorithm is traditional.

Aggressive deflation

The idea of deflating when there are two consecutive small subdiagonal elements is due to Francis [76]. Many people felt that such small elements could retard the convergence of the implicitly shifted QR algorithm (to be discussed in the next section); however, Watkins [289] has shown that the shift is accurately transmitted through the small elements.

Preprocessing

The balancing algorithm (p. 107) is due to Osborne [187] and was used, along with the strategy for the initial deflation of eigenvalues, in the Handbook [208] and in subsequent packages. For another scaling strategy, which works well with sparse matrices, see [50].

Graded matrices

There is little in the literature about general graded matrices. The fact that the direction of the grading must be taken into account was noted for symmetric matrices by Martin, Reinsch, and Wilkinson [169]. Contrary to (2.36), they recommend that the matrix be graded upward, because their version of the reduction to Hessenberg form starts at the bottom of the matrix and works up. The reduced matrix is then passed on to a variant of the QR algorithm, the QL algorithm, that also works from the bottom up. For a general theory and references see [266].

3. THE IMPLICITLY SHIFTED QR ALGORITHM

In the previous section we described a variant of the QR algorithm that is suitable for computing the Schur decomposition of a general complex matrix. Since real matrices can have complex eigenvalues and eigenvectors, it is generally impossible to avoid complex arithmetic if the actual Schur form is to be computed. Unfortunately, complex arithmetic is quite expensive and is to be avoided if at all possible.

In this section we will show that there is a real equivalent of the Schur form and that the QR algorithm can be adapted to compute it in real arithmetic. The adaptation involves a technique called implicit shifting, which can also be used to solve generalized eigenproblems and to compute the singular value decomposition. We begin the section with the real Schur form and an inefficient algorithm for computing it. It turns out that the inefficiencies can be removed if the matrix in question is Hessenberg. Hence in §3.2 we establish an important result that enables us to take advantage of the Hessenberg form. Finally, in §3.3 we pull the material together into an algorithm for computing the real Schur form.

Since the concern of this section is with solving the real eigenproblem:

    In this section A will be a real matrix of order n.

3.1. REAL SCHUR FORM

In this subsection we derive the real Schur form and present a preliminary algorithm for its computation.

Real Schur form

A real Schur form is a Schur form with 1x1 and 2x2 blocks on its diagonal, each 2x2 block containing a pair of complex conjugate eigenvalues. The following theorem establishes the existence of such decompositions.

Theorem 3.1 (Real Schur form). Let A be real of order n. Then there is an orthogonal matrix U such that T = U^T AU is block upper triangular of the form

    T = ( T_11  T_12  ···  T_1k )
        (  0    T_22  ···  T_2k )    (3.1)
        (                ⋱      )
        (  0     0    ···  T_kk ).

The diagonal blocks of T are of order one or two. The blocks of order one contain the real eigenvalues of A. The blocks of order two contain the pairs of complex conjugate eigenvalues of A. The blocks can be made to appear in any order.

Proof. The proof is essentially the same as the proof of Theorem 1.12, Chapter 1, except that we use two real vectors to deflate a complex conjugate pair of eigenvalues. Specifically, let (λ, z) be a complex eigenpair with

    λ = μ + iν  (ν ≠ 0)   and   z = x + iy.

From the relations 2x = z + conj(z) and 2iy = z − conj(z) we find that Ax = μx − νy and Ay = νx + μy, or

    A(x y) = (x y)L,   where   L = (  μ  ν )
                                   ( −ν  μ ).

By computing the characteristic polynomial of L, one can verify that the eigenvalues of L are λ and conj(λ). Since x and y are independent (see Theorem 1.3, Chapter 1), the QR decomposition

    (x y) = X_1 R

of (x y) has a nonsingular triangular factor R, so that X_1 = (x y)R^{−1} and

    A X_1 = A(x y)R^{−1} = (x y)L R^{−1} = X_1 (R L R^{−1}).

Because RLR^{−1} is similar to L, its eigenvalues are λ and conj(λ). It follows as in the proof of Theorem 1.12, Chapter 1, that if (X_1 Y) is orthogonal, then

    (X_1 Y)^T A (X_1 Y) = ( R L R^{−1}  X_1^T A Y )
                          (     0       Y^T A Y   ),

which completes the deflation of the complex conjugate pair λ and conj(λ). •

Three comments on this theorem.

• A matrix of the form (3.1) in which the diagonal blocks are of order one or two is said to be quasi-triangular.

• The relation A X_1 = X_1 (R L R^{−1}) implies that A maps the column space of X_1 into itself. Such a space is called an eigenspace or invariant subspace. Such subspaces will be the main concern of the second half of this volume.

• Like the proof of Schur's theorem, the proof of this theorem is almost constructive. Given the necessary eigenvectors, one can reduce a matrix to real Schur form using Householder transformations. It is an instructive exercise to work out the details.

A preliminary algorithm

The QR algorithm can be adapted to compute a real Schur form of a real matrix A in real arithmetic. Here we will start with an expensive algorithm that points in the right direction.

We begin by noting that the reduction of A to Hessenberg form (Algorithm 2.2) can be done in real arithmetic, resulting in a real matrix H. We can then apply the shifted QR algorithm (Algorithm 2.8) in real arithmetic until such time as a complex shift is generated. At that point we need a new approach to avoid the introduction of complex elements into our matrix.

The starting point is the observation that if the Wilkinson shift κ is complex, then its complex conjugate is also a candidate for a shift. Suppose that we apply two steps of the QR algorithm, one with shift κ and the other with shift conj(κ), to yield a matrix Ĥ. By the results of §2.2 on the QR decomposition of a matrix polynomial, if

    (H − κI)(H − conj(κ)I) = QR

is the QR decomposition of (H − κI)(H − conj(κ)I), then Ĥ = Q^H H Q. But

    (H − κI)(H − conj(κ)I) = H^2 − 2 Re(κ)H + |κ|^2 I

is real. Hence so are Q and Ĥ. Thus the QR algorithm with two complex conjugate shifts preserves reality. This strategy of working with complex conjugate Wilkinson shifts is called the Francis double shift strategy.

We can even avoid complex arithmetic in the intermediate quantities by forming the matrix H^2 − 2 Re(κ)H + |κ|^2 I, computing its Q-factor Q, and then computing Ĥ = Q^T H Q. Unfortunately, the very first step of this algorithm, the formation of H^2 − 2 Re(κ)H + |κ|^2 I, requires O(n^3) operations, which makes the algorithm as a whole unsatisfactory. It turns out, however, that we can use a remarkable property of Hessenberg matrices to sidestep the formation of H^2 − 2 Re(κ)H + |κ|^2 I altogether. We now turn to this property.

3.2. THE UNIQUENESS OF HESSENBERG REDUCTION

Let A be of order n and let H = Q^H AQ be a unitary reduction of A to upper Hessenberg form. This reduction is clearly not unique. For example, if Q̂ = QD, where D is diagonal with |D| = I, then Ĥ = Q̂^H AQ̂ is also a reduction of A to upper Hessenberg form. The two reductions are not essentially different, since Q and Q̂ differ only in the scaling of their columns by factors of modulus one. The choice of scaling will also affect the elements of H: specifically, h_ij is replaced by conj(d_i) h_ij d_j. In either case the rescaling makes no essential difference, and we will say that the reduction is determined up to column scaling of Q.

Now a unitary matrix has n(n−1)/2 degrees of freedom (see the discussion preceding Theorem 1.12, Chapter 1). In reducing A to Hessenberg form we must introduce (n−1)(n−2)/2 zeros into A, and hence we must use (n−1)(n−2)/2 of the degrees of freedom. This leaves n−1 degrees of freedom in Q, just enough to specify the first column of Q. It is therefore reasonable to conjecture that, given A, the matrices Q and H are determined by the first column of Q. It turns out that this is essentially true, but with two provisos. First, the determination can be only up to column scaling, as we have just seen. Second, we must assume that the subdiagonals of H are nonzero. Formally, we make the following definition.

Definition 3.2. Let H be upper Hessenberg of order n. Then H is UNREDUCED if

    h_{i+1,i} ≠ 0,   i = 1, ..., n−1.

We are now in a position to give conditions under which a reduction to upper Hessenberg form is essentially unique.

Theorem 3.3 (Implicit Q theorem). Let A be of order n and let H = Q^H AQ be a unitary reduction of A to upper Hessenberg form. If H is unreduced, then up to column scaling of Q the matrices Q and H are uniquely determined by the first column of Q or the last column of Q.

Proof. Consider the relation

    AQ = QH.    (3.4)

If we partition Q = (q_1 ··· q_n) by columns, then the first column of this relation is

    Aq_1 = h_11 q_1 + h_21 q_2.    (3.5)

Multiplying (3.5) by q_1^H and remembering that q_1^H q_1 = 1 and q_1^H q_2 = 0, we have

    h_11 = q_1^H A q_1,

and hence h_21 q_2 = Aq_1 − h_11 q_1.

We then have

    h_21 q_2 = Aq_1 - h_11 q_1.

Since h_21 ≠ 0, the vector Aq_1 - h_11 q_1 is nonzero. Since ||q_2||_2 = 1 we may take

    h_21 = ||Aq_1 - h_11 q_1||_2  and  q_2 = (Aq_1 - h_11 q_1)/h_21,

which determines q_2 uniquely up to a factor of modulus one.

Now suppose that q_1, ..., q_k have been determined. Then the kth column of (3.4) gives the relation

    Aq_k = h_1k q_1 + ... + h_kk q_k + h_{k+1,k} q_{k+1}.

Since q_1, ..., q_{k+1} are orthonormal, we have

    h_ik = q_i^H A q_k,   i = 1, ..., k.

Moreover, since h_{k+1,k} is nonzero, the vector Aq_k - h_1k q_1 - ... - h_kk q_k is nonzero, and we may take

    h_{k+1,k} = ||Aq_k - h_1k q_1 - ... - h_kk q_k||_2,

which determines q_{k+1} up to a factor of modulus one.

Finally, from the last column of (3.4) we get h_nn = q_n^H A q_n. To show that Q is uniquely determined by its last column, we start with the relation Q^H A = H Q^H and repeat the above argument on the rows of Q^H starting with the last. •

• The theorem is sometimes called the implicit Q theorem, because it allows the determination of Q implicitly from its first or last column.

• Its proof is actually an orthogonalization algorithm for computing Q and H. It is called the Arnoldi method and will play an important role in the second half of this volume. The algorithm itself has numerical drawbacks. For example, cancellation in the computation of Aq_k - h_1k q_1 - ... - h_kk q_k can cause the computed q_{k+1} to deviate from orthogonality with q_1, ..., q_k.
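Since the proof is constructive, it is easily coded. The following NumPy sketch (ours, for illustration) carries out the orthogonalization just described, producing Q and H from A and the first column q_1; it is the Arnoldi method in its most naive form, with the cancellation problem noted above left untreated.

```python
import numpy as np

def arnoldi(A, q1):
    """Build Q and unreduced Hessenberg H with A Q = Q H from the column q1.
    Assumes the process does not break down (all subdiagonals nonzero)."""
    n = A.shape[0]
    Q = np.zeros((n, n)); H = np.zeros((n, n))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(n):
        w = A @ Q[:, k]
        for i in range(k + 1):                 # h_ik = q_i^H A q_k
            H[i, k] = Q[:, i] @ w
            w -= H[i, k] * Q[:, i]
        if k < n - 1:
            H[k+1, k] = np.linalg.norm(w)      # nonzero when H is unreduced
            Q[:, k+1] = w / H[k+1, k]
    return Q, H
```

In floating point the columns of Q slowly lose orthogonality; practical codes reorthogonalize.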

3.3. THE IMPLICIT DOUBLE SHIFT

We now return to our preliminary algorithm and show how it may be modified to avoid the expensive computation of H^2 - 2Re(κ)H + |κ|^2 I. We will proceed in stages, beginning with a very general strategy and then filling in the details.

A general strategy

Let κ be a complex Francis shift of H. We have seen above that if we compute the Q-factor Q of the matrix H^2 - 2Re(κ)H + |κ|^2 I, then Ĥ = Q^T H Q is the result of applying two steps of the QR algorithm with shifts κ and κ̄. The following is a general algorithm for simultaneously determining Q and Ĥ.

    1. Determine the first column c of C = H^2 - 2Re(κ)H + |κ|^2 I.
    2. Let Q_0 be a Householder transformation such that Q_0^T c = σe_1.
    3. Set H_1 = Q_0^T H Q_0.
    4. Use Householder transformations to reduce H_1 to upper
       Hessenberg form Ĥ. Call the accumulated transformations Q_1.
    5. Set Q = Q_0 Q_1.                                          (3.6)

To establish that this algorithm works, we must show that Ĥ and Q are the matrices that would result from two steps of the QR algorithm with shifts κ and κ̄. By construction the matrix (Q_0 Q_1)^T H (Q_0 Q_1) is Hessenberg. Hence by the implicit Q theorem (Theorem 3.3), if H is unreduced, it is sufficient to show that the first column of Q_0 Q_1 and the first column of the reducing transformation are, up to scaling, the same.

The first column of the reducing transformation is proportional to q, where, if we partition the QR factorization of C appropriately, c = ρq. On the other hand, the first column of Q_0 Q_1 is the first column of Q_0, because Q_1 e_1 = e_1 [see (2.21)]. Now if we partition Q_0 = (q^(0) Q_*^(0)), then from the relation c = σ Q_0 e_1 we have c = σ q^(0). Hence q and q^(0) are both proportional to c, and the first column of Q_0 Q_1 is proportional to q.

The key computations in (3.6) are the computation of the first column of C and the reduction of Q_0^T H Q_0 to Hessenberg form. It turns out that because H is upper Hessenberg we can effect the first calculation in O(1) operations and the second in O(n^2) operations. We now turn to the details.

Getting started

The computation of the first column of C = H^2 - 2Re(κ)H + |κ|^2 I requires that we first compute the scalars 2Re(κ) and |κ|^2. To do this we do not need to compute κ itself. For it is easily verified that

    t ≡ 2Re(κ) = h_{n-1,n-1} + h_nn                              (3.7)

and

    d ≡ |κ|^2 = h_{n-1,n-1} h_nn - h_{n-1,n} h_{n,n-1},          (3.8)

since κ and κ̄ are the eigenvalues of the trailing 2×2 submatrix of H, whose trace is t and whose determinant is d.

Because H is upper Hessenberg, only the first three components of the first column of H^2 are nonzero. They are given by

    (H^2)_11 = h_11^2 + h_12 h_21,
    (H^2)_21 = h_21(h_11 + h_22),
    (H^2)_31 = h_21 h_32.

Thus the first three components of the first column c of C are given by

    c_1 = h_11^2 + h_12 h_21 - t h_11 + d,
    c_2 = h_21(h_11 + h_22 - t),
    c_3 = h_21 h_32.

Note that we may assume that h_21 is nonzero, for otherwise the problem would deflate. Since two proportional vectors determine the same Householder transformation, we can ignore the multiplicative factor h_21 in the definition of c. Inserting the definition (3.7) of t and (3.8) of d, and setting

    p = h_nn - h_11,  q = h_{n-1,n-1} - h_11,  r = h_22 - h_11,

we obtain the following formula for c (up to the factor h_21, which does not affect the final Householder transformation):

    c = ( (pq - h_{n,n-1} h_{n-1,n})/h_21 + h_12,  r - p - q,  h_32 )^T.

The formulas for the components of c involve products of the elements of H and are subject to overflow or underflow. The cure is to scale before calculating c. This, in effect, multiplies c by the scaling factor, which again does not affect the final Householder transformation. Algorithm 3.1 calculates the vector u generating the Householder transformation to start a doubly shifted QR step. Because h_21 is nonzero, the scaling factor s in statement 2 is well defined.

In practice, the QR step will not be applied to the entire matrix, but between certain rows and columns of H. For this reason we do not invoke the algorithm with the matrix H, but with the specific elements of H used to generate c.

An important consequence of our substitution of t for 2Re(κ) and d for |κ|^2 is that our algorithm works even if the Francis double shifts are real. Specifically, suppose that the trailing 2×2 submatrix of H has real eigenvalues λ and μ. Then t = λ + μ and d = λμ, so that

    C = H^2 - tH + dI = (H - λI)(H - μI).

Thus the formulas in Algorithm 3.1 generate a vector u that starts the iteration with shifts of λ and μ. The particular form of the formulas has been chosen to avoid certain computational failures involving small off-diagonal elements. For more on this, see the notes and references.

This program generates a Householder transformation that reduces the first column of C = H^2 - 2Re(κ)H + |κ|^2 I to a multiple of e_1. Its arguments are the elements of H that enter the formula for c; here hn1n1, hn1n, and hnn1 denote h_{n-1,n-1}, h_{n-1,n}, and h_{n,n-1}.

     1. startqr2step(h11, h12, h21, h22, h32, hn1n1, hn1n, hnn1, hnn, u)
     2.    s = 1/max{|h11|, |h12|, |h21|, |h22|, |h32|,
                     |hn1n1|, |hn1n|, |hnn1|, |hnn|}
     3.    h11 = s*h11; h12 = s*h12; h21 = s*h21; h22 = s*h22; h32 = s*h32
     4.    hn1n1 = s*hn1n1; hn1n = s*hn1n; hnn1 = s*hnn1; hnn = s*hnn
     5.    p = hnn - h11
     6.    q = hn1n1 - h11
     7.    r = h22 - h11
     8.    c = ( (p*q - hnn1*hn1n)/h21 + h12,  r - p - q,  h32 )^T
     9.    housegen(c, u, ν)
    10. end startqr2step

Algorithm 3.1: Starting an implicit QR double shift
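For concreteness, here is a hedged NumPy transcription of Algorithm 3.1. The helper house (a name of our choosing) returns the Householder vector u, with I - uu^T mapping its argument to a multiple of e_1; it stands in for the text's housegen.

```python
import numpy as np

def house(c):
    """Householder u (||u||^2 = 2) with (I - u u^T) c a multiple of e1."""
    v = c / np.linalg.norm(c)
    v[0] += np.copysign(1.0, v[0])
    return v / np.sqrt(abs(v[0]))

def start_qr2step(H):
    """Vector u generating the starting transformation (cf. Algorithm 3.1)."""
    n = H.shape[0]
    elts = [H[0, 0], H[0, 1], H[1, 0], H[1, 1], H[2, 1],
            H[n-2, n-2], H[n-2, n-1], H[n-1, n-2], H[n-1, n-1]]
    s = 1.0 / max(abs(e) for e in elts)      # well defined since h21 != 0
    h11, h12, h21, h22, h32, hm, hmn, hnm, hnn = [s*e for e in elts]
    p, q, r = hnn - h11, hm - h11, h22 - h11
    c = np.array([(p*q - hnm*hmn)/h21 + h12, r - p - q, h32])
    return house(c)
```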

Reduction back to Hessenberg form

Let R_0 denote the Householder transformation corresponding to the vector u returned by Algorithm 3.1. Then premultiplication by R_0 acts only on the first three rows of H, so that R_0 H has the form (illustrated for n = 6)

    X X X X X X
    X X X X X X
    X X X X X X
    0 0 X X X X
    0 0 0 X X X
    0 0 0 0 X X

Similarly, postmultiplication by R_0 operates on only the first three columns of H, so that after the complete transformation has been applied, we get the matrix

    X  X  X X X X
    X  X  X X X X
    X̂  X  X X X X
    X̂  X  X X X X
    0  0  0 X X X
    0  0  0 0 X X

We must now reduce this matrix to Hessenberg form. In the usual reduction to Hessenberg form we would annihilate all the elements below the first subdiagonal. But here only two of the elements are nonzero, and we can use a Householder transformation R_1 that acts only on rows and columns two through four. The result of applying this transformation is

    X X  X X X X
    X X  X X X X
    0 X  X X X X
    0 X̂  X X X X
    0 X̂  X X X X
    0 0  0 0 X X

We now continue with a transformation R_2 that annihilates the hatted elements in this matrix to give

    X X X  X X X
    X X X  X X X
    0 X X  X X X
    0 0 X  X X X
    0 0 X̂  X X X
    0 0 X̂  X X X

Yet another transformation R_3 annihilates the hatted elements in the third column:

    X X X X  X X
    X X X X  X X
    0 X X X  X X
    0 0 X X  X X
    0 0 0 X  X X
    0 0 0 X̂  X X

The last transformation is a special case, since there is only one element to annihilate. This can be done with a 2×2 Householder transformation or a plane rotation R_4, which restores Hessenberg form. Note that a transpose is necessary in applying R_4 if it is a rotation; it makes no difference when R_4 is a Householder transformation, which is symmetric.

Algorithm 3.2 implements a double implicit QR shift between rows and columns il and i2 of H.

Given an upper Hessenberg matrix H and the vector u from Algorithm 3.1 (startqr2step), this program applies u to H and then reduces H back to Hessenberg form. The transformations are applied between rows il and i2 and are accumulated in a matrix Q.

     1. qr2step(H, u, il, i2, Q)
     2.    for i = il to i2-2
     3.       j = max{i-1, il}
     4.       v^T = u^T*H[i:i+2, j:n]
     5.       H[i:i+2, j:n] = H[i:i+2, j:n] - u*v^T
     6.       iu = min{i+3, i2}
     7.       v = H[1:iu, i:i+2]*u
     8.       H[1:iu, i:i+2] = H[1:iu, i:i+2] - v*u^T
     9.       v = Q[1:n, i:i+2]*u
    10.       Q[1:n, i:i+2] = Q[1:n, i:i+2] - v*u^T
    11.       if (i ≠ il) H[i+1, j] = H[i+2, j] = 0 fi
    12.       if (i ≠ i2-2) housegen(H[i+1:i+3, i], u, ν) fi
    13.    end for i
    14.    rotgen(H[i2-1, i2-2], H[i2, i2-2], c, s)
    15.    rotapp(c, s, H[i2-1, i2-1:n], H[i2, i2-1:n])
    16.    rotapp(c, s, H[1:i2, i2-1], H[1:i2, i2])
    17.    rotapp(c, s, Q[1:n, i2-1], Q[1:n, i2])
    18. end qr2step

Algorithm 3.2: Doubly shifted QR step
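The same sweep is compact in NumPy. The sketch below is our illustration, not the book's qr2step: it uses zero-based indexing, omits the il:i2 windowing and the accumulation of Q, and reuses the house helper defined earlier.

```python
import numpy as np

def house(c):
    v = c / np.linalg.norm(c)
    v[0] += np.copysign(1.0, v[0])
    return v / np.sqrt(abs(v[0]))        # reflector I - u u^T zeros c[1:]

def double_shift_sweep(H):
    """One implicit Francis double-shift sweep on unreduced Hessenberg H (n >= 3)."""
    n = H.shape[0]
    t = H[n-2, n-2] + H[n-1, n-1]
    d = H[n-2, n-2]*H[n-1, n-1] - H[n-2, n-1]*H[n-1, n-2]
    c = np.array([H[0, 0]**2 + H[0, 1]*H[1, 0] - t*H[0, 0] + d,
                  H[1, 0]*(H[0, 0] + H[1, 1] - t),
                  H[1, 0]*H[2, 1]])                     # first column of C
    for k in range(n - 2):
        u = house(c)
        lo = max(k - 1, 0)
        H[k:k+3, lo:] -= np.outer(u, u @ H[k:k+3, lo:])  # rows k..k+2
        hi = min(k + 4, n)
        H[:hi, k:k+3] -= np.outer(H[:hi, k:k+3] @ u, u)  # columns k..k+2
        if k > 0:
            H[k+1:k+3, k-1] = 0.0        # exact zeros from the reflection
        if k < n - 3:
            c = H[k+1:k+4, k].copy()     # the chased bulge
    u = house(H[n-2:n, n-3].copy())      # final 2x2 reflection
    H[n-2:, n-3:] -= np.outer(u, u @ H[n-2:, n-3:])
    H[:, n-2:] -= np.outer(H[:, n-2:] @ u, u)
    H[n-1, n-3] = 0.0
    return H
```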

There are several comments to be made about the algorithm.

• The coding is a bit intricate, since there are special cases at both ends of the loop on i. It helps to think of i as pointing to the row containing the subdiagonal at which the current Householder transformation starts.

• It is easy to do an operation count on this algorithm, with the result that:

    Algorithm 3.2 requires 6n(i2-il) flam.

Deflation

The behavior of the doubly shifted QR algorithm once asymptotic convergence has set in depends on the eigenvalues of the trailing 2×2 matrix which determines the shift. If the eigenvalues are complex and nondefective, then h_{n-1,n-2} converges quadratically to zero. If the eigenvalues are real and nondefective, both h_{n-1,n-2} and h_{n,n-1} converge quadratically to zero. In either case, we must allow for the deflation of a 2×2 block at the bottom of the matrix. A 2×2 block may require additional processing. At the very least, if it contains real eigenvalues, it should be rotated into upper triangular form so that the eigenvalues are exhibited on the diagonal. For more on this topic, see the notes and references.

As with the singly shifted algorithm, the subdiagonal elements other than h_{n-1,n-2} and h_{n,n-1} may show a slow convergence to zero, so that we must allow for deflation in the middle of the matrix. Algorithm 3.3 implements a backsearch for deflation. It differs from Algorithm 2.6 in recognizing deflation of both 1×1 and 2×2 blocks. As usual, we can start a QR step at a point where two consecutive subdiagonals are small, even when they are not small enough to cause a deflation.

Computing the real Schur form

Algorithm 3.4 reduces a real Hessenberg matrix to real Schur form. There are several comments to be made about it.

• We have already observed that the routine qr2step requires 6n(i2-il) flam. If we assume that il = 1 throughout the reduction and that it requires k iterations to compute an eigenvalue, then the total operation count is approximately

    3kn^3 flam.

On the other hand, the explicitly shifted QR algorithm (Algorithm 2.8) requires about k'n^3 complex flrots, where k' is the number of iterations to find an eigenvalue. In terms of real arithmetic this is several times as much. Even if k = k', the superiority of the double shift algorithm is clear. In fact, one would expect k ≤ k', since each iteration of the double shift algorithm is equivalent to two steps of the single shift algorithm.

This algorithm takes a Hessenberg matrix H of order n and an index ℓ (1 ≤ ℓ ≤ n) and determines indices il, i2 ≤ ℓ such that one of the following two conditions holds.

1. 1 ≤ il < i2 ≤ ℓ, in which case the matrix deflates at rows il and i2.
2. 1 = il = i2, in which case no two consecutive subdiagonal elements of H in rows 1 through ℓ are nonzero.

In processing 2×2 blocks all transformations are accumulated in Q.

Algorithm 3.3: Finding deflation rows in a real upper Hessenberg matrix
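In the spirit of Algorithm 3.3, the following Python sketch (ours; the norm-based negligibility test is the simple criterion discussed later for the QZ algorithm, and the further processing of deflated 2×2 blocks is omitted) walks up the subdiagonal, deflating converged 1×1 and 2×2 trailing blocks and returning the rows bracketing the next QR step.

```python
import numpy as np

def backsearch(H, ell):
    """Return (il, i2) for the next double-shift step on H[il:i2+1, il:i2+1],
    or (0, 0) when everything through row ell is deflated (0-based)."""
    eps = np.finfo(float).eps * np.linalg.norm(H, 'fro')
    small = lambda i: abs(H[i, i-1]) <= eps       # negligible subdiagonal?
    i2 = ell
    while i2 > 0:
        if small(i2):
            H[i2, i2-1] = 0.0; i2 -= 1            # a real eigenvalue deflates
        elif i2 == 1 or small(i2 - 1):
            if i2 > 1: H[i2-1, i2-2] = 0.0
            i2 -= 2                               # a 2x2 block deflates
        else:
            break                                 # an unreduced block remains
    if i2 <= 0:
        return 0, 0
    il = i2
    while il > 0 and not small(il):
        il -= 1
    return il, i2
```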

Given a real upper Hessenberg matrix H, hqr overwrites it with an orthogonally similar quasi-triangular matrix. The similarity transformation is accumulated in Q. The routine gives an error return if more than maxiter iterations are required to deflate the matrix at any eigenvalue.

Algorithm 3.4: Real Schur form of a real upper Hessenberg matrix

• Like the single shift algorithm, the double shift algorithm is numerically stable in the usual sense.

• It should not be thought that because Algorithms 2.8 and 3.4 are both backward stable they produce essentially the same matrices. The singly shifted algorithm works in the complex field, and the imaginary parts of a pair of complex eigenvalues can drift away from conjugacy, very far away if the eigenvalue is ill conditioned. The doubly shifted algorithm, on the other hand, produces a matrix whose complex eigenvalues occur in exactly conjugate pairs, since the real Schur form is real. This is yet another argument for computing the real Schur form.

Eigenvectors

The eigenvectors of the original matrix A can be found by computing the eigenvectors of the Schur form T produced by Algorithm 3.4 and transforming them back using the orthogonal transformation Q. Thus the problem of finding the eigenvectors of the original matrix A is reduced to computing the eigenvectors of a quasi-triangular matrix T. The method is a generalization of the back-substitution process of Algorithm 2.9. We will assume that the eigenvalues of a diagonal block are distinct from those of the other diagonal blocks. Rather than present a complete program, which would necessarily be lengthy, we will sketch the important features of the algorithm. It will be sufficient to illustrate the procedure on a quasi-triangular matrix T consisting of three blocks.

We begin with the case of a real eigenvalue. Suppose T has the form

    T = ( T_11  t_12  t_13 )
        (  0    τ_22  τ_23 )
        (  0     0    τ_33 ),

where T_11 is 2×2 and τ_22 and τ_33 are scalars. If we seek the eigenvector corresponding to λ = τ_33 in the form x = (x_1^T  ξ_2  1)^T, then we have the equation

    Tx = xλ.                                                    (3.9)

(The reason for putting λ to the right of the eigenvector will become clear a little later.) From the second row of (3.9) we have

    τ_22 ξ_2 + τ_23 = ξ_2 λ,

whence ξ_2 = -τ_23/(τ_22 - λ). This is just the usual back-substitution algorithm.

From the first row of (3.9) we have

    T_11 x_1 + ξ_2 t_12 + t_13 = x_1 λ.

Hence we may obtain x_1 as the solution of the linear system

    (T_11 - λI)x_1 = -t_13 - ξ_2 t_12.

The system is nonsingular because by hypothesis λ is not an eigenvalue of T_11. Here we have deviated from the usual back-substitution algorithm: the two components of the eigenvector in x_1 are obtained by solving a 2×2 linear system.

For a general quasi-triangular matrix T the back-substitution proceeds row by row as above. If the row in question has a scalar on the diagonal, then the process reduces to the usual back-substitution. On the other hand, if the diagonal block is 2×2, then two components of the eigenvector are obtained by solving a system of order 2.

The case of a complex conjugate pair of eigenvalues is sufficiently well illustrated by the matrix

    T = ( T_11  t_12   T_13  )
        (  0    τ_22  t_23^T )
        (  0     0     T_33  ),

where T_33 is 2×2 and contains a complex conjugate pair of eigenvalues. We will look for an "eigenvector" (later we will call it an eigenbasis) in the form

    X = (  X_1  )
        ( x_2^T )
        (  X_3  )

satisfying

    TX = XL                                                     (3.10)

for some matrix L of order 2. We already have two examples of such block eigenvectors in (3.2) and (3.3).

We begin by letting X_3 be nonsingular but otherwise arbitrary. From the third row of (3.10) we have T_33 X_3 = X_3 L, whence

    L = X_3^{-1} T_33 X_3.

Since L is similar to T_33 it contains the complex conjugate pair of eigenvalues of T_33. From the second row of (3.10) we have

    τ_22 x_2^T + t_23^T X_3 = x_2^T L.

Hence we may compute x_2^T as the solution of the 2×2 system

    x_2^T (τ_22 I - L) = -t_23^T X_3.

This system is nonsingular because τ_22 is not an eigenvalue of L.
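The real-eigenvalue case is easily sketched in code. The following function (ours, for illustration; names and the block bookkeeping are our conventions, and scaling to prevent overflow is omitted) performs the block back substitution just described for a 1×1 diagonal block, with each 2×2 block above it contributing a 2×2 solve.

```python
import numpy as np

def real_schur_eigvec(T, starts, j):
    """Eigenvector of quasi-triangular T for the real eigenvalue T[j, j],
    where index j starts a 1x1 diagonal block. `starts` lists the starting
    index of every diagonal block in order; block sizes are implied."""
    lam, n = T[j, j], T.shape[0]
    x = np.zeros(n); x[j] = 1.0
    sizes = dict(zip(starts, np.diff(list(starts) + [n])))
    for i in reversed([s for s in starts if s < j]):
        sz = sizes[i]
        rhs = -T[i:i+sz, i+sz:j+1] @ x[i+sz:j+1]
        x[i:i+sz] = np.linalg.solve(T[i:i+sz, i:i+sz] - lam*np.eye(sz), rhs)
    return x
```

For example, if T is 4×4 with a 2×2 block at 0 followed by two scalars, real_schur_eigvec(T, [0, 2, 3], 3) reproduces the computation in (3.9): a scalar division for ξ_2 and a 2×2 solve for x_1.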

Finally, from the first row of (3.10) we have

    T_11 X_1 + t_12 x_2^T + T_13 X_3 = X_1 L,

or

    T_11 X_1 - X_1 L = -(t_12 x_2^T + T_13 X_3).

This is a Sylvester equation, which we can solve for X_1 because T_11 and L have no eigenvalues in common (see the results on Sylvester's equation in Chapter 1). For larger matrices the process continues, each row of the matrix yielding either a 2×2 system or a Sylvester equation for the corresponding part of X.

We now turn to the choice of X_3. A natural choice is X_3 = I, the generalization of the choice ξ_3 = 1 for an ordinary eigenvector. The principal advantage of this choice is that it does not require any computation to form L, which is simply T_33. However, if actual eigenvectors are desired, it is better to take X_3 = (y_3 z_3), where x_3 = y_3 + i·z_3 is the right eigenvector of T_33. Arguing as in the proof of Theorem 3.1 [see (3.2)], we have

    T_33 X_3 = X_3 L,  where  L = (  μ  ν )
                                  ( -ν  μ )

and μ ± iν are the eigenvalues of T_33. If we then compute X as described above and partition

    X = (y  z),

this relation implies that y ± iz are the right eigenvectors of T corresponding to the eigenvalues μ ± iν. Thus our choice generates the real and imaginary parts of the desired eigenvector.

This sketch leaves open many implementation details, in particular how to solve the linear systems and how to scale to prevent overflow. Moreover, we have presented only row-oriented versions of the algorithms. It is easy to cast them into column-oriented form in the spirit of Algorithm 2.9. The major packages adopt a different approach that is equivalent to our second choice of X_3. Specifically, for a complex eigenvalue λ they write out the relation Tx = λx and backsolve as in the real case, but compute the real and imaginary parts explicitly in real arithmetic.

3.4. NOTES AND REFERENCES

Implicit shifting

The ideas in this section, from the real Schur form to the implicit double shift, are due to Francis [76]. Although we have focused on the double shift version, there is a single shift variant, which we will use to compute eigenvalues of a symmetric tridiagonal matrix when we treat the symmetric eigenvalue problem. There is also a multishift variant (see below). The idea of an implicit shift has consequences extending beyond the computation of a real Schur form. In the next section we will show how to use implicit shifting to solve the generalized eigenvalue problem, and in the next chapter we will apply it to the singular value decomposition.

Processing 2×2 blocks

When a 2×2 block has converged, it should be split if it has real eigenvalues. The LAPACK routine xLANV2 does this. It also transforms a block with complex eigenvalues into one in which the diagonal elements are equal.

The multishifted QR algorithm

There is no need to confine ourselves to two shifts in the implicitly shifted QR algorithm. Specifically, suppose that we have chosen m shifts κ_1, ..., κ_m. We compute the first column c of (A - κ_m I)···(A - κ_1 I), which will have m+1 leading nonzero components. We then determine an (m+1)×(m+1) Householder transformation Q_0 that reduces c to e_1 and then apply it to A. The matrix Q_0 A Q_0 has an (m+1)×(m+1) bulge below its diagonal, which is chased down out of the matrix in a generalization of Algorithm 3.2. The shifts can be computed as the eigenvalues of the trailing m×m submatrix. This algorithm, proposed by Bai and Demmel [9] and implemented in the LAPACK routine xHSEQR, can take advantage of level-3 BLAS: the larger m the greater the advantage. Unfortunately, Watkins [290] has shown that if the number of shifts is great they are not transmitted accurately through the bulge chasing process. An alternative is to note that if, say, the bulge from a 2×2 shift is chased two steps down the diagonal, then another double shift can be started after it. If this process is continued, the result is a sequence of small, adjacent bulges being chased down the matrix. This algorithm and a new deflation procedure are treated by Braman, Byers, and Mathias [29]. The paper also contains an extensive bibliography on the subject.

4. THE GENERALIZED EIGENVALUE PROBLEM

The generalized eigenvalue problem is that of finding nontrivial solutions of the equation

    Ax = λBx,

where A and B are of order n. The problem reduces to the ordinary eigenvalue problem when B = I, which is why it is called a generalized eigenvalue problem. Although the generalization results from the trivial replacement of an identity matrix by an arbitrary matrix B, the problem has many features not shared with the ordinary eigenvalue problem. For example, a generalized eigenvalue problem can have infinite eigenvalues.

In spite of the differences between the two problems, the generalized eigenvalue problem has the equivalents of a Hessenberg form and a Schur form, and the QR algorithm can be adapted to compute the latter. The resulting algorithm is called the QZ algorithm, and the purpose of this section is to describe it.

Like the QR algorithm, the QZ algorithm begins by reducing A and B to a simplified form, called Hessenberg-triangular form, and then iterates to make both matrices triangular. We begin in the next subsection with the theoretical background. We will treat the reduction to Hessenberg-triangular form in §4.2 and the QZ iteration in §4.3.

4.1. THE THEORETICAL BACKGROUND

In this subsection we will develop the theory underlying the generalized eigenvalue problem, including Schur forms and perturbation theory. For reasons of space, we only sketch the outlines. The reader is referred to the notes and references for more extensive treatments.

Definitions

We begin with the definition of generalized eigenvalues and eigenvectors.

Definition 4.1. Let A and B be of order n. The pair (λ, x) is an EIGENPAIR or RIGHT EIGENPAIR of the PENCIL (A, B) (also written A - λB) if

1. x ≠ 0,
2. Ax = λBx.

The scalar λ is called an EIGENVALUE of the pencil and the vector x is its EIGENVECTOR. The pair (λ, y) is a LEFT EIGENPAIR of the pencil (A, B) if y ≠ 0 and y^H A = λ y^H B.

As with the ordinary eigenvalue problem, eigenpairs of the generalized problem are assumed to be right eigenpairs unless otherwise qualified.

It is important to keep in mind that the geometry of generalized eigenvectors differs from that of ordinary eigenvectors. If (λ, x) is an eigenpair of A, then A maps x into a multiple of itself; i.e., the direction of x remains invariant under multiplication by A. On the other hand, if (λ, x) is an eigenpair of the pencil (A, B), the direction of x is not necessarily preserved under multiplication by A or B. Instead, the directions of Ax and Bx are the same, provided we agree that the direction of the zero vector matches that of any nonzero vector. This elementary fact will be useful in understanding the material to follow.

Regular pencils and infinite eigenvalues

An important difference between the generalized eigenvalue problem and the ordinary eigenvalue problem is that the matrix B in the former can be singular whereas the matrix I in the latter cannot. This difference has important consequences. In the first place, if B is singular, it is possible for any number λ to be an eigenvalue of the pencil A - λB.

For example, if A and B both have zero second columns, say

    A = ( 1 0 ),   B = ( 1 0 ),
        ( 0 0 )        ( 0 0 )

then Ae_2 = Be_2 = 0. Hence for any λ, Ae_2 = λBe_2, so that every λ is an eigenvalue. More generally, the same thing happens whenever A and B have a common null vector x. An even more general example is a pencil such as

    A = ( 1 0 ),   B = ( 0 1 ),
        ( 0 0 )        ( 0 0 )

in which A and B have no null vectors in common. In all these examples the determinant of A - λB is identically zero. This suggests the following definition.

Definition 4.2. A matrix pencil (A, B) is REGULAR if det(A - λB) is not identically zero.

A regular matrix pencil can have only a finite number of eigenvalues. To see this, note that Ax = λBx (x ≠ 0) if and only if det(A - λB) = 0. Now

    p(λ) = det(A - λB)                                          (4.1)

is a polynomial of degree m ≤ n. If the pencil is regular, then p(λ) is not identically zero and hence has m zeros, which are eigenvalues of (A, B). It is possible for p(λ) to be constant, in which case it appears that the pencil has no eigenvalues. However, this can only occur if B is singular, since the coefficient of λ^n in p is ±det(B).

If B is singular, then Bx = 0 for some x ≠ 0. Now if λ ≠ 0 is an eigenvalue of a pencil (A, B), then μ = λ^{-1} is an eigenvalue of (B, A) and vice versa, and zero is an eigenvalue of (B, A) with eigenvector x. It is therefore natural to regard ∞ = 1/0 as an eigenvalue of (A, B): the pencil must have an infinite eigenvalue corresponding to the vector x.

It should be stressed that infinite eigenvalues are just as legitimate as their finite brethren. However, infinite eigenvalues do present mathematical and notational problems. For example, what is the multiplicity of an infinite eigenvalue? To answer this question, we will appeal to a generalization of the Schur form (Theorem 4.5). But first we must describe the kind of transformations we will use to arrive at this generalized Schur form.

Equivalence transformations

We have already observed that the natural transformations for simplifying an ordinary eigenvalue problem are similarity transformations. They preserve the eigenvalues of a matrix and change the eigenvectors in a systematic manner. The corresponding transformations for the generalized eigenvalue problem are called equivalence transformations.

Definition 4.3. Let (A, B) be a matrix pencil and let U and V be nonsingular. Then the pencil (U^H A V, U^H B V) is said to be EQUIVALENT to (A, B). We also say that the pair (U^H A V, U^H B V) is obtained from (A, B) by an EQUIVALENCE TRANSFORMATION.

The effect of an equivalence transformation on the eigenvalues and eigenvectors of a matrix pencil is described in the following theorem, whose proof is left as an exercise.

Theorem 4.4. Let (λ, x) and (λ, y) be right and left eigenpairs of the regular pencil (A, B). If U and V are nonsingular, then (λ, V^{-1}x) and (λ, U^{-1}y) are right and left eigenpairs of (U^H A V, U^H B V).

Equivalence transformations also leave the polynomial p(λ) = det(A - λB) essentially unaltered. Specifically,

    det[U^H (A - λB) V] = det(U^H) det(V) det(A - λB).

Since U and V are nonsingular, the equivalence transformation multiplies p(λ) by a nonzero constant. In particular, the roots of p(λ) and their multiplicities are preserved by equivalence transformations.

Generalized Schur form

Just as we can reduce a matrix to simpler forms by similarity transformations, so can we simplify a pencil by equivalence transformations. For example, there is an analogue of the Jordan form for regular pencils called the Weierstrass form. Unfortunately, like the Jordan form, this form is not numerically well determined, and the transformations that produce it can be ill conditioned (see §1.6, Chapter 1). Fortunately, if we restrict ourselves to unitary equivalences, which are well conditioned, we can reduce the matrices of a pencil to triangular form.

Theorem 4.5 (Generalized Schur form). Let (A, B) be a regular pencil. Then there are unitary matrices U and V such that S = U^H A V and T = U^H B V are upper triangular.

Proof. Let v be an eigenvector of (A, B) normalized so that ||v||_2 = 1, and let (v V⊥) be unitary. Since the pencil (A, B) is regular, at least one of the vectors Av and Bv must be nonzero, say Av. Let u = Av/||Av||_2 and let (u U⊥) be unitary. Then

    (u U⊥)^H A (v V⊥) = ( u^H Av   u^H A V⊥  )
                        (    0    U⊥^H A V⊥ ).

The (2,1)-block of the matrix on the right is zero because by construction Av is orthogonal to the column space of U⊥. Similarly, if Bv ≠ 0 then it must be proportional to Av, so that U⊥^H Bv = 0 as well; and if Bv = 0 this block is trivially zero. The proof is now completed by an inductive reduction of the trailing pencil (U⊥^H A V⊥, U⊥^H B V⊥) to triangular form. •
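Generalized Schur forms can be computed with standard software. The following SciPy fragment (our illustration, with made-up data) computes the decomposition and reads the eigenvalues off the diagonals of the triangular factors.

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
S, T, U, V = qz(A, B, output='complex')   # A = U S V^H, B = U T V^H
alpha, beta = np.diag(S), np.diag(T)      # eigenvalue pairs (alpha_i, beta_i)
print(alpha / beta)                       # the finite eigenvalues (beta_i != 0)
```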

We now turn to four consequences of this theorem.

Multiplicity of eigenvalues

Let (S, T) be a generalized Schur form of the regular pencil (A, B), and let α_i and β_i denote the diagonal elements of S and T. Then from (4.2) and (4.3) we have

    p(λ) = det(A - λB) = ω ∏_{i=1}^{n} (α_i - λβ_i),

where |ω| = 1 (ω is the product of the determinants of the unitary transformations that produce the generalized Schur form). Now the degree m of p is invariant under equivalence transformations [see (4.1)]. Hence exactly m of the β_i are nonzero and n-m are zero. The former correspond to the finite eigenvalues of (A, B), while the latter correspond to the infinite eigenvalues. Thus any generalized Schur form will exhibit the same number of finite and infinite eigenvalues. This argument justifies the following definitions.

Definition 4.6. Let (A, B) be a regular pencil of order n. Then

    p(λ) = det(A - λB)

is the CHARACTERISTIC POLYNOMIAL of (A, B). The ALGEBRAIC MULTIPLICITY of a finite eigenvalue of (A, B) is its multiplicity as a zero of the characteristic polynomial. If m is the degree of the characteristic polynomial and m < n, then we say that (A, B) has an INFINITE EIGENVALUE OF ALGEBRAIC MULTIPLICITY n-m. An eigenvalue of algebraic multiplicity one is called a SIMPLE EIGENVALUE.

Another consequence of the above argument is that we can choose the eigenvalues of (A, B) to appear in any order in a generalized Schur form. After the first step of the reduction, we can choose any eigenvalue, finite or infinite, to be in the first position. If the next eigenvalue in the sequence is finite, it must be an eigenvalue of the deflated pencil, since it satisfies its characteristic equation. On the other hand, if the next eigenvalue is infinite, the B-part of the deflated pencil must be singular, for otherwise the degree of p would be too large. In either case we can find an eigenvector corresponding to that eigenvalue and continue the reduction with it. Thus we have established the following corollary.

Corollary 4.7. The eigenvalues of (A, B) can be made to appear in any order on the diagonals of (S, T).

Projective representation of eigenvalues

The Schur form has enabled us to solve the mathematical problem of defining the multiplicity of generalized eigenvalues, finite or infinite. It also suggests how to put finite and infinite eigenvalues on the same notational footing.

The eigenvalues of (A, B) are determined by the pairs (α_i, β_i) from the diagonals of a generalized Schur form. If β_i is nonzero, λ = α_i/β_i is a finite eigenvalue of (A, B); otherwise the eigenvalue is infinite. We can erase the distinction between the two kinds of eigenvalues by representing them by the pairs (α_i, β_i). An ordinary eigenvalue λ can be written in the form (λ, 1). In this nomenclature (0, 1) is a zero eigenvalue and (1, 0) is an infinite eigenvalue. This is equivalent to writing the generalized eigenvalue problem in the form

    βAx = αBx.

Unfortunately, this representation is not unique, since any nonzero multiple of the pair (α_i, β_i) represents the same eigenvalue. We solve this problem by introducing the set

    ⟨α, β⟩ = { (τα, τβ) : τ ∈ C }.                              (4.4)

We will call this set the projective representation of the eigenvalue.

Generalized shifting

Let (A, B) be a pencil with B nonsingular. Then the equation Ax = λBx can be written in the form B^{-1}Ax = λx. Thus when B is nonsingular, we can reduce the generalized eigenvalue problem to an ordinary eigenvalue problem, a computationally appealing procedure. The reduction is not possible when B is singular, and it has computational difficulties when B is ill conditioned. To remedy this difficulty we can work with the shifted pencil

    Ax = μ(B + ωA)x,

whose eigenpairs are (λ/(1 + ωλ), x). If we can determine ω so that B + ωA is well conditioned, we can reduce the shifted problem to the ordinary eigenvalue problem (B + ωA)^{-1}Ax = μx.

We are not restricted to shifting B alone. We can shift A and B simultaneously, as the following theorem shows.

Theorem 4.8. Let (A, B) be a regular pencil and let

    W = ( w_11  w_12 )
        ( w_21  w_22 )

be nonsingular, and set

    (C, D) = (w_11 A + w_21 B, w_12 A + w_22 B).

Then (⟨(α, β)W⟩, x) is an eigenpair of (C, D) if and only if (⟨α, β⟩, x) is an eigenpair of (A, B). Moreover, these eigenpairs have the same multiplicity.

Proof. By Theorem 4.5 we may pass to the Schur form and assume that (A, B) is in generalized Schur form with the pair (α, β) in the initial position, so that the vector e_1 is the corresponding eigenvector. Then (C, D) is also in Schur form with (γ, δ) = (α, β)W in the initial position. Since W is nonsingular, (γ, δ) ≠ (0, 0). Consequently, (⟨γ, δ⟩, e_1) is an eigenpair of (C, D). The statement about multiplicities follows from the fact that if an eigenvalue appears m times on the diagonal of (A, B), then by the nonsingularity of W the transformed eigenvalue appears exactly m times on the diagonal of (C, D). The converse is shown by using the nonsingularity of W to pass from (C, D) back to (A, B). •

Left and right eigenvectors of a simple eigenvalue

We have seen that if λ is a simple eigenvalue of a matrix A, then its left and right eigenvectors cannot be orthogonal [see (3.12), Chapter 1]. This allows us to calculate the eigenvalue in the form of a Rayleigh quotient y^H Ax/y^H x. The left and right eigenvectors of a simple eigenvalue of a pencil, however, can be orthogonal. For example, in the pencil

    A = ( 0 1 ),   B = ( 0 0 ),
        ( 1 0 )        ( 1 0 )

the right and left eigenvectors of the pencil corresponding to the simple eigenvalue 1 are e_1 and e_2. Nonetheless, there is a generalized form of the Rayleigh quotient, as the following theorem shows.

Theorem 4.9. Let ⟨α, β⟩ be a simple eigenvalue of the regular pencil (A, B), and let x and y be its right and left eigenvectors. Then

    y^H Ax ≠ 0  or  y^H Bx ≠ 0,                                 (4.6)

and

    ⟨α, β⟩ = ⟨y^H Ax, y^H Bx⟩.                                  (4.7)

Proof. Assume without loss of generality that (A, B) is in generalized Schur form with (α, β) in the initial position, so that x is a nonzero multiple of e_1. Now the first component of y is nonzero. For otherwise the vector ỹ consisting of the last n-1 components of y would be a left eigenvector of the trailing pencil corresponding to ⟨α, β⟩, so that ⟨α, β⟩ would appear again among its eigenvalues, contradicting the simplicity of ⟨α, β⟩. Writing x = ξ_1 e_1 and denoting the first component of y by η_1, we have

    (y^H Ax, y^H Bx) = ξ_1 η̄_1 (α, β).

Since ξ_1 ≠ 0, η_1 ≠ 0, and, by the regularity of the pencil, (α, β) ≠ (0, 0), at least one of y^H Ax and y^H Bx is nonzero, and the pair on the left is a nonzero multiple of (α, β). This establishes both (4.6) and (4.7). •

Since α = y^H Ax and β = y^H Bx, we can write the ordinary form of a finite eigenvalue as

    λ = y^H Ax / y^H Bx.

This expression is a natural generalization of the Rayleigh quotient, since B plays the role of the identity in the Rayleigh quotient for the ordinary eigenvalue problem. This theorem will also prove important a little later when we consider the perturbation of eigenvalues of matrix pencils.

Perturbation theory

We now turn to perturbation theory for the generalized eigenvalue problem, and specifically to the perturbation of a simple eigenpair (⟨α, β⟩, x) of a regular pencil (A, B). Let

    (Ã, B̃) = (A + E, B + F)

be the perturbed pencil, and let us agree to measure the size of this perturbation by

    ε = sqrt(||E||_F^2 + ||F||_F^2).

We wish to know if there is an eigenpair (⟨α̃, β̃⟩, x̃) of (Ã, B̃) that converges to the eigenpair (⟨α, β⟩, x) as ε → 0. We also wish to know how close the perturbed eigenpair is to the original as a function of ε.

We begin with a qualitative assessment of the effects of the perturbation (E, F) on the eigenvectors. By shifting the pencil, we can assume that B is nonsingular. Hence the problem can be reduced to the ordinary eigenvalue problem

    B^{-1}Ax = λx.

Similarly, since y^H A is proportional to y^H B, we have for the left eigenvector y

    y^H A B^{-1} = λ y^H.

From the perturbation theory of inverses and Theorem 3.13, Chapter 1, we know that there exist perturbed eigenvectors x̃ and ỹ satisfying

    x̃ = x + O(ε)  and  ỹ = y + O(ε).                            (4.9)

If we normalize x̃ and ỹ so that

    ||x̃||_2 = ||ỹ||_2 = 1,

then (4.9) says that the perturbed eigenvectors converge to x and y as ε → 0. For the eigenvalue we have from (4.7) that

    ⟨α̃, β̃⟩ = ⟨ỹ^H Ã x̃, ỹ^H B̃ x̃⟩ = ⟨α, β⟩ + O(ε).

[Here the notation ⟨α, β⟩ + O(ε) is an abbreviation for the more proper ⟨α + O(ε), β + O(ε)⟩.] Thus the projective representation of the eigenvalue converges at least linearly in ε.

First-order expansions: Eigenvalues

We now turn to first-order expansions of a simple eigenvalue. We will use the notation established above.

Theorem 4.10. Let x and y be right and left eigenvectors of the simple eigenvalue ⟨α, β⟩ = ⟨y^H Ax, y^H Bx⟩ of the regular pencil (A, B). Then

    ⟨α̃, β̃⟩ = ⟨α + y^H Ex + O(ε^2), β + y^H Fx + O(ε^2)⟩.       (4.11)

Proof. The key to the proof lies in writing the perturbed eigenvectors x̃ and ỹ in appropriate forms. Since the pencil is regular, not both y^H A and y^H B can be zero. For definiteness assume that u^H = y^H A is nonzero with u^H x = α ≠ 0 (otherwise work with y^H B). This implies that if U is an orthonormal basis for the orthogonal complement of u, then the matrix (x U) is nonsingular. Hence we can write

    x̃ = γx + Uc

for some γ and c. Now since x̃ → x [see (4.9)], we must have γ → 1. Hence, renormalizing x̃ by dividing by γ and setting e = Uc/γ, we may write x̃ = x + e with e = O(ε). Similarly we may write ỹ = y + f with f = O(ε).

Now since x is an eigenvector, Ax and Bx have a common direction w, so that Ax = αw and Bx = βw; likewise y^H A = αg^H and y^H B = βg^H for some vector g. Hence

    α̃ = ỹ^H Ã x̃ = α(1 + f^H w + g^H e) + y^H Ex + O(ε^2),
    β̃ = ỹ^H B̃ x̃ = β(1 + f^H w + g^H e) + y^H Fx + O(ε^2).

Dividing both quantities by 1 + f^H w + g^H e = 1 + O(ε), which does not change the projective point, gives (4.11). •

Two observations on this theorem.

• Since ||x||_2 = ||y||_2 = 1, we have by the Cauchy inequality

    |y^H Ex| ≤ ||E||_2 ≤ ε  and  |y^H Fx| ≤ ||F||_2 ≤ ε.

Hence equation (4.11) says that, up to second-order terms, α̃ and β̃ lie within a distance ε of α and β.

• We can convert these bounds into bounds on the ordinary form λ of the eigenvalue. However, the process is tedious and does not give much insight into what makes an eigenvalue ill conditioned. We therefore turn to a different way of measuring the accuracy of generalized eigenvalues.

The chordal metric

From (4.4) we see that the projective representation ⟨α, β⟩ is the one-dimensional subspace of C^2 spanned by the vector (α β)^T. It is therefore natural to measure the distance between two eigenvalues ⟨α, β⟩ and ⟨γ, δ⟩ by the sine of the angle between them. Denoting this angle by θ, we have, after some manipulation,

    sin θ = |αδ - βγ| / [ sqrt(|α|^2 + |β|^2) · sqrt(|γ|^2 + |δ|^2) ].

This justifies the following definition.

Definition 4.11. The CHORDAL DISTANCE between ⟨α, β⟩ and ⟨γ, δ⟩ is the number

    χ(⟨α, β⟩, ⟨γ, δ⟩) = |αδ - βγ| / [ sqrt(|α|^2 + |β|^2) · sqrt(|γ|^2 + |δ|^2) ].

There are several comments to be made about this definition and its consequences.

• The function χ is a metric; that is, it is definite and satisfies the triangle inequality.

• If β and δ are nonzero and we set λ = α/β and μ = γ/δ, then

    χ(⟨α, β⟩, ⟨γ, δ⟩) = |λ - μ| / [ sqrt(1 + |λ|^2) · sqrt(1 + |μ|^2) ].

Thus χ defines a distance between numbers in the complex plane. By abuse of notation we will denote this distance by χ(λ, μ). By a limiting argument, we can write

    χ(λ, ∞) = 1 / sqrt(1 + |λ|^2).

Thus the chordal distance includes the point at infinity. In particular, χ(0, ∞) = 1.

• The name chordal distance comes from the fact that χ(λ, μ) is half the length of the chord between the projections of λ and μ onto the Riemann sphere. Hence for eigenvalues that are not too large, the chordal metric behaves like the ordinary distance between two points in the complex plane.

• In Theorem 4.8 we saw how to shift a pencil and its eigenvalues by a nonsingular matrix W of order two; specifically, the eigenvalue ⟨α, β⟩ is shifted into ⟨(α, β)W⟩. In general, the chordal metric is not invariant under such shifts. But if W is a nonzero multiple of a unitary matrix, the angle between two projective points is preserved, and the chordal distance between them remains unchanged.
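The chordal distance is easily coded. The following small function (our illustration) works directly from projective pairs, so zero and infinite eigenvalues need no special casing.

```python
import numpy as np

def chordal(ab, cd):
    """Chordal distance between eigenvalues given as projective pairs (alpha, beta)."""
    (a, b), (c, d) = ab, cd
    return abs(a*d - b*c) / (np.hypot(abs(a), abs(b)) * np.hypot(abs(c), abs(d)))

print(chordal((1.0, 1.0), (1.0, 0.0)))   # chi(1, infinity) = 1/sqrt(2)
print(chordal((0.0, 1.0), (1.0, 0.0)))   # chi(0, infinity) = 1
```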

The condition of an eigenvalue

Turning now to our first-order expansion, in which we recapitulate a bit, we have

    α̃ = α + y^H Ex + O(ε^2)  and  β̃ = β + y^H Fx + O(ε^2),

from which we get

    χ(⟨α̃, β̃⟩, ⟨α, β⟩) = |α̃β - β̃α| / [ sqrt(|α̃|^2 + |β̃|^2) · sqrt(|α|^2 + |β|^2) ].

But the numerator of this expression is

    |α̃β - β̃α| = |y^H (βE - αF) x| + O(ε^2)
               ≤ sqrt(|α|^2 + |β|^2) · ||x||_2 ||y||_2 · ε + O(ε^2).

Hence

    χ(⟨α̃, β̃⟩, ⟨α, β⟩) ≤ [ ||x||_2 ||y||_2 / sqrt(|α|^2 + |β|^2) ] ε + O(ε^2).

We summarize our results in the following theorem.

Theorem 4.12. Let λ be a simple eigenvalue (possibly infinite) of the regular pencil (A, B), and let x and y be its right and left eigenvectors, so that the projective representation of λ is ⟨α, β⟩ = ⟨y^H Ax, y^H Bx⟩. Let Ã = A + E and B̃ = B + F, and set

    ε = sqrt(||E||_F^2 + ||F||_F^2)  and  ν = ||x||_2 ||y||_2 / sqrt(|α|^2 + |β|^2).   (4.15)

Then for ε sufficiently small, there is an eigenvalue λ̃ of the pencil (Ã, B̃) satisfying

    χ(λ, λ̃) ≤ νε + O(ε^2).

There are five comments to be made on this theorem.

• The number ν mediates the effects of the error ε on the perturbation of the eigenvalue and is therefore a condition number for the problem.

• By (4.15) it is independent of the scaling of x and y, since α and β scale along with them.

• If ||x||_2 = ||y||_2 = 1, the condition number ν is large precisely when α and β are both small, which means that the subspace ⟨α, β⟩ is ill determined. This agrees with common-sense reasoning from the first-order perturbation expansion: if both α and β are small, they are sensitive to the perturbations E and F.

• On the other hand, if one of α or β is large, then ν is not large and the eigenvalue is well conditioned. In this case, we may have β = 0, so that the eigenvalue λ is infinite. In particular, although a perturbation may move λ infinitely far in the complex plane, λ̃ remains in a small neighborhood of infinity on the Riemann sphere. Such are the regularizing properties of the chordal metric.

• The bound does not reduce to the ordinary first-order bound when B = I and F = 0. However, if we normalize A so that ||A||_2 = 1 and normalize x and y so that y^H x = 1, then |λ| ≤ 1 and ||x||_2 ||y||_2 = sec∠(x, y). Hence for the case ||A||_2 = 1 the results of ordinary perturbation theory and our generalized theory agree up to constants near one.
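Once the eigenvectors are at hand, the condition number ν is cheap to evaluate. A minimal sketch (ours):

```python
import numpy as np

def eig_condition(A, B, x, y):
    """Projective pair and condition number nu of an eigenvalue (Theorem 4.12)."""
    alpha = np.vdot(y, A @ x)        # y^H A x
    beta = np.vdot(y, B @ x)         # y^H B x
    nu = (np.linalg.norm(x) * np.linalg.norm(y)
          / np.hypot(abs(alpha), abs(beta)))
    return alpha, beta, nu
```

A large ν flags an eigenvalue whose projective pair ⟨α, β⟩ has both components small; the chordal perturbation χ(λ, λ̃) is then bounded only by νε to first order.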

• Unfortunately, ill-conditioned eigenvalues can restrict the range of perturbations for which the first-order expansion of a well-conditioned eigenvalue remains valid, as the following example shows.

Example 4.13. Consider the pencil

    A = ( 1 0 ),   B = ( 1 0 ),
        ( 0 ε )        ( 0 ε )

where ε is small. The pencil has the eigenvalue 1 twice: once with the representation ⟨1, 1⟩, which is well conditioned, and once with the representation ⟨ε, ε⟩, which is ill conditioned, since both its components are small. If we perturb it to get the pencil

    Ã = ( 1 ε ),   B̃ = B,
        ( ε ε )

then

    det(Ã - λB̃) = ε[(1 - λ)^2 - ε].

Thus the eigenvalues of the perturbed pencil are 1 ± √ε. In particular, if ε = 10^{-10}, the "well-conditioned" eigenvalue 1 has changed by 10^{-5}.

The example shows that perturbations can cause an ill-conditioned eigenvalue to wander around, bumping into other eigenvalues and destroying their accuracy. However, this problem should not be taken too seriously. It is obvious from the example that only a very special perturbation can undo a well-conditioned eigenvalue. Such perturbations are unlikely to arise from essentially random sources such as rounding or measurement error.

Perturbation expansions: Eigenvectors

To derive a perturbation expansion for a simple eigenvector x of the regular pencil (A, B), we appeal to the first step of the reduction to Schur form [see (4.2)]. Specifically, let (x X) and (u U) be unitary matrices such that

    (u U)^H A (x X) = ( α  a^H )    and    (u U)^H B (x X) = ( β  b^H )
                      ( 0   Â  )                             ( 0   B̂  ).   (4.16)

The fact that we can choose U so that U^H Ax = U^H Bx = 0 is a consequence of the fact that x is an eigenvector. Conversely, if such a U exists, then (u U)^H (A, B)(x X) must be of the form (4.16), and x must be an eigenvector.

Now let (Ã, B̃) = (A + E, B + F). We seek the perturbed eigenvector in the form

    x̃ = (x + Xp)(1 + p^H p)^{-1/2},

where p is to be determined. We also seek the corresponding matrix Ũ in the form

    Ũ = (U + uq^H)(I + qq^H)^{-1/2},

where q is also to be determined. Since X and U are orthonormal, the normalizing factors make x̃ and Ũ orthonormal. Since we know that x̃ converges to x as ε = sqrt(||E||_F^2 + ||F||_F^2) → 0, we may assume that p and q are small. In particular, the factor (I + qq^H)^{-1/2} differs from the identity by second-order terms in q, and we will drop it (and its analogue for x̃) in what follows.

For x̃ to be an eigenvector of (Ã, B̃), we must have Ũ^H Ã x̃ = Ũ^H B̃ x̃ = 0. Expanding and keeping only first-order terms, we find

    Ũ^H Ã x̃ ≐ U^H Ex + αq + Âp,
    Ũ^H B̃ x̃ ≐ U^H Fx + βq + B̂p.                                (4.17)

Hence p and q must satisfy the equations obtained by setting these right-hand sides to zero. We may eliminate the auxiliary vector q from these equations by multiplying the first by β and the second by α and subtracting, to get

    (βÂ - αB̂)p = -U^H (βE - αF)x.

Now the matrix βÂ - αB̂ is nonsingular. For otherwise ⟨α, β⟩ would be an eigenvalue of (Â, B̂), contradicting the simplicity of the eigenvalue ⟨α, β⟩. Thus

    p = -(βÂ - αB̂)^{-1} U^H (βE - αF)x.                         (4.18)

The condition of an eigenvector

To convert (4.18) to a first-order bound, note that we can multiply (4.17) by a constant to make max{|α|, |β|} = 1. Moreover, by Lemma 3.12, Chapter 1, up to second-order terms

    sin∠(x, x̃) = ||X^H x̃||_2 = ||p||_2.

Hence on taking norms we get: In the above notation, suppose that max{|α|, |β|} = 1, and set

    δ[⟨α, β⟩, (Â, B̂)] = ||(βÂ - αB̂)^{-1}||_2^{-1}.

Then

    sin∠(x, x̃) ≤ sqrt(2)·ε / δ[⟨α, β⟩, (Â, B̂)] + O(ε^2).       (4.19)

The number δ[⟨α, β⟩, (Â, B̂)] is analogous to the number sep(λ, M) for the ordinary eigenvalue problem [see (3.16), Chapter 1]. It is nonzero if ⟨α, β⟩ is simple, and it approaches zero as ⟨α, β⟩ approaches an eigenvalue of (Â, B̂). Most important, as (4.19) shows, its reciprocal is a condition number for the eigenvector corresponding to ⟨α, β⟩.

4.2. REAL SCHUR AND HESSENBERG-TRIANGULAR FORMS

We now turn to the computational aspects of the generalized eigenvalue problem. As indicated in the introduction to this section, the object of our computation is a Schur form, which we will approach via an intermediate Hessenberg-triangular form. The purpose of this subsection is to describe the real generalized Schur form that is our computational goal and to present the reduction to Hessenberg-triangular form.

Generalized real Schur form

Let (A, B) be a real, regular pencil. Then (A, B) can be reduced to the equivalent of the real Schur form (Theorem 3.1); i.e., there is an orthogonal equivalence transformation that reduces the pencil to a quasi-triangular form in which the 2×2 blocks contain complex conjugate eigenvalues. Specifically, we have the following theorem.

Theorem 4.14. Let (A, B) be a real regular pencil. Then there are orthogonal matrices U and V such that

    S = U^T A V

is quasi-triangular and

    T = U^T B V

is upper triangular. The diagonal blocks of S and T are of order either one or two. The pencils corresponding to the blocks of order one contain the real eigenvalues of (A, B). The pencils corresponding to the blocks of order two contain a pair of complex conjugate eigenvalues of (A, B). The blocks can be made to appear in any order.

The proof of this theorem is along the lines of the reduction to generalized Schur form, the chief difference being that we use two-dimensional subspaces to deflate the complex eigenvalues. Rather than present a detailed description of this deflation, we sketch the procedure.


1. Using Definition 4.6 and Theorem 4.8, argue that complex eigenvalues of a real pencil occur in conjugate pairs and that the real and imaginary parts of their eigenvectors are independent.

2. Let x = y + iz be an eigenvector corresponding to a complex eigenvalue and let X = (y z). Show that for some matrix L we have AX = BXL.

3. Let (V V⊥) be orthogonal with R(V) = R(X). Let (U U⊥) be orthogonal with R(U) equal to the space spanned by the columns of AV and BV.

4. Show that

    (U U⊥)^T A (V V⊥) = ( S  * )    and    (U U⊥)^T B (V V⊥) = ( T  * ),
                        ( 0  * )                               ( 0  * )

where S and T are 2×2.

By repeated applications of this deflation of complex eigenvalues, along with the ordinary deflation of real eigenvalues, we can prove Theorem 4.14.

Hessenberg-triangular form

The first step in the solution of the ordinary eigenvalue problem by the QR algorithm is a reduction of the matrix in question to Hessenberg form. In this section we describe a reduction of a pencil (A, B) to a form in which A is upper Hessenberg and B is upper triangular. Note that if A is upper Hessenberg and B is upper triangular (and nonsingular), then AB^{-1} is upper Hessenberg. Thus the reduction to Hessenberg-triangular form corresponds to the reduction to Hessenberg form of the matrix AB^{-1}.

The process begins by determining an orthogonal matrix Q such that Q^T B is upper triangular. The matrix Q can be determined as a product of Householder transformations (Algorithm 1:4.1.2). The transformation Q^T is also applied to A. We then use plane rotations to reduce A to Hessenberg form while preserving the upper triangularity of B. The reduction proceeds by columns. Figure 4.1 illustrates the reduction of the first column. Zeros are introduced in A beginning at the bottom of the first column. The elimination of the (5,1)-element of A by premultiplication by a rotation Q_45 in the (4,5)-plane introduces a nonzero into the (5,4)-element of B. This nonzero element is then annihilated by postmultiplication by a rotation Z_54 in the (5,4)-plane. The annihilation of the (4,1)- and (3,1)-elements of A proceeds similarly. If this process is repeated on columns 2 through n-2 of the pencil, the result is Algorithm 4.1. Here are some comments.

• We have implemented the algorithm for real matrices. However, with minor modifications it works for complex matrices.

• The computation of Q is special. Since Q is to contain the transpose of the accumulated transformations, we must postmultiply by the transformations, even though they premultiply A and B.


Figure 4.1: Reduction to Hessenberg-triangular form


This algorithm takes a matrix pencil of order n and reduces it to Hessenberg-triangular form by orthogonal equivalences. The left transformations are accumulated in the matrix Q, the right transformations in the matrix Z.

Algorithm 4.1: Reduction to Hessenberg-triangular form
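As a concrete rendering of the column-by-column reduction just described, the following NumPy sketch (ours, with conventions of our choosing; it is not the book's Algorithm 4.1) triangularizes B and then sweeps rotations through A while repairing B.

```python
import numpy as np

def rot(a, b):
    """c, s with [[c, s], [-s, c]] applied to (a, b) zeroing b."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a/r, b/r)

def hess_triangular(A, B):
    """Reduce (A, B) so that Q^T A Z is Hessenberg and Q^T B Z is triangular."""
    n = A.shape[0]
    Q, R = np.linalg.qr(B)                       # triangularize B ...
    A, B = Q.T @ A, R                            # ... carrying A along
    Z = np.eye(n)
    for j in range(n - 2):                       # columns of A
        for i in range(n - 1, j + 1, -1):        # zero A[i, j] from the bottom
            c, s = rot(A[i-1, j], A[i, j])
            G = np.array([[c, s], [-s, c]])
            A[i-1:i+1, :] = G @ A[i-1:i+1, :]    # rows i-1, i of A and B
            B[i-1:i+1, :] = G @ B[i-1:i+1, :]    # fills B[i, i-1]
            Q[:, i-1:i+1] = Q[:, i-1:i+1] @ G.T
            c, s = rot(B[i, i], B[i, i-1])       # restore triangular B
            G = np.array([[c, s], [-s, c]])
            B[:, [i, i-1]] = B[:, [i, i-1]] @ G.T
            A[:, [i, i-1]] = A[:, [i, i-1]] @ G.T
            Z[:, [i, i-1]] = Z[:, [i, i-1]] @ G.T
    return A, B, Q, Z
```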


• The operation count for the reduction of B is (2/3)n^3 flam. The reduction of A takes on the order of n^3 flrot. Thus for real matrices, the algorithm requires O(n^3) operations in all.

• If only eigenvalues of the pencil are desired, we need not accumulate the transformations.

• The algorithm is backward stable in the usual sense.

4.3. THE DOUBLY SHIFTED QZ ALGORITHM

Like the QR algorithm, the doubly shifted QZ algorithm is an iterative reduction of a real Hessenberg-triangular pencil to real generalized Schur form. Since many of the details, such as back searching, are similar to those in the QR algorithm, we will not attempt to assemble the pieces into a single algorithm. Instead we begin with the general strategy of the QZ algorithm and then treat the details.

Overview

Let (A, B) be in Hessenberg-triangular form and assume that B is nonsingular. Then the matrix C = AB^{-1} is Hessenberg and is a candidate for an implicit double QR step. The idea underlying the QZ algorithm is to mimic this double QR step by manipulations involving only A and B. In outline the algorithm goes as follows. [Before proceeding, the reader may want to review the outline (3.6) of the doubly shifted QR algorithm.]

1. Let H be the Householder transformation that starts a double shift QR step on the matrix AB^{-1}.

2. Determine orthogonal matrices Q and Z with Qe_1 = e_1 such that (Â, B̂) = [Q^T(HA)Z, Q^T(HB)Z] is in Hessenberg-triangular form.

To see why this algorithm can be expected to work, let C = AB^{-1}. Then

    Ĉ ≡ ÂB̂^{-1} = Q^T(HA)Z [Q^T(HB)Z]^{-1} = (HQ)^T C (HQ).

Moreover, since Qe_1 = e_1, the first column of HQ is the same as that of H. It follows that Ĉ is the result of performing an implicit double QR step on C. In consequence, we can expect at least one of the subdiagonal elements ĉ_{n,n-1} and ĉ_{n-1,n-2} to converge to zero. But because (Â, B̂) is in Hessenberg-triangular form,

    ĉ_{n,n-1} = â_{n,n-1}/b̂_{n-1,n-1}  and  ĉ_{n-1,n-2} = â_{n-1,n-2}/b̂_{n-2,n-2}.

Hence if b̂_{n-1,n-1} and b̂_{n-2,n-2} do not approach zero, at least one of the subdiagonal elements â_{n,n-1} and â_{n-1,n-2} must approach zero. On the other hand, if either b̂_{n-1,n-1} or b̂_{n-2,n-2} approaches zero, the process converges to an infinite eigenvalue, which can be deflated.


If repeated QZ steps force a_{n,n-1} to zero, then the problem deflates with a real eigenvalue. If, on the other hand, a_{n-1,n-2} goes to zero, a 2×2 block, which may contain real or complex eigenvalues, is isolated. In either case, the iteration can be continued with a smaller matrix.

Getting started

We now turn to the details of the QZ algorithm. The first item is to compute the Householder transformation to start the QZ step. In principle, this is just a matter of computing the appropriate elements of C = AB^{-1} and using Algorithm 3.1. In practice the formulas can be simplified: if we set m = n-1, the three components of the vector generating the Householder transformation can be written directly in terms of the elements of the leading rows and trailing rows of A and B, with only the diagonal elements b_11, b_22, b_mm, and b_nn of B entering as divisors.

In EISPACK and LAPACK these formulas are further simplified to reduce the number of divisions. The formulas are not well defined if any of the diagonal elements b_11, b_22, b_mm, or b_nn is zero. For the case of zero b_11, the problem can be deflated (see the treatment of infinite eigenvalues in §4.4). If the other diagonals are too small, they are replaced by ε_M ||B||, where ||·|| is a suitable norm.

The above formulas do not use the Francis double shift computed from the matrix C. Instead they shift by the eigenvalues of the trailing 2×2 pencil of (A, B), namely

    ( A[m:n, m:n], B[m:n, m:n] ).

This approach is simpler and is just as effective.
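For orientation, the "in principle" route through C = AB^{-1} can be coded without ever forming C. The following sketch (ours; it assumes n ≥ 4 and the relevant diagonal elements of B nonzero, and it is not the simplified EISPACK/LAPACK formula) computes only the corners of C that the starting vector needs.

```python
import numpy as np
from scipy.linalg import solve_triangular

def qz_start_vector(A, B):
    """3-vector generating the starting Householder transformation for a QZ
    step, from the relevant corners of C = A B^{-1} (B upper triangular)."""
    n = A.shape[0]
    # Leading 3x2 corner of C involves only the leading 2x2 of B.
    C0 = solve_triangular(B[:2, :2], A[:3, :2].T, trans='T').T
    # Trailing 2x2 corner of C: rows n-2, n-1 of A against the trailing
    # 3x3 triangle of B (row n-2 of Hessenberg A starts in column n-3).
    X = solve_triangular(B[n-3:, n-3:], np.eye(3)[:, 1:])
    Ct = A[n-2:, n-3:] @ X
    t = Ct[0, 0] + Ct[1, 1]                       # trace of trailing 2x2 of C
    d = Ct[0, 0]*Ct[1, 1] - Ct[0, 1]*Ct[1, 0]     # its determinant
    return np.array([C0[0, 0]**2 + C0[0, 1]*C0[1, 0] - t*C0[0, 0] + d,
                     C0[1, 0]*(C0[0, 0] + C0[1, 1] - t),
                     C0[1, 0]*C0[2, 1]])
```

Note that this version shifts by the eigenvalues of the trailing 2×2 of C rather than of the trailing 2×2 pencil; the text's observation is that the latter choice is simpler and works just as well.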

The QZ step

Having now determined a 3×3 Householder transformation to start the QZ step, we apply it to the pencil (A, B) and reduce the result back to Hessenberg-triangular form. As usual we will illustrate the process using Wilkinson diagrams. Our original pencil has the form

    A:  X X X X X        B:  X X X X X
        X X X X X            0 X X X X
        0 X X X X            0 0 X X X
        0 0 X X X            0 0 0 X X
        0 0 0 X X            0 0 0 0 X

If we premultiply this pencil by the Householder transformation computed above, which acts on the first three rows, we get a pencil of the form

    A:  X X X X X        B:  X  X  X X X
        X X X X X            X² X  X X X
        X X X X X            X¹ X¹ X X X
        0 0 X X X            0  0  0 X X
        0 0 0 X X            0  0  0 0 X

The next step reduces B back to triangular form. First postmultiply by a Householder transformation to annihilate the elements with superscripts 1. We follow with a plane rotation that annihilates the element with superscript 2. This produces a pencil of the form

    A:  X  X X X X       B:  X X X X X
        X  X X X X           0 X X X X
        X¹ X X X X           0 0 X X X
        X¹ X X X X           0 0 0 X X
        0  0 0 X X           0 0 0 0 X

We now premultiply by a Householder transformation to annihilate the elements of A with superscripts 1. This results in a pencil of the form

    A:  X X X X X        B:  X X  X  X X
        X X X X X            0 X  X  X X
        0 X X X X            0 X² X  X X
        0 X X X X            0 X¹ X¹ X X
        0 0 0 X X            0 0  0  0 X

Once again we reduce B back to triangular form using a Householder transformation and a plane rotation:

    A:  X X X X X        B:  X X X X X
        X X X X X            0 X X X X
        0 X X X X            0 0 X X X
        0 X X X X            0 0 0 X X
        0 X X X X            0 0 0 0 X


We now repeat the procedure on the second column of A, annihilating its offending elements with a Householder transformation and restoring B as before. When we repeat the procedure this time, we need only use a plane rotation to zero out the (5,3)-element of A. The result is a pencil that is almost in Hessenberg-triangular form:

    A:  X X X X X        B:  X X X X  X
        X X X X X            0 X X X  X
        0 X X X X            0 0 X X  X
        0 0 X X X            0 0 0 X  X
        0 0 0 X X            0 0 0 X̂  X

A postmultiplication by a plane rotation annihilates the (5,4)-element of B without affecting the Hessenberg form of A.

Since the QZ algorithm, like the QR algorithm, works with increasingly smaller matrices as the problem deflates, the sweep will be applied only to the working matrix. Algorithm 4.2 implements a sweep between rows il and i2. In studying this algorithm, keep in mind that k-1 points to the column of A from which the current Householder transformation is obtained (except for the first, which is provided by the input to the algorithm). Also note that the Householder transformation that postmultiplies (A, B) reduces the three elements of a row of B to a multiple of e_3^T, not e_1 as housegen assumes. We therefore use the cross matrix

    Ĭ = ( 0 0 1 )
        ( 0 1 0 )
        ( 1 0 0 )

in statements 10 and 12 to reverse the order of the components.

There are four comments to be made about this algorithm.

• An operation count shows that a QZ sweep is a little over twice as expensive as a doubly shifted QR sweep.

• The extra rotations come from the need to eliminate the element marked 2 in the diagrams above. An alternative is to premultiply by a Householder transformation that eliminates the offending elements of B in a column at a time.


Given a real pencil (A, B) in Hessenberg-triangular form and a vector u that generates a starting Householder transformation, qz2step implements a QZ sweep between rows il and i2.

     1. qz2step(A, B, u, il, i2, Q, Z)
     2.    for k = il to i2-2
     3.       if (k ≠ il)
     4.          housegen(A[k:k+2, k-1], u, ν)
     5.          A[k, k-1] = ν; A[k+1:k+2, k-1] = 0
     6.       end if
     7.       v^T = u^T*A[k:k+2, k:n]; A[k:k+2, k:n] = A[k:k+2, k:n] - u*v^T
     8.       v^T = u^T*B[k:k+2, k:n]; B[k:k+2, k:n] = B[k:k+2, k:n] - u*v^T
     9.       v = Q[:, k:k+2]*u; Q[:, k:k+2] = Q[:, k:k+2] - v*u^T
    10.       housegen(B[k+2, k:k+2]*Ĭ, u, ν)
    11.       B[k+2, k+2] = ν; B[k+2, k:k+1] = 0
    12.       u^T = u^T*Ĭ
    13.       v = B[1:k+1, k:k+2]*u; B[1:k+1, k:k+2] = B[1:k+1, k:k+2] - v*u^T
    14.       kk = min{k+3, i2}
    15.       v = A[1:kk, k:k+2]*u; A[1:kk, k:k+2] = A[1:kk, k:k+2] - v*u^T
    16.       v = Z[:, k:k+2]*u; Z[:, k:k+2] = Z[:, k:k+2] - v*u^T
    17.       rotgen(B[k+1, k+1], B[k+1, k], c, s)
    18.       rotapp(c, s, B[1:k, k+1], B[1:k, k])
    19.       rotapp(c, s, A[1:kk, k+1], A[1:kk, k])
    20.       rotapp(c, s, Z[:, k+1], Z[:, k])
    21.    end for k
    22.    rotgen(A[i2-1, i2-2], A[i2, i2-2], c, s)
    23.    rotapp(c, s, A[i2-1, i2-1:n], A[i2, i2-1:n])
    24.    rotapp(c, s, B[i2-1, i2-1:n], B[i2, i2-1:n])
    25.    rotapp(c, s, Q[:, i2-1], Q[:, i2])
    26.    rotgen(B[i2, i2], B[i2, i2-1], c, s)
    27.    rotapp(c, s, B[1:i2-1, i2], B[1:i2-1, i2-1])
    28.    rotapp(c, s, A[1:i2, i2], A[1:i2, i2-1])
    29.    rotapp(c, s, Z[:, i2], Z[:, i2-1])
    30. end qz2step

Algorithm 4.2: The doubly shifted QZ step


This gives an operation count of exactly twice the count for a double QR sweep.

• If only eigenvalues are desired, the sweep can be performed only on the subpencil (A[il:i2, il:i2], B[il:i2, il:i2]), and the transformations need not be accumulated.

• The algorithm is stable in the usual sense.

4.4. IMPORTANT MISCELLANEA

Anyone attempting to code a QZ-style algorithm from the description in the last two sections will quickly find how sketchy it is. The purpose of this subsection is to fill the gaps, at least to the extent they can be filled in a book of this scope. We will treat the following topics: balancing, back searching, infinite eigenvalues, 2×2 problems, and eigenvectors.

Balancing

The natural class of transformations for balancing the pencil (A, B) are the diagonal equivalences. Thus we should like to find diagonal matrices D_R and D_C with positive diagonals so that the nonzero elements of D_R A D_C and D_R B D_C are as nearly equal in magnitude as possible. This problem is not well determined, since multiplying the pencil (A, B) by a scalar does not change the balance. But we can make it well determined by specifying a value around which the scaled elements must cluster. If we take that value to be one, then our problem is to choose positive numbers r_1, ..., r_n and c_1, ..., c_n so that all the elements

    r_i c_j |a_ij|  and  r_i c_j |b_ij|                          (4.22)

of D_R A D_C and D_R B D_C are nearly one. The problem (4.22) is nonlinear in the unknowns r_i and c_j. However, if we set

    ρ_i = log r_i  and  γ_j = log c_j,

then the problem is equivalent to finding quantities ρ_i and γ_j so that

    ρ_i + γ_j + log|a_ij| ≅ 0  and  ρ_i + γ_j + log|b_ij| ≅ 0

for the nonzero elements of A and B. If we solve this problem in the least squares sense, we end up with the problem

    minimize Σ' [ (ρ_i + γ_j + log|a_ij|)^2 + (ρ_i + γ_j + log|b_ij|)^2 ],   (4.23)

where Σ' denotes summation over the nonzero elements.
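A hedged sketch of this balancing idea follows (ours, not the production algorithm): it sets up the least squares problem in base-2 logarithms, solves it with a dense least-squares call standing in for the conjugate gradient iteration discussed next, and rounds the exponents so the scaling is by exact powers of two.

```python
import numpy as np

def balance_pencil(A, B):
    """Diagonal scalings Dr, Dc making the nonzero elements of Dr A Dc and
    Dr B Dc cluster around one (cf. (4.23)); exponents rounded to integers."""
    n = A.shape[0]
    rows, rhs = [], []
    for M in (A, B):
        for i, j in zip(*np.nonzero(M)):
            e = np.zeros(2*n)
            e[i] = 1.0; e[n + j] = 1.0            # rho_i + gamma_j
            rows.append(e); rhs.append(-np.log2(abs(M[i, j])))
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    Dr = np.diag(2.0 ** np.round(sol[:n]))
    Dc = np.diag(2.0 ** np.round(sol[n:]))
    return Dr, Dc          # use as (Dr @ A @ Dc, Dr @ B @ Dc)
```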

Without going into details, the normal equations for (4.23) are very simple. Although the system is too large to solve directly, it can be solved by an iterative technique called the method of conjugate gradients. Each iteration requires O(n) operations, and the method converges so swiftly that the determination of the scaling factors is also an O(n) process (remember that we do not have to determine the scaling factors very accurately).

The procedure sketched above is not invariant under change of scale of A and B; i.e., it will produce different results for the pair (σA, τB) as σ and τ vary. Consequently, it is recommended that A and B be normalized so that their norms are one in some easily calculated, balanced norm. This scaling also helps in controlling underflow and overflow in the QZ step.

Back searching

The QZ step implemented in Algorithm 4.2 is performed between rows il and i2 of A and B, where the subdiagonal elements of A bounding that range have been deemed negligible. The integers il and i2 are determined by a back-searching algorithm analogous to Algorithm 2.6. The chief problem is to decide when an element is negligible. The usual procedure is to assume that the problem is reasonably balanced and declare an element a_ij negligible if

    |a_ij| ≤ ε_M ||A||

in some convenient norm. A similar criterion is used for the elements of B.

Infinite eigenvalues

If, in the course of the QZ iteration, a negligible element occurs on the diagonal of B, the pencil has an infinite eigenvalue. There are many ways of treating infinite eigenvalues. The following is a simple and efficient deflation procedure. Consider the Wilkinson diagram in which the fourth diagonal element of B is zero:

    A:  X X X X X        B:  X X X  X X
        X X X X X            0 X X  X X
        0 X X X X            0 0 X̂  X X
        0 0 X X X            0 0 0  0 X
        0 0 0 X X            0 0 0  0 X

We can postmultiply this pencil by a rotation in the (4,3)-plane to annihilate the hatted element in B. The result is a pencil in which the zero on the diagonal of B has moved up one position, at the cost of a nonzero (5,3)-element in A.

Infinite eigenvalues

If, in the course of the QZ iteration, a negligible element occurs on the diagonal of B, the pencil has an infinite eigenvalue. There are many ways of treating infinite eigenvalues. The following is a simple and efficient deflation procedure.

Consider the Wilkinson diagram [diagram omitted] in which the fourth diagonal element of B is zero. We can postmultiply this pencil by a rotation in the (4,3)-plane to annihilate the hatted element in B. The result is the pencil [diagram omitted]. We may now premultiply by a rotation in the (4,5)-plane to return A to Hessenberg form: [diagram omitted]. We can now repeat the process by postmultiplying by a rotation in the (3,2)-plane to get [diagram omitted] and then premultiplying by a rotation in the (3,4)-plane to get [diagram omitted]. One more step of post- and premultiplication gives [diagram omitted]. Finally, premultiplication by a rotation in the (1,2)-plane gives [diagram omitted]. As the lines in the above diagram show, the infinite eigenvalue has been deflated from the problem.

We can also deflate an infinite eigenvalue if there are two small consecutive diagonal elements. Specifically, consider the matrix [diagram omitted]. If we apply a plane rotation in the (5,4)-plane to annihilate e1, then the (5,4)-element becomes [display omitted]. Consequently, if that element, which is of the order of e1*e2, is negligible, we can set the (5,4)-element to zero and deflate the zero (4,4)-element as described above.

The 2x2 problem

When the QZ iteration deflates a 2x2 pencil, it may have real eigenvalues. In that case, we will naturally want to deflate it further, so that the real eigenvalues appear explicitly on the diagonal of the Schur form. This amounts to computing the Schur form of a pencil of order two. This, it turns out, is not an easy problem when the effects of rounding error are taken into account, and we only warn the reader not to go it alone: copy the EISPACK or LAPACK code.

Eigenvectors

Eigenvectors can be computed by a back substitution algorithm analogous to the ones for the ordinary eigenvalue problem. Unlike the 2x2 problem just treated, this algorithm contains no significant numerical pitfalls. However, its derivation is lengthy, and the details are too involved to give here; once again we refer the reader to the EISPACK or LAPACK code.

4.5. NOTES AND REFERENCES

The algebraic background

The generalized eigenvalue problem (regarded as a pair of bilinear forms) goes back at least to Weierstrass [293, 1868], who established the equivalent of the Jordan form for regular matrix pencils. Jordan [137, 1874] later gave a new proof for singular pencils. Kronecker [152, 1890] extended the result to rectangular pencils. For modern treatments see [80, 141, 269, 281, 304]. The generalized Schur form is due to Stewart [251].

Perturbation theory

For a general treatment of the perturbation theory of the generalized eigenvalue problem along with a bibliography see [269]. The use of the chordal metric in generalized

eigenvalue problems is due to Stewart [255]. For a chordless perturbation theory and backward error bounds see [112].

The QZ algorithm

The QZ algorithm, including the preliminary reduction to Hessenberg-triangular form, is due to Moler and Stewart [178], who are happy to acknowledge a broad hint from W. Kahan. The double shift strategy, of course, works when the eigenvalues of the trailing pencil are real. Ward [286] has observed that the algorithm performs better if real shifts are treated by a single shift strategy. This option has been incorporated into the EISPACK and LAPACK codes. For the treatment of 2x2 blocks see the EISPACK code qzval or the LAPACK code SLAG2.

The balancing algorithm is due to Ward [287], who builds on earlier work of Curtis and Reid [50].

Ward [286] and Watkins [291] consider the problem of infinite eigenvalues. Watkins shows that the presence of small or zero elements on the diagonal of B does not interfere with the transmission of the shift. Moreover, both the reduction to Hessenberg-triangular form and the steps of the QZ algorithm move zero elements on the diagonal of B toward the top, where they can be deflated by a single rotation. Thus an alternative to deflating infinite eigenvalues from the middle of the pencil is to wait until they reach the top.

3
THE SYMMETRIC EIGENVALUE PROBLEM

We have seen that symmetric (or Hermitian) matrices have special mathematical properties. In particular, their eigenvalues are real and their eigenvectors can be chosen to form an orthonormal basis for R^n. They also have special computational properties. Because of their symmetry they can be stored in about half the memory required for a general matrix. Moreover, we can frequently use the symmetry of the problem to reduce the operation count in certain algorithms. For example, the Cholesky algorithm for triangularizing a symmetric matrix requires half the operations that are required by its nonsymmetric counterpart, Gaussian elimination (see Algorithm 1:2.2).

The first part of this chapter is devoted to efficient algorithms for the symmetric eigenvalue problem. In the first section we will show how to adapt the QR algorithm to solve the symmetric eigenvalue problem. The procedure begins as usual with a reduction to a simpler form, in this case tridiagonal form, followed by a QR iteration on the reduced form. We will treat this reduction in section 1.1 and the tridiagonal QR iteration itself in section 1.2. In section 2 we will consider some alternative methods for the symmetric eigenvalue problem: a method for updating the symmetric eigendecomposition, a divide-and-conquer algorithm, and methods for finding eigenvalues and eigenvectors of band matrices.

We next turn to the singular value decomposition. It is included in this chapter because the squares of the singular values of a real matrix X are the eigenvalues of the symmetric matrix A = X^T*X and the eigenvectors of A are the right singular vectors of X. Thus, in principle, the algorithms of the first section of this chapter can be used to solve the singular value problem. Unfortunately, in practice this approach has numerical difficulties (see section 3.1), and it is better to solve the singular value problem on its own terms. Fortunately, the QR algorithm for symmetric matrices has an analogue for the singular value problem.

Finally, in a brief coda, we treat the symmetric positive definite generalized eigenvalue problem Ax = lambda*Bx, where A is symmetric and B is positive definite. This problem has appealing analogies to the ordinary symmetric eigenproblem, but it is mathematically and numerically less tractable.

In the ordinary eigenvalue problem, real and complex matrices are treated in es-

sentially different ways. For the symmetric eigenvalue problem, the singular value problem, and the definite generalized eigenvalue problem, the complex case differs from the real case only in the fine details. For this reason we will assume in this chapter that:

    A and B are real symmetric matrices of order n.

1. THE QR ALGORITHM

In this section we will show how the QR algorithm can be adapted to find the eigenvalues and eigenvectors of a symmetric matrix. The general outline of the procedure is the same as for the nonsymmetric eigenvalue problem: reduce the matrix to a compact form and perform QR steps, accumulating the transformations if eigenvectors are desired. For the symmetric eigenvalue problem, the process simplifies considerably. First, the Hessenberg form becomes a tridiagonal form [see (1.1)]. Second, the final Schur form becomes diagonal, and the columns of the accumulated transformations are the eigenvectors of A. Thus the algorithm computes the spectral decomposition [display omitted], where V is an orthogonal matrix whose columns are eigenvectors of A and Lambda is a diagonal matrix whose diagonal elements are the corresponding eigenvalues.

1.1. REDUCTION TO TRIDIAGONAL FORM

Because orthogonal similarities preserve symmetry, if a symmetric matrix A is reduced to Hessenberg form by orthogonal similarity transformations, all the elements below its first subdiagonal are zero, and hence by symmetry so are all the elements above the first superdiagonal. Thus the matrix assumes a tridiagonal form, illustrated by the following Wilkinson diagram: [diagram omitted]

In this volume almost all tridiagonal matrices will turn out to be symmetric or Hermitian. Hence:

    Unless otherwise stated, a tridiagonal matrix is assumed to be symmetric or, if complex, Hermitian.

The purpose of this subsection is to describe the details of the reduction of a symmetric matrix to tridiagonal form. But first we must consider how symmetric matrices may be economically represented on a computer.

The storage of symmetric matrices

Since the elements a_ij and a_ji of a symmetric matrix are equal, a symmetric matrix is completely represented by the n(n+1)/2 elements on and above its diagonal. There are two ways of taking advantage of this fact to save storage.

First, one can store the matrix in, say, the upper half of a two-dimensional array of real floating-point numbers and code one's algorithms so that only those elements are altered. This approach has the advantage that it permits the use of array references like A[i, j] while leaving the rest of the array free to hold other quantities. The disadvantage is that the rest of the array is seldom used in practice, because its n(n-1)/2 elements are insufficient to hold another symmetric or triangular matrix of order n.

Second, one can put the diagonal and superdiagonal elements of a symmetric matrix into a one-dimensional array, a device known as packed storage. The two usual ways of doing this are to store them in column order and in row order. The following diagram illustrates a 4x4 symmetric matrix stored in an array a in column order: [diagram omitted]

Packed storage offers two advantages. First, unlike the natural representation, it uses all the storage allocated to the matrix. Second, since the code references only a one-dimensional array, on nonoptimizing compilers it may run more efficiently. The price to be paid for these advantages is that the resulting code is hard to read (although practice helps).

Because our algorithms are easier to understand in the natural representation, we will assume in this volume that the symmetric matrix in question is contained in the upper part of a two-dimensional array.
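Since the column-order packing is easy to get wrong, here is a small sketch of the index mapping. The function name is ours, and 0-based indices are assumed (the diagram above uses the 1-based convention).

    import numpy as np

    def packed_index(i, j):
        """Position in a 1-D array of element (i, j), of a symmetric
        matrix whose upper triangle is packed in column order
        (0-based). Column j is preceded by 1+2+...+j = j(j+1)/2
        elements."""
        if i > j:                 # only the upper triangle is stored;
            i, j = j, i           # fold the lower triangle onto it
        return i + j * (j + 1) // 2

    # Example: pack a 4x4 symmetric matrix and read an element back.
    n = 4
    A = np.arange(n * n, dtype=float).reshape(n, n)
    A = (A + A.T) / 2                      # make it symmetric
    a = np.empty(n * (n + 1) // 2)
    for j in range(n):
        for i in range(j + 1):
            a[packed_index(i, j)] = A[i, j]
    assert a[packed_index(3, 1)] == A[3, 1]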

Reduction to tridiagonal form

The algorithm for reduction to tridiagonal form is basically the same as the algorithm for reduction to Hessenberg form described beginning on page 83. We partition A in the form [display omitted] and determine a Householder transformation H such that H*a_21 = gamma*e_1. Then the similarity transformation yields a matrix of the form [display omitted]. This matrix has the Wilkinson diagram (n = 6) [diagram omitted], from which it is seen that the transformation has introduced zeros into the appropriate parts of A. The reduction continues by recursively reducing H*A_22*H to tridiagonal form. Thus the final tridiagonal matrix has the form [display omitted].

The computation of H*A_22*H can be simplified by taking advantage of symmetry. If we denote the vector defining H by r, then

    H*A_22*H = A_22 - r*s^T - s*r^T,  where  s = A_22*r - (1/2)(r^T*A_22*r)*r.      (1.2)

This formula can be simplified by setting v = A_22*r and w = r^T*v, so that s = v - (w/2)*r.

Before we can code the algorithm, we must decide how to represent the final tridiagonal matrix. The usual convention is to store the diagonal of the matrix in a one-dimensional array, say d, and the off-diagonal elements in another one-dimensional array, say e. Algorithm 1.1 implements the tridiagonalization described above.

Given a real symmetric matrix A of order n, TriReduce reduces A to tridiagonal form, destroying A in the process. The diagonal elements of the tridiagonal matrix are stored in the array d; the off-diagonal elements, in the array e. The transformations are accumulated in the array V.

     1. TriReduce(A, d, e, V)
     2.   for k = 1 to n-2
     3.     housegen(A[k, k+1:n], r[k+1:n], e[k+1])
     4.     V[k+1:n, k] = r[k+1:n]
     5.     d[k] = A[k, k]
     6.     w = 0
     7.     for i = k+1 to n
     8.       s[i] = 0
     9.       for j = k+1 to i-1
    10.         s[i] = s[i] + A[j, i]*r[j]
    11.       end for j
    12.       for j = i to n
    13.         s[i] = s[i] + A[i, j]*r[j]
    14.       end for j
    15.       w = w + s[i]*r[i]
    16.     end for i
    17.     s[k+1:n] = s[k+1:n] - (w/2)*r[k+1:n]
    18.     for j = k+1 to n
    19.       for i = k+1 to j
    20.         A[i, j] = A[i, j] - r[i]*s[j] - s[i]*r[j]
    21.       end for i
    22.     end for j
    23.   end for k
    24.   d[n-1] = A[n-1, n-1];  e[n-1] = A[n-1, n];  d[n] = A[n, n]
    25.   V[1:n-2, n-1:n] = 0;  V[n-1:n, n-1:n] = I
    26.   for k = n-2 to 1 by -1
    27.     r[k+1:n] = V[k+1:n, k]
    28.     v^T = r[k+1:n]^T * V[k+1:n, k+1:n]
    29.     V[k+1:n, k+1:n] = V[k+1:n, k+1:n] - r[k+1:n]*v^T
    30.     V[1:n, k] = e_k
    31.   end for k
    32. end TriReduce

Algorithm 1.1: Reduction to tridiagonal form
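For readers who want something executable, the following NumPy sketch performs the same reduction on a full (unpacked) symmetric array. It is a minimal transcription of the method behind Algorithm 1.1, not a replacement for a library routine such as LAPACK's xSYTRD; the packed-storage and backward-accumulation economies are omitted for clarity.

    import numpy as np

    def tri_reduce(A):
        """Reduce a real symmetric matrix to tridiagonal form T = V^T A V.
        Returns (d, e, V) with d the diagonal and e the superdiagonal of T."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        V = np.eye(n)
        for k in range(n - 2):
            x = A[k+1:, k].copy()
            norm_x = np.linalg.norm(x)
            if norm_x == 0.0:
                continue                      # column already reduced
            gamma = -np.copysign(norm_x, x[0])
            v = x.copy()
            v[0] -= gamma                     # v = x - gamma*e1, no cancellation
            w = v / np.linalg.norm(v)         # H = I - 2*w*w^T gives H*x = gamma*e1
            # symmetric rank-two update of the trailing block [cf. (1.2)]
            B = A[k+1:, k+1:]
            p = B @ w
            q = p - (w @ p) * w
            B -= 2.0 * (np.outer(w, q) + np.outer(q, w))
            A[k+1:, k] = 0.0; A[k, k+1:] = 0.0
            A[k+1, k] = A[k, k+1] = gamma
            V[:, k+1:] -= 2.0 * np.outer(V[:, k+1:] @ w, w)   # V = V*H
        return np.diag(A).copy(), np.diag(A, 1).copy(), V

    # Quick check on a random symmetric matrix.
    rng = np.random.default_rng(0)
    S = rng.standard_normal((6, 6)); S = (S + S.T) / 2
    d, e, V = tri_reduce(S)
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    assert np.allclose(V.T @ S @ V, T, atol=1e-10)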

Here are some comments.

• The computation of the product of A[k+1:n, k+1:n] and the Householder generating vector must be done at the scalar level, since in our storage scheme traversing a row of A must be done by proceeding down the corresponding column to the diagonal and then across the row. To keep the indexing consistent we store the Householder vector in r[k+1:n].

• The computation of s and r^T*A_22*r in (1.2) is done in statements 6-17. The computation of A_22*r requires two loops because only the upper part of the array A is referenced. This computation is neither column nor row oriented, and it is an interesting exercise to recast it so that it has uniform orientation.

• As in Algorithm 2.2, Chapter 2, we economize by saving the generating vectors for the Householder transformations and then accumulating them in a separate loop.

• In spite of the fact that the final tridiagonal form requires only two one-dimensional arrays to store, its computation requires the upper half of a two-dimensional array. This particular implementation uses the array containing the original matrix A.

• The first column of the matrix V is e_1. This fact will prove important in deriving the implicit shift QR algorithm for symmetric tridiagonal matrices.

• The algorithm is cheaper than the reduction of a general matrix to Hessenberg form. Specifically:

    The operation count for Algorithm 1.1 is (2/3)n^3 flam for the reduction of A and (2/3)n^3 flam for the accumulation of V.

This should be compared with an operation count of (5/3)n^3 flam for the reduction of A in Algorithm 2.2, Chapter 2.

• The algorithm is backward stable in the usual sense (see Definition 2.2, Chapter 2).

Making Hermitian tridiagonal matrices real

Algorithm 1.1 can easily be converted to compute the tridiagonal form of a complex Hermitian matrix. In this form the array d will necessarily contain real numbers. The array e, however, can contain complex numbers. This means that subsequent manipulations of the tridiagonal form will use complex rotations, which is expensive (see the comment on page 91). Fortunately, we can make the elements of e real by a sequence of diagonal similarity transformations. The process is sufficiently well illustrated for the case n = 3. The tridiagonal matrix T has the form [display omitted].

The first step is to make e_1 real. Let D_1 = diag(1, nu_1, 1). Then D_1^H*T*D_1 has the form [display omitted]. If e_1 = 0, it is real, and we do not need to do anything; in this case we may take nu_1 = 1. Otherwise we take nu_1 = conj(e_1)/|e_1|, so that nu_1*e_1 = |e_1|. Similarly, if we define D_2 = diag(1, nu_1, nu_1*nu_2), where nu_2 = conj(e_2)/|e_2| (with nu_2 = 1 when e_2 = 0), then D_2^H*T*D_2 is real. Thus the original matrix has been reduced to real tridiagonal form.

Algorithm 1.2 sketches this scheme. It is incomplete in an important respect. The array e is necessarily an array of type complex, whereas at the end we require an array of real floating-point numbers. This problem can be overcome by adjusting the e_k (and the columns of V) as they are generated in the course of the reduction to tridiagonal form. The details are left as an exercise.

This algorithm takes a Hermitian tridiagonal matrix represented by the arrays d and e and transforms it to a real tridiagonal matrix. The transformations are accumulated in a matrix V.

     1. for k = 1 to n-1
     2.   a = |e[k]|
     3.   if (a != 0)
     4.     v = e[k]/a
     5.     e[k] = a
     6.     if (k != n-1)
     7.       e[k+1] = v*e[k+1]
     8.     end if
     9.     V[1:n, k+1] = conj(v)*V[1:n, k+1]
    10.   end if
    11. end for k

Algorithm 1.2: Hermitian tridiagonal to real tridiagonal form

1.2. THE SYMMETRIC TRIDIAGONAL QR ALGORITHM

Since a tridiagonal matrix T is necessarily upper Hessenberg and the basic QR step preserves upper Hessenberg form, the result of performing a QR step on T is upper Hessenberg. Since the QR step also preserves symmetry, the result of performing a QR step on T is symmetric and tridiagonal. Thus the QR algorithm preserves symmetric tridiagonal form. When this fact is taken into account, the result is a sleek, speedy algorithm for computing the eigenvalues and optionally the eigenvectors of a symmetric tridiagonal matrix. As above, we will assume that the tridiagonal matrix T in question is represented by an array d containing the elements of the diagonal of T and an array e containing the elements of the superdiagonal [see (1.3)].

The implicitly shifted QR step

The symmetric tridiagonal QR algorithm has explicitly and implicitly shifted forms, of which the latter is the most commonly implemented. In broad outline the tridiagonal QR step goes as follows.

1. Given a shift kappa, determine a plane rotation P12 such that the (2,1)-element of P12^T*(T - kappa*I) is zero.
2. Determine an orthogonal matrix Q whose first column is a multiple of e_1 such that T-hat = Q^T*P12^T*T*P12*Q is tridiagonal.

As with the implicit double shift, the reasoning following (3.6), Chapter 2, shows that T-hat is the matrix that would result from an explicit QR step with shift kappa.

An implementation of this sketch goes as follows. Since P12 is a rotation in the (1,2)-plane, the matrix P12^T*T*P12 has the form [display omitted]. We now choose a rotation P23 in the (2,3)-plane to annihilate f. On applying the similarity transformation, we obtain a matrix of the form [display omitted]. We see that f has been chased one row and column down the matrix. This process can be repeated until the element f drops off the end. Figure 1.1 illustrates the entire procedure. [Figure 1.1: Implicit tridiagonal QR step; diagrams omitted.]

Algorithm 1.3 implements a QR step between rows i1 and i2 of T. In the implementation of this algorithm we must, as usual, take into account the fact that the problem may have deflated. The application of the rotations is tricky. As one enters the loop on i, the matrix has the form [display omitted], where a hat indicates a final value. The element g is an intermediate value of d[i] after the rotation P_{i-1,i} has been applied.

Given a symmetric tridiagonal matrix whose diagonal is contained in the array d and whose superdiagonal is contained in e, this algorithm performs one QR step between rows i1 and i2. The transformations are accumulated in V.

     1. triqr(d, e, i1, i2, kappa, V)
     2.   g = d[i1] - kappa
     3.   c = 1;  s = 1;  p = 0
     4.   for i = i1 to i2-1
     5.     f = s*e[i]
     6.     b = c*e[i]
     7.     rotgen(g, f, c, s)
     8.     if (i != i1) e[i-1] = g; fi
     9.     w = d[i] - p
    10.     v = (d[i+1] - w)*s + 2*c*b
    11.     p = s*v
    12.     d[i] = w + p
    13.     g = c*v - b
    14.     rotapp(c, s, V[:, i], V[:, i+1])
    15.   end for i
    16.   d[i2] = d[i2] - p
    17.   e[i2-1] = g
    18. end triqr

Algorithm 1.3: The symmetric tridiagonal QR step

The first two statements of the loop complete the application of P_{i-1,i} to bring the matrix into the form [display omitted]. Finally, the subsequent statements bring the matrix into the form [display omitted], which advances the loop by one step.

It is worth noting that the derivation of this sequence requires the application of the identity c^2 + s^2 = 1, so that in the presence of rounding error the statements do not produce the same results as the direct application of the plane rotations P_{i,i+1}. Such "simplifications" based on exact mathematical identities can turn a stable algorithm into an unstable one. Fortunately, this is not true of Algorithm 1.3, which is stable in the usual sense.

If only eigenvalues are required, statement 14 can be eliminated with great saving in computations. It is easy to see that if m = i2 - i1, then the transformation of T requires 6m fladd and 5m flmlt plus m calls to rotgen. The accumulation of the transformations in V requires mn flrot. Thus the computation of eigenvectors is the dominant part of the calculation.

Choice of shift

Two shifts are commonly used with Algorithm 1.3. The first is the Rayleigh quotient shift, which is simply the diagonal element d[i2]. The second is the Wilkinson shift, which is the eigenvalue of the trailing 2x2 matrix of T that is nearest d[i2]. The Wilkinson shift is globally convergent and is generally the one that is used.
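The standard cancellation-free formula for the Wilkinson shift is easy to state in code. The following sketch is our illustration, not the book's code.

    import numpy as np

    def wilkinson_shift(a, b, c):
        """Eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]] nearest c.
        Here (a, b, c) = (d[i2-1], e[i2-1], d[i2]). The stable root of the
        characteristic equation is used to avoid cancellation."""
        if b == 0.0:
            return c
        delta = (a - c) / 2.0
        denom = abs(delta) + np.hypot(delta, b)
        return c - np.copysign(b * b / denom, delta)

    # The eigenvalues of [[a,b],[b,c]] are (a+c)/2 +- hypot((a-c)/2, b);
    # the expression above recovers the one nearest c.
    a, b, c = 2.0, 1.0, 3.0
    lam = wilkinson_shift(a, b, c)
    w = np.linalg.eigvalsh(np.array([[a, b], [b, c]]))
    assert np.isclose(lam, w[np.argmin(abs(w - c))])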

Local convergence

The local convergence analysis beginning on page 73 applies to the symmetric problem. Specifically, if d[n] is converging to a simple eigenvalue, then the analysis of Chapter 2 implies that the values of e[n-1] converge cubically to zero, provided the Rayleigh quotient shift is used. In fact, because the multiple eigenvalues of a symmetric matrix are nondefective, the convergence is cubic even for multiple eigenvalues. The Wilkinson shift usually gives cubic convergence, although in theory the convergence can be quadratic. This possibility is so remote that it is not considered a reason to prefer the Rayleigh quotient shift over the Wilkinson shift.

Deflation

If e[i] is zero, the tridiagonal matrix T deflates into two tridiagonal matrices, one ending with d[i] and the other beginning with d[i+1]. If e[i] is negligible, we can set it to zero to deflate the problem. Thus we must consider what criterion we shall use to decide when a subdiagonal element of a tridiagonal matrix is effectively zero. We have already treated this problem in section 2.4, Chapter 2, where we proposed two possible solutions, one for balanced matrices and one for graded matrices. Here we will propose yet another solution adapted to tridiagonal matrices.

We will proceed informally by deriving a criterion for deflating a graded matrix of order two. Consider the matrix

    T = [ d    e  ]
        [ e   rho*d ],                                                 (1.5)

where rho < 1 represents a grading ratio. When e is zero, the matrix has eigenvalues d and rho*d. We will now see how e perturbs the eigenvalue rho*d when e is small. The characteristic equation for T is [display omitted], where g = e^2. Differentiating this equation implicitly with respect to g gives [display omitted], where lambda_g is the derivative of lambda with respect to g. Since lambda = rho*d, we have [display omitted], the latter approximation holding when rho is small. Hence when e is small, the perturbation in the eigenvalue rho*d is approximately -e^2/d.

If we wish the relative error in rho*d to be less than the rounding unit, we must have e^2/(|d|*|rho*d|) <= eps_M, or

    |e| <= sqrt(eps_M * |d| * |rho*d|).

Note that the quantity sqrt(|d|*|rho*d|) is the geometric mean of the two diagonal elements of T. This suggests that we regard a subdiagonal element of a graded tridiagonal matrix to be negligible if

    |e_i| <= sqrt(eps_M * |d_i * d_{i+1}|).

This criterion is effective for graded matrices, but it is too liberal for ordinary matrices. We therefore replace it with

    |e_i| <= eps_M * sqrt(|d_i * d_{i+1}|).                            (1.6)

Note that if d_i and d_{i+1} are of a size, then their geometric mean is about the same size, and the criterion reduces to comparing e_i to the rounding unit scaled by the size of the local diagonals; for such matrices the criterion is stricter than (1.4). On the other hand, if the matrix is graded as in (1.5), the criterion tends to preserve the smaller eigenvalues. Because sqrt(|d_i * d_{i+1}|) <= ||T||_F, the Hoffman-Wielandt theorem (Theorem 3.10, Chapter 1) insures that the criterion (1.6) always preserves the absolute accuracy of the eigenvalues.
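In code the criterion (1.6) is a one-line test. The sketch below scans for deflation points; the function name is ours.

    import numpy as np

    def deflation_points(d, e, eps=np.finfo(float).eps):
        """Indices i at which the subdiagonal e[i] of a symmetric
        tridiagonal matrix may be set to zero under criterion (1.6):
            |e[i]| <= eps * sqrt(|d[i] * d[i+1]|).
        Splitting there deflates T into smaller tridiagonal blocks."""
        d = np.asarray(d, dtype=float)
        e = np.asarray(e, dtype=float)
        return [i for i in range(len(e))
                if abs(e[i]) <= eps * np.sqrt(abs(d[i] * d[i+1]))]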

Graded matrices

It is important to remember that when one is using the tridiagonal QR algorithm to solve graded problems, the grading should be downward [see (2.36), Chapter 2]. If the grading is upward, we can flip the matrix about its cross diagonal to reverse the grading. Alternatively, there is a variant of the QR algorithm, called the QL algorithm, which performs well on matrices that are graded upward and tends to preserve the smaller eigenvalues. See the notes and references.

Variations

This brings to an end our discussion of the QR algorithm for symmetric matrices. But it by no means exhausts the subject. One has only to write down a naive version of the tridiagonal QR step and compare it with Algorithm 1.3 to realize that the formulas can be manipulated in many different ways. Not surprisingly, people have proposed many versions of the basic tridiagonal algorithm, each with their advantages, real or imagined. Of these variants perhaps the most important is the square root free PWK method, which is especially valuable when only the eigenvalues of a symmetric tridiagonal matrix are needed. For more see the notes and references.

1.3. NOTES AND REFERENCES

General references

The title of this chapter is the same as that of Beresford Parlett's masterful survey of the field [206], which is required reading for anyone interested in the subject. Wilkinson's Algebraic Eigenvalue Problem [300] contains much valuable material, as does the Handbook series of algorithms [305].

Reduction to tridiagonal form

Givens [83, 1954] was the first to use orthogonal transformations to reduce a symmetric matrix to tridiagonal form. He used plane rotations, which in consequence are often called Givens rotations, and the method is called Givens' method. He went on to find the eigenvalues of the tridiagonal matrix by Sturm sequences and the inverse power method (see section 2.3). For an error analysis of Givens' method see [300].

Householder [123] observed that a general matrix could be reduced to Hessenberg form by Householder transformations and by implication that a symmetric matrix could be reduced to tridiagonal form. The actual algorithm given here is due to Wilkinson [297]. For an error analysis of Householder's method see [300, section 5.3].

The tridiagonal QR algorithm

Although Francis proposed his implicitly shifted QR algorithm for Hessenberg matrices, the extension to symmetric tridiagonal matrices is obvious. In 1968 Wilkinson [302] showed that the QR algorithm with Wilkinson shift (not his name for it) is globally convergent, and this has been the preferred shift ever since. Parlett [206] gives a proof of the convergence of the QR algorithm with Rayleigh quotient shift that he attributes to Kahan. The result is not quite global convergence, although the method performs well in practice. Jiang and Zhang [135] have proposed a way of alternating between the Rayleigh quotient and Wilkinson shifts in such a way that the algorithm is globally and cubically convergent.

We have derived the criterion (1.6) for setting an element to zero informally by considering a 2x2 matrix. For positive definite matrices, the criterion can be rigorously justified [59].

If a symmetric tridiagonal matrix is graded upward, the QR algorithm will generally fail to compute the smaller eigenvalues accurately. An alternative is the QL algorithm, which is based on the sequence defined by [display omitted]. The QL step moves northwest from the bottom of the matrix to the top, and it is the first off-diagonal element that converges rapidly to zero. If the tridiagonal matrix was obtained by Householder tridiagonalization, that algorithm must also proceed from the bottom up, rather than top down as in Algorithm 1.1. For a detailed treatment of the algorithm see [206].

Perhaps the most complete set of options is given by the LAPACK codes xSYTRD and xSTEQR. The former reduces a symmetric matrix to tridiagonal form and can be coaxed into performing the reduction either top down or bottom up. The latter automatically chooses to use either the QR or QL algorithm based on the relative sizes of the diagonals at the beginning and ends of the current blocks.

For a brief summary of the various implementations of the symmetric tridiagonal QR algorithm see [206, sections 8.14-15], which also contains a complete exposition of the PWK (Pal-Walker-Kahan) algorithm. For an implementation see the LAPACK routine xSTERF.

2. A CLUTCH OF ALGORITHMS

The QR algorithm is not the only way to compute the eigensystem of a symmetric matrix. In this section we will consider some important alternative algorithms. We will begin this section with the problem of computing the eigensystem of a diagonal matrix plus a rank-one modification. This algorithm can in turn be used to implement a divide-and-conquer algorithm for the symmetric tridiagonal problem, which is treated in section 2.2. Finally, in section 2.3 we will show how to compute selected eigenvalues and eigenvectors of symmetric band matrices.

2.1. RANK-ONE UPDATING

In this subsection we will be concerned with the following problem. Let the symmetric matrix A have the spectral decomposition [see (1.1)]

    A = V*Lambda*V^T,                                                  (2.1)

and let

    A-hat = A + sigma*x*x^T,

where sigma = +-1. Find the spectral decomposition of A-hat.

One way of solving this problem is to form A-hat and use the QR algorithm to compute its eigensystem. However, it might be hoped that we can use the spectral decomposition (2.1), which we already know, to simplify the computation, a process called updating. With some decompositions, e.g., the Cholesky decomposition, updating can result in a striking reduction in computation, typically from O(n^3) to O(n^2). Unfortunately, updating the spectral decomposition requires O(n^3) operations. Nonetheless, the updating algorithm is less expensive than computing the decomposition from scratch, and the central part of the computation, the diagonal updating problem, is useful in its own right.

Algorithm 2.1 outlines the basic steps of the updating algorithm. We will treat these steps each in turn.

Reduction to standard form

We first turn to the problem of reducing A-hat = A + sigma*x*x^T to the standard form D + z*z^T. We proceed in stages.

Let A = V*Lambda*V^T be the spectral decomposition of the symmetric matrix A. This algorithm sketches how to compute the spectral decomposition of A-hat = A + sigma*x*x^T, where sigma = +-1.

1. Reduce the problem to that of updating the eigensystem of D + z*z^T, where D is diagonal with nondecreasing diagonal elements and z is a vector with nonnegative components.
2. Deflate the problem to get rid of all very small components of z and to make the diagonal elements of D reasonably distinct.
3. Find all the remaining eigenvalues of D + z*z^T. This is done by solving a rational equation whose roots are the required eigenvalues.
4. Use the eigenvalues to compute the eigenvectors of D + z*z^T.
5. Transform the eigensystem of D + z*z^T back to the original eigensystem of A-hat.

Algorithm 2.1: Rank-one update of the spectral decomposition

First, since sigma = +-1, we may write the updating problem in the form

    sigma*A-hat = sigma*A + x*x^T.

This makes the sign of x*x^T positive. Second, from the spectral decomposition (2.1), we have

    sigma*A + x*x^T = V*(sigma*Lambda + y*y^T)*V^T,  where  y = V^T*x.

Third, let S be a diagonal matrix whose elements are +-1, such that S*y is nonnegative. Then

    sigma*Lambda + y*y^T = S*(sigma*Lambda + (S*y)*(S*y)^T)*S.

This makes the elements of the vector defining the update nonnegative. Fourth, let P be a permutation such that the diagonals of sigma*P^T*Lambda*P appear in nondecreasing order. If we now set

    D = sigma*P^T*Lambda*P   and   z = P^T*S*y,

then

    sigma*A-hat = (V*S*P)*(D + z*z^T)*(V*S*P)^T.

Thus if we can compute the spectral decomposition

    D + z*z^T = Q*Lambda-bar*Q^T,

then with Lambda-hat = sigma*Lambda-bar the decomposition

    A-hat = (V*S*P*Q)*Lambda-hat*(V*S*P*Q)^T

is the spectral decomposition of A-hat.
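The reduction is short enough to show in executable form. The following NumPy sketch implements step 1 of Algorithm 2.1; the function name is ours.

    import numpy as np

    def reduce_to_standard(V, lam, sigma, x):
        """Reduce A + sigma*x*x^T, with A = V diag(lam) V^T and sigma = +-1,
        to the standard form D + z z^T (D nondecreasing, z >= 0). Returns
        (W, D, z) with sigma*(A + sigma*x*x^T) = W (diag(D) + z z^T) W^T,
        where W = V*S*P is orthogonal."""
        y = V.T @ x
        s = np.where(y < 0, -1.0, 1.0)        # S makes the update vector nonneg
        z = s * y
        p = np.argsort(sigma * lam)           # P sorts sigma*Lambda ascending
        D = sigma * lam[p]
        z = z[p]
        W = (V * s)[:, p]                     # W = V*S*P: scale, then permute columns
        return W, D, z

    # Check the identity sigma*(A + sigma*x*x^T) = W (diag(D) + z z^T) W^T.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
    lam, V = np.linalg.eigh(A)
    x = rng.standard_normal(5); sigma = -1.0
    W, D, z = reduce_to_standard(V, lam, sigma, x)
    lhs = sigma * (A + sigma * np.outer(x, x))
    assert np.allclose(lhs, W @ (np.diag(D) + np.outer(z, z)) @ W.T)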

Here are some comments on the reduction.

• The way sigma = +-1 keeps popping in and out of the formulas may be confusing. Algorithmically, what the equations say is that if sigma = -1, we update -A and then change the sign back at the end.

• The computation of V*S is simply a column scaling, and the computation of (V*S)*P is a permutation of the columns of V*S. In an actual implementation it may be better not to permute the columns of V*S, the diagonals of D, and the components of z until the very end. Instead one might construct a data structure that keeps track of where these quantities should be and reference them through that structure.

• We will have occasions to exchange diagonals of D later in the algorithm. Any permutation of the diagonals must be reflected in a corresponding permutation of the columns of V*S and the components of z. When we speak of such permutations, we will tacitly assume that V*S and z have also been adjusted.

• Unless the order n of the problem is small, care should be taken to choose an efficient algorithm to sort the diagonals of D in order to avoid a lengthy series of column interchanges in the matrix V*S.

• To complete the updating, we must compute V*S*P*Q (this is essentially step five in Algorithm 2.1). The computation of (V*S*P)*Q is a matrix-matrix multiplication and takes about n^3 flam. This has the consequence that our updating algorithm does not gain in order over the naive scheme of computing the decomposition of A-hat from scratch. But still it is faster, about which more below.

Deflation: Small components of z

If the ith component of z is zero, so that z has the form [display omitted], then D + z*z^T can be partitioned in the form [display omitted]. In this case the element d_i is an eigenvalue of the updated matrix. By a permutation we can move this eigenvalue to the end of the matrix and bring it back at the end of the update.

In practice, we will seldom meet with components of z that are exactly zero. Instead we will find that some component z_i is small, and we must decide when we can regard it as zero. By Theorem 3.8, Chapter 1, setting z_i to zero will move each eigenvalue by no more than the norm of the resulting perturbation. Unless the original matrix has special structure, this is the kind of perturbation we should expect from rounding the elements of A-hat. Thus a reasonable criterion is to require z_i to be so small that the perturbation is below gamma*eps_M times the norm of D + z*z^T, where eps_M is the rounding unit and gamma is some modest constant (more about that later).

Now in this case we can write D + z*z^T = (D + z-bar*z-bar^T) + E, where z-bar is z with its ith component set to zero; thus the effect of replacing z_i by zero is to replace D + z*z^T by D + z*z^T - E. By computing the Frobenius norm of E, we get the bound ||E||_F <= sqrt(2)*|z_i|*||z||_2. Moreover, we can estimate ||D + z*z^T||_F <= ||D||_F + ||z||_2^2. Hence our criterion becomes

    |z_i|*||z||_2 <= gamma*eps_M*(||D||_F + ||z||_2^2).                (2.6)

Here we have absorbed the constant sqrt(2) into gamma.

Deflation: Nearly equal d_i

It turns out that if two diagonals of D are sufficiently near one another, we can use a plane rotation to zero out one of the corresponding components of z. The process is sufficiently well illustrated by the 2x2 problem

    D + z*z^T = diag(d_1, d_2) + (z_1, z_2)^T*(z_1, z_2).

Let c and s be the cosine and sine of a plane rotation R that annihilates the component z_2 of z, so that R*z = (z-bar_1, 0)^T, where z-bar_1 = sqrt(z_1^2 + z_2^2). Then

    R*(D + z*z^T)*R^T = [ c^2*d_1 + s^2*d_2    (d_2 - d_1)*c*s ]
                        [ (d_2 - d_1)*c*s      s^2*d_1 + c^2*d_2 ] + z-bar_1^2*e_1*e_1^T.

Consequently, if (d_2 - d_1)*c*s is sufficiently small we can ignore it, and the problem deflates. As above, we deem this quantity sufficiently small if it is below the tolerance used for the components of z. In the general case, because the diagonal elements of D are nondecreasing, it is only necessary to test adjacent diagonal elements. Of course, once a diagonal has been deflated, the diagonal that remains must be tested again against its new neighbor.

When to deflate

There is an important tradeoff in our criteria for deflation. If a criterion is too stringent, we will seldom deflate and miss the substantial benefits of deflation. On the other hand, if the criterion is too liberal, the accuracy of our final results will deteriorate. Saving accuracy at one stage is a poor bargain if it is lost at another. The calculations of the eigenvalues and eigenvectors to be described later require that the z_i be nonzero and that the d_i be strictly increasing. Thus the deflation procedure can be seen as a preprocessing step to insure that the computation as a whole runs smoothly.

There is yet another tradeoff. Since gamma controls the stringency of the criterion, the above discussion suggests that it should have a reasonably large value. A strict rounding-error analysis suggests that gamma ought to grow with n. However, this is too liberal because the rounding-error analysis is necessarily conservative. The conventional wisdom is that gamma should be a modest constant, say 10. This favors a liberal criterion over a conservative one.
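The two deflation criteria combine naturally into a single pass over the update. The following sketch is our transcription, with gamma = 10 as the modest constant suggested above; a real implementation must also accumulate the rotations into V*S, which is omitted here.

    import numpy as np

    def deflate_update(D, z, gamma=10.0, eps=np.finfo(float).eps):
        """One deflation pass over D + z z^T (D nondecreasing). Small
        components of z are set to zero; when adjacent diagonals are
        close enough, a rotation zeroes z[i], ignoring the negligible
        off-diagonal (D[i+1]-D[i])*c*s it creates. Returns the modified
        D, z, and the deflated indices."""
        D = np.array(D, float); z = np.array(z, float)
        tol = gamma * eps * (np.linalg.norm(D) + np.dot(z, z))
        deflated = []
        for i in range(len(z) - 1):
            if abs(z[i]) * np.linalg.norm(z) <= tol:
                z[i] = 0.0
                deflated.append(i)
                continue
            r = np.hypot(z[i], z[i + 1])
            c, s = z[i + 1] / r, z[i] / r          # rotation zeroing z[i]
            if abs((D[i + 1] - D[i]) * c * s) <= tol:
                D[i], D[i + 1] = (c*c*D[i] + s*s*D[i + 1],
                                  s*s*D[i] + c*c*D[i + 1])
                z[i], z[i + 1] = 0.0, r
                deflated.append(i)
        return D, z, deflated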

The secular equation

We turn now to the computation of the eigenvalues of D + z*z^T. We begin with a theorem that shows where the eigenvalues lie and gives an equation they must satisfy.

Theorem 2.1. If d_1 < d_2 < ... < d_n and the components of z are nonzero, the eigenvalues of D + z*z^T satisfy the SECULAR EQUATION

    f(lambda) = 1 + sum_i z_i^2/(d_i - lambda) = 0.                    (2.7)

Moreover, the solutions lambda_1 < ... < lambda_n of (2.7) satisfy

    d_1 < lambda_1 < d_2 < lambda_2 < ... < d_n < lambda_n.            (2.8)

Proof. To derive the secular equation we will employ the useful fact that

    det(I - u*v^T) = 1 - v^T*u.                                        (2.9)

This can be proved by observing that any vector orthogonal to v is an eigenvector of I - u*v^T corresponding to one, while u is an eigenvector corresponding to 1 - v^T*u. The result now follows from the fact that the determinant of a matrix is the product of the eigenvalues of the matrix.

The characteristic polynomial of D + z*z^T is

    p(lambda) = det(D - lambda*I + z*z^T) = prod_i (d_i - lambda) * f(lambda),   (2.10)

the last equality following from (2.9). Let us consider the zeros of the latter factor. As lambda approaches d_1 from above, f(lambda) becomes negative. As lambda approaches d_2 from below, f(lambda) must become positive. Hence f has a zero lambda_1 in (d_1, d_2). In the same way we can show that f has zeros lambda_i in (d_i, d_{i+1}) (i = 1, ..., n-1). Finally, as lambda approaches d_n from above, f(lambda) must become negative, while as lambda goes to infinity, f(lambda) goes to 1. Hence f has a zero lambda_n satisfying lambda_n > d_n. The proof is completed by observing that the n zeros lambda_i of f are also zeros of the characteristic polynomial p and hence are the eigenvalues of D + z*z^T. •

Solving the secular equation

Theorem 2.1 shows that we can find the updated eigenvalues lambda_i of A-hat (up to the sign sigma) by solving the secular equation (2.7). Moreover, the interlacing inequality (2.8) shows that each root lambda_i is simple and located in (d_i, d_{i+1}) [or in (d_n, infinity) for lambda_n]. Thus in principle we could use an off-the-shelf root finder to compute the lambda_i. However, to determine the eigenvectors the eigenvalues must be calculated to high accuracy. Thus our root-finding algorithm must be specifically designed to work with the secular equation. The detailed description of such an algorithm is beyond the scope of this book, and here we will merely sketch the important features. For more see the notes and references.

But first a word on computer arithmetic. Throughout these volumes we assume that floating-point computations are done in standard floating-point arithmetic. Specifically, we assume that, excluding overflows and underflows, each floating-point operation returns the correctly rounded value of the exact operation. This assumption is critical for the algorithms that follow.

For definiteness we will consider the problem of computing lambda_k, where k < n. The first step is to determine whether lambda_k is nearer d_k or d_{k+1}. This can be done by evaluating f((d_k + d_{k+1})/2). If the value is negative, lambda_k is nearer d_{k+1}; if it is positive, lambda_k is nearer d_k. For definiteness we will suppose that lambda_k is nearer d_k.

The second step is to shift the secular equation so that the origin is d_k. Specifically, let [displays omitted] and define g(tau) = f(d_k + tau), written in terms of functions r(tau) and s(tau). This shift has the advantage that we are now solving for the correction tau_k to d_k that gives lambda_k. Moreover, the functions r(tau) and s(tau) can be evaluated to high relative accuracy whenever tau is between 0 and (d_{k+1} - d_k)/2.

In principle any good zero-finding technique applied to g will suffice to find the correction tau_k. However, because g(tau) has poles at zero and d_{k+1} - d_k, the most effective schemes are based on approximating g(tau) by a rational function and finding the root of the approximation as the next approximation.

Finally, one must decide when the iteration for tau_k has converged. Because of the simplicity of the function g(tau), as it is calculated one can also calculate a bound on the error in its computed value. Our stopping criterion is:

    Stop when the computed value of g(tau) at the current iterate is less than its error bound.   (2.11)

It can be shown that this criterion returns eigenvalues that are sufficiently accurate to be used in the computation of the eigenvectors.
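Production solvers use the rational approximation scheme just described (see, e.g., LAPACK's xLAED4). For illustration only, the following sketch finds each root by bisection on its interlacing interval, which is robust but far slower to converge and does not attain the high relative accuracy of the shifted formulation.

    import numpy as np

    def secular_roots(d, z, iters=100):
        """Roots of f(lam) = 1 + sum(z_i^2/(d_i - lam)) by bisection on
        the intervals (d_k, d_{k+1}) and (d_n, d_n + |z|^2)."""
        d = np.asarray(d, float); z = np.asarray(z, float)
        f = lambda lam: 1.0 + np.sum(z**2 / (d - lam))
        upper = np.append(d[1:], d[-1] + np.dot(z, z))
        roots = []
        for k in range(len(d)):
            lo, hi = d[k], upper[k]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if f(mid) < 0.0:     # f increases between poles, so the
                    lo = mid         # root lies to the right of mid
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
        return np.array(roots)

    # The roots are the eigenvalues of D + z z^T.
    d = np.array([1.0, 2.0, 5.0]); z = np.array([0.3, 0.4, 0.5])
    lam = secular_roots(d, z)
    w = np.linalg.eigvalsh(np.diag(d) + np.outer(z, z))
    assert np.allclose(np.sort(lam), w, atol=1e-8)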

Computing eigenvectors

Let lambda_j be an eigenvalue of D + z*z^T. Then the corresponding eigenvector q_j must satisfy the equation (D + z*z^T)*q_j = lambda_j*q_j, or equivalently

    (D - lambda_j*I)*q_j + (z^T*q_j)*z = 0.

Thus q_j must lie along the direction of (lambda_j*I - D)^{-1}*z, and hence the normalized eigenvector is given by

    q_j = (lambda_j*I - D)^{-1}*z / ||(lambda_j*I - D)^{-1}*z||_2.

Unfortunately, we do not know the eigenvalues of D + z*z^T. Instead we have approximations lambda-hat_j obtained by solving the secular equation. Now the ith component of q_j is given by

    q_ij = z_i / (eta_j*(lambda_j - d_i)),                             (2.12)

where eta_j is the normalizing constant. If lambda_j is very near d_i, as it will be when d_i and d_{i+1} are very near each other, then unless lambda-hat_j is a very accurate approximation to lambda_j, replacing lambda_j by lambda-hat_j in (2.12) will make a large perturbation in q_ij. Hence eigenvectors computed in this way may be inaccurate and, worse yet, may be far from orthogonal to one another.

One cure for this problem is to compute the eigenvalues lambda-hat_j to high precision (it can be shown that double precision is sufficient). However, there is an alternative that does not require extended precision. It is based on the following theorem.

Theorem 2.2. Let the quantities lambda-hat_1, ..., lambda-hat_n satisfy the interlacing conditions

    d_1 < lambda-hat_1 < d_2 < lambda-hat_2 < ... < d_n < lambda-hat_n.   (2.13)

Then the quantities

    z-bar_i = sign(z-bar_i) * sqrt( (lambda-hat_n - d_i)
              * prod_{j<i} (lambda-hat_j - d_i)/(d_j - d_i)
              * prod_{i<=j<n} (lambda-hat_j - d_i)/(d_{j+1} - d_i) )      (2.14)

are well defined. If we set z-bar = (z-bar_1 ... z-bar_n)^T, then the eigenvalues of D + z-bar*z-bar^T are lambda-hat_1, ..., lambda-hat_n.

Proof. We will first derive the formula (2.14) under the assumption that the required z-bar_i exist, and then show that these values have the required property. We compute the characteristic polynomial of D + z-bar*z-bar^T in two ways [see (2.10)]:

    prod_j (d_j - lambda) + sum_i z-bar_i^2 prod_{j!=i} (d_j - lambda) = prod_j (lambda-hat_j - lambda).   (2.15)

Setting lambda = d_i and solving for z-bar_i^2, we find that

    z-bar_i^2 = prod_j (lambda-hat_j - d_i) / prod_{j!=i} (d_j - d_i).

The interlacing inequalities imply that the right-hand side of this equation is positive. Hence the z-bar_i are well defined and, after regrouping the factors, given by (2.14).

To show that the eigenvalues of D + z-bar*z-bar^T are the lambda-hat_j, note that by the way the z-bar_i were defined the two expressions for the characteristic polynomial of D + z-bar*z-bar^T in (2.15) are equal at the n distinct points d_j. Since both expressions are polynomials of degree n with the same leading coefficient, they must be identical. In particular, if we evaluate both expressions at the lambda-hat_j, we discover that the lambda-hat_j are zeros of both; i.e., they satisfy the secular equation for D + z-bar*z-bar^T. •

To apply the theorem, suppose that we compute the z-bar_i from (2.14), with the appropriate signs, and then use them to compute the components of the eigenvectors from (2.12). As we shall see in a moment, the z-bar_i can be computed to high relative accuracy. Hence the components of the computed eigenvectors will be, to high relative accuracy, the components of the exact eigenvectors of D + z-bar*z-bar^T. In particular, the matrix Q of these eigenvectors will be orthogonal to working accuracy.

However, we have not found the eigenvectors of D + z*z^T but those of D + z-bar*z-bar^T, after all. There remains the question of the stability of this algorithm. It can be shown that if we use the stopping criterion (2.11) in finding the approximate eigenvalues lambda-hat_j, then

    [display omitted]                                                  (2.16)

where f_n is a slowly growing function of n. In other words, the z-bar_i differ from the z_i by quantities on the order of the rounding unit. Thus the computation of the eigenvectors of D + z-bar*z-bar^T is backward stable. The inequality (2.16) also justifies our replacing sign(z-bar_i) by sign(z_i) in (2.14), which otherwise does not quite specify z-bar_i. For if z_i and z-bar_i have opposite signs, they must be within a small multiple of the rounding unit of zero, and hence insignificant. Moreover, in the expression (2.12) the lambda-hat_j are then the exact eigenvalues of D + z-bar*z-bar^T.

We summarize the computation of the eigenvectors in Algorithm 2.2. Here are two comments on this program.

• The computation of the z-bar_i requires approximately 2n^2 flam and n^2 fldiv. The computation of the vectors requires approximately n^2 fladd and n^2 fldiv for the components of Q and n^2 flam and n^2 fldiv for the normalization (the divisions in the normalization can be replaced by multiplications). Thus the computation of the eigenvectors is an O(n^2) process.

• We have already noted that the algorithm is backward stable. However, this does not imply that the computed eigenvectors are accurate. In particular, the vectors corresponding to poorly separated eigenvalues will be inaccurate. They will have small residuals [cf. Theorem 3.13, Chapter 1], and, as we have pointed out, they will be orthogonal almost to working accuracy.

Given sufficiently accurate approximations lambda-hat_j [see (2.11)] to the eigenvalues of D + z*z^T that satisfy the interlacing inequalities (2.13), this algorithm returns approximations to the eigenvectors of D + z*z^T in the array Q.

     1. Compute the modified update vector:
     2. for i = 1 to n
     3.   z-bar_i = lambda-hat_n - d_i
     4.   for j = 1 to i-1
     5.     z-bar_i = z-bar_i*(lambda-hat_j - d_i)/(d_j - d_i)
     6.   end for j
     7.   for j = i to n-1
     8.     z-bar_i = z-bar_i*(lambda-hat_j - d_i)/(d_{j+1} - d_i)
     9.   end for j
    10.   z-bar_i = sqrt(z-bar_i)*sign(z_i)
    11. end for i
    12. Compute the eigenvectors:
    13. for j = 1 to n
    14.   for i = 1 to n
    15.     Q[i, j] = z-bar_i/(lambda-hat_j - d_i)
    16.   end for i
    17.   Q[:, j] = Q[:, j]/||Q[:, j]||_2
    18. end for j

Algorithm 2.2: Eigenvectors of a rank-one update

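In NumPy the whole of Algorithm 2.2 can be written compactly. The sketch below assumes the lambda-hat_j came from a secular-equation solver such as the one sketched earlier; the function name is ours.

    import numpy as np

    def update_eigenvectors(d, z, lam_hat):
        """Eigenvectors of D + z z^T from approximate eigenvalues lam_hat
        satisfying the interlacing conditions (2.13). First recompute the
        modified update vector z_bar of (2.14), for which the lam_hat are
        exact, then form the vectors from (2.12)."""
        d = np.asarray(d, float); z = np.asarray(z, float)
        lam = np.asarray(lam_hat, float)
        n = len(d)
        zbar = np.empty(n)
        for i in range(n):
            t = lam[n - 1] - d[i]
            for j in range(i):
                t *= (lam[j] - d[i]) / (d[j] - d[i])
            for j in range(i, n - 1):
                t *= (lam[j] - d[i]) / (d[j + 1] - d[i])
            zbar[i] = np.sqrt(t) * np.sign(z[i])
        Q = zbar[:, None] / (lam[None, :] - d[:, None])
        return Q / np.linalg.norm(Q, axis=0)

    # Even with pathologically close diagonals, the columns of Q are
    # orthonormal to working accuracy.
    d = np.array([1.0, 1.0 + 1e-8, 2.0]); z = np.array([0.5, 0.5, 0.5])
    lam = np.linalg.eigvalsh(np.diag(d) + np.outer(z, z))
    Q = update_eigenvectors(d, z, lam)
    assert np.allclose(Q.T @ Q, np.eye(3), atol=1e-10)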
Once the cost of computing V*S*P*Q in (2.3) is taken into account, the count for the update as a whole rises to O(n^3).

Concluding comments

Our sketch of the algorithm for spectral updating is far from a working program. The data structures to handle permutations and deflation are nontrivial, and the routine to solve the secular equation is intricate. Thus the code for any implementation will be long and complicated compared with the code for forming A-hat directly and applying the QR algorithm. Is it worth it? The answer is yes, at least for sufficiently large matrices. We have seen that the O(n^3) overhead in the QR algorithm is the accumulation of the transformations, particularly the expensive accumulation of the plane rotations generated while solving the tridiagonal system. The updating algorithm has only one O(n^3) overhead: a single matrix-matrix multiplication to update the eigenvectors V. Thus for n large enough the updating algorithm must be faster. The deflation also saves time, and in some applications there are many deflations.

The algorithm is not without its drawbacks. Extra storage is required for the eigenvectors of the diagonalized problem D + z*z^T. Moreover, the algorithm does not work well for graded matrices. In particular, the deflation criteria, e.g., (2.6), are couched in terms of the norms of D and z, which allows small but meaningful elements in a graded matrix to be declared negligible. However, it must be remembered that the diagonal updating is only a part of the larger Algorithm 2.1; there is a second aspect, to which we will return in the next subsection.

2.2. A DIVIDE-AND-CONQUER ALGORITHM

In this subsection we will present a divide-and-conquer algorithm for solving the symmetric tridiagonal eigenvalue problem. We begin with some generalities about the divide-and-conquer approach.

Generalities

The notion of a divide-and-conquer algorithm can be described abstractly as follows. Let P(n) denote a computational problem of size n, and suppose that we can divide P(n) into two problems P1(n/2) and P2(n/2) of size n/2 in such a way that the solution of P can be recovered from those of P1 and P2 at a cost of C(n). Now if P1 and P2 are problems of the same type as P, we can repeat this dividing procedure on P1 and P2, then on the subproblems of P1 and P2, and so on. This recursion will stop after the size of the subproblems becomes so small that their solutions are trivial, that is, after no more than about log2(n) steps. For convenience, assume that n = 2^m.

We can estimate the cost of a divide-and-conquer algorithm by counting the costs at each level of the recurrence. At the top level of the recurrence we solve one problem at a cost of C(n). At the second level, we solve two problems each at a cost of C(n/2) for a total cost of 2C(n/2). At the third level, we solve four problems at a cost of C(n/4) for a total cost of 4C(n/4), and so on.

Summing up these costs, we get for the total cost of the divide-and-conquer algorithm

    C_total = C(n) + 2C(n/2) + 4C(n/4) + ... + 2^(m-1) C(n/2^(m-1)).   (2.17)

Of course we must know the function C(n) to evaluate this expression. We will return to it later, after we have derived our algorithm.

Derivation of the algorithm

Let T be a symmetric tridiagonal matrix of order n, and let T be partitioned in the form [display omitted]. (Here s stands for splitting index.) We can also write this partition in the form [display omitted]. Now let T1~ be T1 with the element d_s replaced by d_s - e_s, and let T2~ be T2 with the element d_{s+1} replaced by d_{s+1} - e_s. Then we can write

    T = diag(T1~, T2~) + e_s*v*v^T,

where v is the vector whose sth and (s+1)th components are one and whose remaining components are zero. Thus we have expressed T as the sum of a block diagonal matrix whose blocks are tridiagonal and a matrix of rank one. The idea behind our divide-and-conquer algorithm is to compute the eigensystems of T1~ and T2~ (which can be done recursively) and then combine them to give the eigensystem of T.

To show how to combine the eigensystems, let

    T1~ = V1*Lambda1*V1^T   and   T2~ = V2*Lambda2*V2^T

be spectral decompositions of T1~ and T2~, and set V = diag(V1, V2) and D = diag(Lambda1, Lambda2). Then

    T = V*(D + e_s*z*z^T)*V^T,   where z = V^T*v.

The matrix D + e_s*z*z^T is a rank-one modification of a diagonal matrix, whose spectral decomposition can be computed by the algorithm of the last section (with the magnitude of e_s absorbed into the update vector and its sign playing the role of sigma). If D + e_s*z*z^T = Q*Lambda*Q^T, it follows that

    T = (V*Q)*Lambda*(V*Q)^T                                           (2.18)

is the spectral decomposition of T.

We are now in a position to apply the divide-and-conquer strategy to compute the eigensystem of a symmetric tridiagonal matrix. Algorithm 2.3 implements this scheme. Here are some comments.

• For ease of exposition we cast the algorithm in terms of full matrices. Ideally, we would represent T in some compact form, e.g., by its diagonal and superdiagonal.

• In statements 15 and 16 we have taken advantage of the fact that V is block diagonal in forming V*Q. This reduces the work from n^3 flam to (1/2)n^3 flam.

• The recursive approach taken here is natural for this algorithm. Actually, a direct, bottom-up approach is possible. For example, one might initially partition the matrix into blocks of size just less than mindac. Then one can work upward, combining them in pairs. The code is not pretty.

• When the submatrices in the recursion fall below a certain size, specified here by the parameter mindac (min divide-and-conquer), we abandon the divide-and-conquer approach and compute the decompositions directly via the QR algorithm. We have already observed that the method for updating a spectral decomposition has a great deal of overhead, so that for small matrices the QR approach is more competitive. Ideally, the value of mindac should be the break-even point where the two algorithms require about the same time to execute. In LAPACK it is set at 25. However, as we shall see, for large matrices the ill consequences of using the wrong value are not very great.

• Because the algorithm diagupdate is backward stable, it can be shown that cuppen is also stable.

• We have noted that the updating algorithm diagupdate may not work well with graded matrices. Algorithm 2.3 inherits this drawback.

Given a symmetric tridiagonal matrix T, Cuppen computes its spectral decomposition T = V*Lambda*V^T. If the order of the matrix is less than the parameter mindac, the QR algorithm is used to compute its eigensystem. Cuppen uses the routine diagupdate(D, z, sigma, Lambda, Q), which returns the spectral decomposition D + sigma*z*z^T = Q*Lambda*Q^T.

     1. Cuppen(T, Lambda, V)
     2.   n = the order of T
     3.   if (n < mindac)
     4.     Compute Lambda and V using the QR algorithm
     5.   else
     6.     s = floor(n/2)
     7.     T1 = T[1:s, 1:s];  T1[s, s] = T1[s, s] - T[s, s+1]
     8.     T2 = T[s+1:n, s+1:n];  T2[1, 1] = T2[1, 1] - T[s, s+1]
     9.     Cuppen(T1, Lambda1, V1)
    10.     Cuppen(T2, Lambda2, V2)
    11.     rho = T[s, s+1]
    12.     D = diag(Lambda1, Lambda2)
    13.     z = sqrt(|rho|) * (V1[s, :], V2[1, :])^T
    14.     diagupdate(D, z, sign(rho), Lambda, Q)
    15.     V[1:s, :] = V1*Q[1:s, :]
    16.     V[s+1:n, :] = V2*Q[s+1:n, :]
    17.   end if
    18. end Cuppen

Algorithm 2.3: Divide-and-conquer eigensystem of a symmetric tridiagonal matrix

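The rank-one tearing behind statements 7 and 8 of Algorithm 2.3 is easy to check numerically. The following sketch builds the torn blocks and the update vector exactly as in the derivation above; the function name is ours.

    import numpy as np

    def tear(d, e, s):
        """Tear a symmetric tridiagonal T (diagonal d, superdiagonal e) at
        splitting index s (0-based: the torn coupling is e[s]). Returns the
        modified block diagonals, rho = e[s], and the update vector v with
        T = diag(T1~, T2~) + rho*v*v^T."""
        n = len(d)
        d1 = d[:s + 1].copy(); d1[-1] -= e[s]     # d_s -> d_s - e_s
        d2 = d[s + 1:].copy(); d2[0] -= e[s]      # d_{s+1} -> d_{s+1} - e_s
        v = np.zeros(n); v[s] = v[s + 1] = 1.0
        return d1, d2, e[s], v

    # Verify the tearing on a small example.
    rng = np.random.default_rng(2)
    n, s = 6, 2
    d = rng.standard_normal(n); e = rng.standard_normal(n - 1)
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    d1, d2, rho, v = tear(d, e, s)
    def tri(dd, ee): return np.diag(dd) + np.diag(ee, 1) + np.diag(ee, -1)
    B = np.zeros((n, n))
    B[:s + 1, :s + 1] = tri(d1, e[:s]); B[s + 1:, s + 1:] = tri(d2, e[s + 1:])
    assert np.allclose(T, B + rho * np.outer(v, v))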
Complexity

To apply the formula (2.17) for the total cost of Algorithm 2.3, we must first determine the function C(n). This is essentially the cost of Cuppen exclusive of the cost of its recursive calls to itself. An examination of the algorithm shows two potentially significant sources of work: the call to diagupdate in statement 14 and the matrix multiplication in statements 15 and 16. We have already noted that the updating requires only O(n^2) operations, while the multiplication requires (1/2)n^3 flam. Thus for large n the cost is dominated by the matrix multiplication, and we may take C(n) = (1/2)n^3 flam.

It is now an easy matter to get a sharp upper bound on C_total. Specifically,

    C_total = sum_{k=0}^{m-1} 2^k*(1/2)*(n/2^k)^3 = (1/2)n^3*sum_k (1/4)^k <= (2/3)n^3.

Thus:

    Algorithm 2.3 requires about (2/3)n^3 flam.

The order constant 2/3 is quite respectable, comparable to Gaussian elimination and far better than the QR algorithm.

We see that our divide-and-conquer algorithm has not really conquered in the sense that it has reduced the order of the work. It is still O(n^3). But if it has not conquered, it has certainly prospered. Of the total of (2/3)n^3 flam of work, three-quarters (i.e., (1/2)n^3 flam) is done in the very first stage of the recursion, leaving one-quarter of the work for the remaining stages. The second stage does 3/16 of the work, leaving only 1/16 of the total. In general, after the kth stage only 1/4^k of the total work remains to be done. Thus after two or three stages the remaining work is insignificant compared to the work done by the higher stages. So little work is being done at the lower stages that it does not matter if we use an inefficient algorithm there. Consequently, the choice of the point at which one shifts to the QR algorithm (parameterized by mindac) is not very critical.

In addition to the low-order constant, the algorithm benefits from deflations in diagupdate, which effectively make part of Q diagonal. With some classes of matrices the number of deflations grows with n, so that the divide-and-conquer approach becomes dramatically faster than the QR approach.

2.3. EIGENVALUES AND EIGENVECTORS OF BAND MATRICES

We begin with a definition.

Definition 2.3. Let A be symmetric of order n. Then A is a BAND MATRIX of BAND WIDTH 2p+1 if

    a_ij = 0 whenever |i - j| > p.

Thus if p = 1, then A is a tridiagonal matrix. If p = 2, then A is pentadiagonal and has the form [Wilkinson diagram omitted].

Band matrices are economical to store. For example, a symmetric pentadiagonal matrix can be represented by 3n-3 floating-point numbers. This means that it is possible to store band matrices that are so large that the matrix of their eigenvectors, which is of order n, cannot be stored. Fortunately, in many applications involving band matrices we are interested in only a few eigenpairs, often a few of the largest or smallest. The computation of a selected set of eigenpairs of a large symmetric band matrix is the subject of this subsection.

In outline our algorithm goes as follows.

1. Reduce A to tridiagonal form T.
2. Find the selected eigenvalues of T.
3. Find the selected eigenvectors by the inverse power method applied to A.

Note that we compute the eigenvalues from T but the eigenvectors from A. One possibility would be to store the transformations that reduce A to T, find the eigenvectors of T, and transform back. Unfortunately, if n is large, the transformations will require too much storage. Thus we are forced to find the eigenvectors of A directly from A.

A special case of our algorithm is when the matrix A is already tridiagonal. In this case it turns out that there are better algorithms than the ones proposed here. However, their complexity puts them beyond the scope of this book, and the reader is referred to the notes and references.

For the most part our algorithms will be cast in terms of absolute tolerances relative to the norm of A. This means that small eigenvalues may be determined inaccurately. How to compute small but well-determined eigenpairs of a general band matrix is at present an unsolved problem.

The storage of band matrices

Although we will present our algorithms for band matrices as if they were stored in full arrays, in practice we must come to grips with the fact that such storage is not economical. Specifically, a symmetric band matrix of band width 2p+1 has approximately (p+1)n nonzero elements on and above its diagonal. If p is much smaller than n, it would be a waste of memory to store the matrix naturally in an array of order n. Instead such a matrix is usually stored in an array just large enough to accommodate its nonzero elements. There are many ways of doing this. The current standard is the LAPACK scheme, actually two schemes.

The first scheme stores a band matrix in a (p+1) x n array, as illustrated below for a pentadiagonal matrix: [diagram omitted]. The asterisks denote unused elements. More generally, if B is the array in which the matrix is stored, then

    B[p+1+i-j, j] = a_ij   (max(1, j-p) <= i <= j).

The other scheme stores the elements according to the formula

    B[1+i-j, j] = a_ij   (j <= i <= min(n, j+p)).

Thus one scheme stores the upper part of A, whereas the other stores the lower part. When A is complex and Hermitian, this is not just a rearrangement of the same numbers, since the corresponding upper and lower elements are not equal but are conjugates of one another.
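A quick sketch of the first (upper) scheme's index map, using 0-based NumPy indexing rather than the 1-based convention of the formulas above; the function names are ours.

    import numpy as np

    def to_band_upper(A, p):
        """Pack the upper triangle of a symmetric band matrix A of band
        width 2p+1 into a (p+1) x n array B with B[p+i-j, j] = A[i, j]
        for j-p <= i <= j (0-based version of the scheme above)."""
        n = A.shape[0]
        B = np.zeros((p + 1, n))
        for j in range(n):
            for i in range(max(0, j - p), j + 1):
                B[p + i - j, j] = A[i, j]
        return B

    def band_get(B, p, i, j):
        """Element (i, j) of the packed matrix, folding by symmetry."""
        if i > j:
            i, j = j, i
        return B[p + i - j, j] if j - i <= p else 0.0

    # Round-trip check on a random pentadiagonal matrix (p = 2).
    rng = np.random.default_rng(3)
    n, p = 6, 2
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    A[np.abs(np.subtract.outer(range(n), range(n))) > p] = 0.0
    B = to_band_upper(A, p)
    assert all(band_get(B, p, i, j) == A[i, j]
               for i in range(n) for j in range(n))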

Tridiagonalization of a symmetric band matrix

We will now show how to reduce a symmetric band matrix to tridiagonal form. There are many ways of doing this. One possibility is simply to apply Algorithm 1.1 for tridiagonalizing a dense matrix to the full band matrix. However, the band width grows as the reduction proceeds, so that eventually one is working with a full dense matrix. Here we consider another alternative that uses plane rotations and preserves the band size.

We will illustrate the reduction for a matrix of band width seven (p = 3). We begin by choosing a rotation in the (3,4)-plane to annihilate the (4,1)-element (and by symmetry the (1,4)-element) of A. The application of this rotation introduces new nonzeros outside the band in the (7,3)- and (3,7)-elements. This step of the reduction is illustrated below:

We now choose a rotation in the (6,7)-plane to annihilate the elements outside the band, as illustrated below:

This places new nonzeros in the (10,6)- and (6,10)-elements. However, when we eliminate this element, no new elements are generated, since the chase has reached the bottom of the matrix. Thus we have annihilated the (4,1)- and (1,4)-elements while preserving the band structure.

We are now ready to work on the (3,1)-element. A rotation in the (2,3)-plane annihilates this element, but introduces nonzeros in the (6,2)- and (2,6)-elements. These elements can be chased out of the matrix as illustrated below:

Thus we have reduced the first row and column to tridiagonal form. The reduction now continues by reducing the second row and column, then the third row and column, and so on. The actual code is tedious and not very informative, and we will not give it here.

To get an operation count for this algorithm, we will begin by counting plane rotations. The key observation is that when an element is being chased out of the matrix it moves down the matrix in steps of length p. This can be inferred from our example and easily confirmed in general. Hence when the first row and column are being processed, each zero introduced requires roughly n/p rotations to chase the element outside the band to the end of the matrix. Since we must zero out p-1 elements in the first row and column, the total number of rotations is roughly n(p-1)/p. Similarly it requires (n-1)(p-1)/p rotations to process the second row and column, (n-2)(p-1)/p to process the third, and so on. Hence the total number of rotations for this algorithm is about

    (p-1)/p * n^2/2.

Thus the reduction is truly an O(n^2) process. Remembering that we only operate on the upper or lower part of the matrix, we see that each rotation is applied to 2(p+1) pairs of elements (here we count the adjustment of the 2x2 block on the diagonal as two applications). Thus the total number of applications of rotations to pairs of elements is about

    (p^2-1)/p * n^2,

and we can convert this to multiplications and additions as usual.

An important implication of these counts is that if we can afford n^2 units of storage to accumulate transformations, we are better off using the standard Algorithm 1.1. It is true that the reduction itself requires O(n^3) operations as opposed to O(n^2) for the banded reduction. But for both algorithms the accumulation of the transformations requires O(n^3) operations, and the extra work in accumulating plane rotations, as opposed to Householder transformations, tips the scales against the band algorithm. On the other hand, since the reduction generates O(n^2) rotations, we cannot afford to save or accumulate the transformations when n is very large.

The algorithm is stable in the usual sense. The computed tridiagonal matrix is orthogonally similar to A + E, where ||E||_2/||A||_2 is of the order of eps_M, and hence the eigenvalues of A are moved by no more than quantities of order ||A||_2*eps_M (Theorem 3.8, Chapter 1). However, in this case E does not share the banded structure of A. But it is symmetric.

Inertia

We now turn to the problem of computing some of the eigenvalues of a symmetric tridiagonal matrix. The basic tool is a method for counting how many eigenvalues lie to the left of a point on the real line.

The number of eigenvalues lying above the point lambda on the real line is the same as the number of eigenvalues of A - lambda*I lying above zero. Thus to count the location of the eigenvalues of a symmetric matrix with respect to an arbitrary point, it is sufficient to be able to count their locations with respect to zero. This suggests the following definition.

Definition 2.4. Let A be symmetric. Then inertia(A) is the triplet (nu, zeta, pi), where nu is the number of negative eigenvalues of A, zeta is the number of zero eigenvalues of A, and pi is the number of positive eigenvalues of A.

Although inertia is defined by the eigenvalues of a matrix, it is much easier to compute than the eigenvalues themselves. The reason is that inertia is invariant under congruence transformations, i.e., transformations of the form U^T A U, which are more general than similarity transformations. Specifically, we have the following theorem.

Theorem 2.5 (Sylvester, Jacobi). Let A be symmetric and U be nonsingular. Then

    inertia(A) = inertia(U^T A U).

Proof. The proof is by contradiction. Suppose that A has more positive eigenvalues than U^T A U. Let X be the space spanned by the eigenvectors corresponding to the positive eigenvalues of A. Now let Y be the space spanned by the vectors corresponding to the nonpositive eigenvalues of U^T A U, and let Z = U*Y. Then

    x in X, x != 0  ==>  x^T A x > 0  and  z in Z, z != 0  ==>  z^T A z <= 0.

It follows that X intersect Z = {0}. But the dimension of X is the number of positive eigenvalues of A, and the dimension of Z is the number of nonpositive eigenvalues of U^T A U. By hypothesis, the sum of these dimensions is greater than the order of A. Hence they have a nonzero vector in common, a contradiction. Similarly, we can show that the number of negative eigenvalues of A is the same as the number of negative eigenvalues of U^T A U and therefore that the number of zero eigenvalues is the same for both matrices.
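The invariance is easy to observe numerically; in the following sketch (ours) the inertia is read off from computed eigenvalues, with a small tolerance standing in for exact zeros.

    import numpy as np

    def inertia(A, tol=1e-12):
        lam = np.linalg.eigvalsh(A)
        return (int(np.sum(lam < -tol)),          # nu
                int(np.sum(np.abs(lam) <= tol)),  # zeta
                int(np.sum(lam > tol)))           # pi

    A = np.diag([-2.0, 0.0, 3.0, 5.0])
    U = np.random.randn(4, 4)          # nonsingular with probability one
    print(inertia(A) == inertia(U.T @ A @ U))   # True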

Inertia and the LU decomposition

Turning now to the computation of inertia, we begin with a constructive theorem for determining inertia.

Theorem 2.6. Let A be symmetric and suppose that the leading principal submatrices of A are nonsingular. Then there is a unique unit lower triangular matrix L and a unique upper triangular matrix U such that

    A = LU.

Moreover, if D_U = diag(u_11, u_22, ..., u_nn), then inertia(A) = inertia(D_U).

Proof. It is well known that if the leading principal submatrices of A are nonsingular then A has a unique factorization A = LU into a unit lower triangular matrix L and an upper triangular matrix U (Theorem 1:3.3). Since A is nonsingular, the diagonal elements of U are nonzero. Hence we can write A = L*D_U*(D_U^{-1}U). By symmetry, (D_U^{-1}U)^T (D_U L^T) is an LU decomposition of A, and hence by the uniqueness of LU decompositions, D_U^{-1}U = L^T, so that A = L*D_U*L^T. Hence A and D_U are congruent and have the same inertia.

It follows that if we have an LU decomposition of A, we can read off the inertia of A by looking at the diagonal elements of U. Now the LU decomposition of a matrix can be computed by Gaussian elimination (see Chapter 1:3). Hence, in principle, we can compute the inertia of A by performing Gaussian elimination and looking at the diagonal elements of U (which, in fact, are just the pivot elements in the elimination). Unfortunately, Gaussian elimination requires pivoting for stability, and the usual pivoting procedures do not result in a U-factor that has anything to do with inertia. But if A is tridiagonal, we can compute the diagonals of U in a stable manner. Let us see how.

The inertia of a tridiagonal matrix

Consider a symmetric tridiagonal matrix T, written in the usual form (2.20), with diagonal elements d_1, ..., d_n and off-diagonal elements e_1, ..., e_{n-1}. Since we will later be concerned with the inertia of T - lambda*I as a function of lambda, we will work with the matrix T - lambda*I.

If we perform Gaussian elimination on this matrix we get the following recursion for the diagonals of the U-factor:

    u_1(lambda) = d_1 - lambda,
    u_i(lambda) = (d_i - lambda) - e_{i-1}^2/u_{i-1}(lambda),  i = 2, ..., n.   (2.21)

The generation of u_i(lambda) requires a division by u_{i-1}(lambda), and we must specify what happens when u_{i-1}(lambda) is zero. It turns out that we can replace u_{i-1}(lambda) by a suitably small number. The choice of such a number is a matter of some delicacy and depends on how the matrix is scaled. We will return to this point later.

Given a tridiagonal matrix T in the form (2.20) and a constant lambda, the function LeftCount returns a count of the eigenvalues less than lambda.

     1. LeftCount(d, e, lambda)
     2.    LeftCount = 0
     3.    u = d[1] - lambda
     4.    if (u < 0) LeftCount = LeftCount+1 fi
     5.    for i = 2 to n
     6.       if (|u| < umin)
     7.          if (u >= 0)
     8.             u = umin
     9.          else
    10.            u = -umin
    11.         end if
    12.      end if
    13.      u = d[i] - lambda - e[i-1]^2/u
    14.      if (u < 0) LeftCount = LeftCount+1 fi
    15.   end for i
    16. end LeftCount

Algorithm 2.4: Counting left eigenvalues

Algorithm 2.4 counts the eigenvalues to the left of lambda. Here are some comments.

• The parameter umin > 0 is a minimum value on the absolute value of the current u_i(lambda) defined by (2.21).

• The algorithm requires 2n fladd, n flmlt, and n fldiv.
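A direct Python transcription of LeftCount might read as follows (a sketch; the default for umin is only a placeholder, since, as noted above, the proper choice depends on the scaling of T).

    def left_count(d, e, lam, umin=1e-300):
        """Number of eigenvalues of the symmetric tridiagonal T less than
        lam.  d: diagonal (length n); e: off-diagonal (length n-1)."""
        count = 0
        u = d[0] - lam
        if u < 0:
            count += 1
        for i in range(1, len(d)):
            if abs(u) < umin:          # guard the division, keeping the sign
                u = umin if u >= 0 else -umin
            u = (d[i] - lam) - e[i - 1]**2 / u
            if u < 0:
                count += 1
        return count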

• The method is stable. In particular, it can be shown that the computed sequence u_i(lambda) is an exact sequence of a perturbed matrix T(lambda) whose elements differ from those of T - lambda*I by small relative perturbations. This result implies that the count will be correct unless lambda is extremely close to an eigenvalue of T, so close that in the presence of rounding error the two can scarcely be distinguished. These results do not imply that unpivoted Gaussian elimination applied to a tridiagonal matrix is stable. The stability concerns only the u_i, not the LU decomposition as a whole.

• The choice of umin depends on the scaling of T and how accurately we wish to determine the count. One possibility is to make the tolerance relative to e_i. This choice assumes that the matrix has been scaled so that e_i/umin will not overflow.

• In replacing u with plus or minus umin, we choose the sign so that it is consistent with our previous adjustment of LeftCount.

Calculation of a selected eigenvalue

Suppose that the eigenvalues of T are ordered so that

    lambda_1 <= lambda_2 <= ... <= lambda_n,

and we wish to compute the eigenvalue lambda_m. We can do so by combining LeftCount with a technique called interval bisection. We begin by using Gerschgorin's theorem to calculate a lower bound a on lambda_1 and an upper bound b on lambda_n, so that LeftCount(a) = 0 and LeftCount(b) = n. Set c = (a+b)/2 and k = LeftCount(c). There are two cases. First, if k >= m, then lambda_m < c, so that lambda_m lies in [a, c]. Second, if k < m, then lambda_m >= c, so that lambda_m lies in [c, b]. In either case, we have isolated lambda_m in an interval that is half the width of the original. If we rename this interval [a, b], we can repeat the procedure until the interval width is less than a prescribed tolerance. This procedure is called interval bisection.

Algorithm 2.5 uses this procedure to find lambda_m. Here are some comments.

• Except perhaps at the end, the size of the interval [a, b] halves at each step. Consequently, after k iterations, the difference between the current a and b is bounded by 2^{-k}|b_0 - a_0|, where a_0 and b_0 denote the initial values. If we demand that this bound be less than eps, then the method must converge in about log2(|b_0 - a_0|/eps) iterations.

Let the tridiagonal matrix T be represented in the form (2.20) and let the eigenvalues of T satisfy lambda_1 <= lambda_2 <= ... <= lambda_n. The function FindLambda finds an approximation to lambda_m. The quantity eps is a generic convergence criterion. It uses the function LeftCount (Algorithm 2.4).

     1. FindLambda(d, e, m)
        Find upper and lower bounds on lambda_m
     2.    a = d(1) - |e(1)|;  b = d(1) + |e(1)|
     3.    for i = 2 to n-1
     4.       a = min{a, d(i) - |e(i-1)| - |e(i)|}
     5.       b = max{b, d(i) + |e(i-1)| + |e(i)|}
     6.    end for i
     7.    a = min{a, d(n) - |e(n-1)|}
     8.    b = max{b, d(n) + |e(n-1)|}
        Find lambda_m by interval bisection
     9.    while (|b - a| > eps)
    10.       c = (a+b)/2
    11.       if (c = a or c = b) return c fi
    12.       k = LeftCount(d, e, c)
    13.       if (k < m)
    14.          a = c
    15.       else
    16.          b = c
    17.       end if
    18.    end while
    19.    return (a+b)/2
    20. end FindLambda

Algorithm 2.5: Finding a specified eigenvalue
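In Python, FindLambda amounts to the following sketch (names ours; the eigenvalue index m is 0-based here, so the test against the count is shifted by one).

    def find_lambda(d, e, m, eps=1e-12):
        """Approximate the m-th smallest eigenvalue (0-based) by bisection,
        using left_count from the sketch above."""
        n = len(d)
        # Gerschgorin bounds on the whole spectrum.
        radius = ([abs(e[0])]
                  + [abs(e[i-1]) + abs(e[i]) for i in range(1, n-1)]
                  + [abs(e[n-2])])
        a = min(d[i] - radius[i] for i in range(n))
        b = max(d[i] + radius[i] for i in range(n))
        while abs(b - a) > eps:
            c = (a + b) / 2
            if c == a or c == b:      # endpoints differ by one bit: stop
                return c
            if left_count(d, e, c) < m + 1:
                a = c                 # fewer than m+1 eigenvalues below c
            else:
                b = c
        return (a + b) / 2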

• Statement 11 takes care of an interesting anomaly. If eps is too small, it is possible to get endpoints a and b that differ only in a single bit. In that case, c must be equal to one of a and b, and without statement 11 the algorithm would cycle indefinitely.

• The exact choice of eps depends on the accuracy required by the user. One possibility is to make eps relative to the magnitudes of the current endpoints, and in this case such a choice will result in low relative error for all nonzero eigenvalues.

• If the eigenvalue being sought is simple, then eventually it will be isolated in the interval [a, b]. At this point it may pay to shift over to an iteration that converges more quickly, e.g., the secant method. Although LeftCount does not produce a function whose zero is the desired eigenvalue, the product of the u's it generates is the value of the characteristic polynomial at c.

The function FindLambda can obviously be used to find a sequence of several consecutive eigenvalues. It can also be used to find all the eigenvalues in an interval. But this is inefficient for two reasons. First, the computation of the Gerschgorin bounds is repeated with each call to FindLambda. Second, in the process of finding one eigenvalue, one generates bounds on the other eigenvalues, bounds that can be used to speed up the bisection process. Therefore, when more than one eigenvalue is to be found, special code adapted to the task at hand should be used.

Selected eigenvectors

We now turn to the problem of computing the eigenvectors corresponding to a set of eigenvalues of a symmetric matrix A. For definiteness we will assume that these eigenvalues are

    lambda_1 <= ... <= lambda_m

and that they are approximated by the quantities mu_1, ..., mu_m. We will make no assumptions about how the mu_k were obtained, but because we are going to use inverse iteration to compute their corresponding eigenvectors, we will assume that they are accurate to almost full working precision:

    The computation of eigenvectors requires very accurate eigenvalues.

The first thing we must decide is what we can actually compute. If the eigenvalue lambda_k is poorly separated from the other eigenvalues of A, then the eigenvector corresponding to lambda_k will be sensitive to perturbations in A, and we will not be able to calculate it accurately [see (3.28), Chapter 1]. Instead we will attempt to find an approximation x_k such that the normalized residual

is small, say near the rounding unit. The justification for this criterion may be found in Theorem 1.3, Chapter 2, which says that x_k is the exact eigenvector of A + E, where ||E||_2/||A||_2 is equal to the scaled residual norm. For the implications of this result, see the discussion following the theorem.

A nice feature of this criterion is that it is easy to test. Specifically, let us start with a vector u normalized so that ||u||_2 = 1. Then if v = (A - mu_k I)^{-1} u, we have

    (A - mu_k I)(v/||v||_2) = u/||v||_2.                          (2.22)

It follows from (2.22) that the scaled residual norm of v/||v||_2 is ||u||_2/(||A||_2 ||v||_2). Thus if the quantity ||u||_2/(||A||_2 ||v||_2) is small enough, then we can accept x_k = v/||v||_2 as a normalized eigenvector.

There remains the question of whether the inverse power method is capable of producing a vector with a sufficiently small residual. Let y_k be the normalized eigenvector of A corresponding to lambda_k. Then if we take u normalized so that ||u||_2 = 1 as a starting vector for the inverse power step and let v be the solution of the equation (A - mu_k I)v = u, we have

    ||v||_2 >= |y_k^T u| / |lambda_k - mu_k|.                     (2.23)

It follows from (2.23) that

    ||u||_2/(||A||_2 ||v||_2) <= |lambda_k - mu_k|/(||A||_2 |y_k^T u|).   (2.24)

Thus the effectiveness of the inverse power method depends on the size of the component y_k^T u of u along y_k. If u is chosen at random, we can expect this component to be of the order 1/sqrt(n) [see (1.9), Chapter 2]. Now suppose that mu_k is a good relative approximation to lambda_k, so that for some modest constant gamma

    |lambda_k - mu_k| <= gamma * eps_M * |lambda_k|.

Then since |lambda_k| <= ||A||_2, we have |lambda_k - mu_k| <= gamma*eps_M*||A||_2, and consequently we can estimate the bound (2.24) by

    gamma * eps_M * sqrt(n).                                      (2.25)

Thus if n = 10,000, the estimate is only 100 times larger than gamma*eps_M. Once again, (2.25) is likely to be an overestimate, since it ignores the contribution of the components of u along the n-1 remaining eigenvectors of A.
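Because the residual of x_k = v/||v||_2 is just u/||v||_2, the test costs only a norm. Here is a sketch of one inverse power step with this test (names ours; solve may be any stable solver for the shifted system).

    import numpy as np

    def inverse_power_step(solve, normA, u, tol):
        """solve(u) must return v with (A - mu*I) v = u; u is assumed to be
        normalized.  Returns the candidate eigenvector and whether the
        scaled residual norm 1/(||A|| * ||v||) is below tol."""
        v = solve(u)
        nv = np.linalg.norm(v)
        scaled_residual = 1.0 / (normA * nv)   # since ||u||_2 = 1
        return v / nv, scaled_residual <= tol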

Example 2. Since this u has had its components associated with the eigenvalues near ^ enhanced. where Thus A has two very close eigenvalues near one and an isolated eigenvalue at two. it is easy to lose orthogonality when two of the \k are close. A little perturbation theory will help us understand what is going on here. In addition to having small scaled residual norms. we would like our computed eigenvectors to be orthogonal. when we compute the inner product z T #2. If. A matrix A was constructed by generating a random orthogonal matrix Q and setting A = <2AQT. Moreover. the vector is easily seen to have 2-norm one and to be orthogonal to x2. so that VolII: Eigensystems . there is a corresponding deterioration in the orthogonality between the two vectors. Chapter 1.7. However. By (3. we find that x^x2 = 7-10~18. The cure for this problem is to force x2 to be orthogonal to x\. A CLUTCH OF ALGORITHMS 197 If it turns out that the scaled residual is not satisfactorily small. we can always try again with u = v/\\v\\2. Since the errors in the eigenvectors are independent of one another. Specifically. so that x\ and x2 we orthogonal to working accuracy. we perform this orthogonalization.28). these perturbations will move the computed eigenvectors by about 10~8. we find that The vectors x\ and x2 have lost half the orthogonality we expect from eigenvectors of a symmetric matrix. which is extremely good. the computed vectors are exact vectors of A perturbed by matrices of order 10~~15.SEC. Let us approximate the first two eigenvalues by and perform inverse iteration with each Hk to approximate the corresponding eigenvectors. The eigenvalues AI and A 2 are very close—10~8 apart. the residual norm for x2 is 2-10~15. For safety one might allow additional iterations. if ll^i \\2 = 1. Unfortunately. we can expect this second iteration to do the job. Since the residual norms are of order 10~15. in our example. 2. The residual norms for the approximations xi and x2 are about 1-10~15 and 2-10~15. as the following example shows.

The orthogonalization process can be adapted to handle a sequence of close eigenvalues. We divide the mu_k into clusters according to some tolerance. As we pass through a cluster we use the Gram-Schmidt algorithm to orthogonalize each vector against the previously computed vectors in the cluster. Since the details of the orthogonalization are tricky (for the full story see §1:4.1), we leave the process to a routine gsreorthog that orthogonalizes a vector u against the columns of X, returning the result in v (which can be identified with u). Specifically, the routine returns a normalized vector v that is orthogonal to the columns of X and satisfies

    u = X w + rho v.                                              (2.26)

Unfortunately, the orthogonalization complicates our convergence criterion. To see why, suppose that we have a group mu_{k-g+1}, ..., mu_k of g approximate eigenvalues and have determined normalized approximate eigenvectors x_{k-g+1}, ..., x_{k-1}. We will also assume that we have bounds eta_i on the residual norms of the x_i. Now suppose that we have computed v by the inverse power method with shift mu_k applied to a normalized vector u. If a stable method is used to calculate v, it satisfies

    (A + E - mu_k I)v = u,

where E is of the order of the rounding unit. Now the approximate eigenvector x_k produced by gsreorthog satisfies

    v = X w + rho x_k,   X = (x_{k-g+1} ... x_{k-1}),

with rho and w to be determined by the orthogonalization. Hence

    (A - mu_k I)x_k = rho^{-1}[u - Ev - (A - mu_k I)Xw].

It follows, on taking norms and using the residual bounds for the columns of X, that the quantity eta_k of (2.27), generated in the inner loop of Algorithm 2.6,

Let mu_1 < mu_2 < ... < mu_m be a consecutive set of eigenvalues of a symmetric matrix A with relative accuracy approximating the rounding unit eps_M. This algorithm returns in the columns of the array X approximations to the eigenvectors corresponding to the mu_k. The algorithm uses the routine gsreorthog to orthogonalize a vector against a set of vectors [see (2.26)].

     1. groupsep = ||A||_inf * 10^-3
     2. tol = 100*sqrt(n)*||A||_inf*eps_M
     3. maxit = 5
     4. for k = 1 to m
     5.    if (k > 1)
     6.       if (mu_k <= mu_{k-1} + groupsep) g = g+1
     7.       else g = 1 fi
     8.    else
     9.       g = 1
    10.    end if
    11.    it = 1
    12.    u = a vector of random normal deviates
    13.    if (g > 1) gsreorthog(X[:,k-g+1:k-1], u, u, w, rho)
           else u = u/||u||_2 fi
    14.    while (true)
    15.       Solve the system (A - mu_k I)v = u
    16.       if (g > 1) gsreorthog(X[:,k-g+1:k-1], v, v, w, rho)
              else rho = ||v||_2; v = v/rho fi
    17.       eta_k = (||u||_2 + sqrt(n)*||A||_inf*eps_M*||v||_2)/rho
    18.       for i = k-g+1 to k-1
    19.          eta_k = eta_k + (eta_i + |mu_i - mu_k|)*|w_{i-k+g}|
    20.       end for i
    21.       if (eta_k <= tol) leave while fi
    22.       it = it+1
    23.       if (it > maxit) error fi
    24.       u = v
    25.    end while
    26.    X[:,k] = v
    27. end for k

Algorithm 2.6: Computation of selected eigenvectors

is a bound on the residual norm for x_k.

Algorithm 2.6 pulls all these threads together. There are several comments to be made about it.

• The starting vector for the inverse power method is a random normal vector. It is orthogonalized against the other members of the group to purge it of components that will contribute to loss of orthogonality.

• We have not specified how to solve the system in statement 15. The matrix A - mu_k I will in general be indefinite, and some form of pivoting is required for stability. Diagonal pivoting, which preserves symmetry, destroys band structure. Thus the natural choice is Gaussian elimination with partial pivoting. This method does not respect symmetry, but the L and U factors it computes are banded, with L of bandwidth p+1 and U of bandwidth 2p+1. Thus the storage requirements for the algorithm are four times the amount required to represent the original symmetric matrix. For more see §1:3.4.

• The while loop can be exited with a new vector or with an error return when the number of iterations exceeds maxit.

• The algorithm uses the residual bound (2.27) to test for convergence. Limited experiments show that the bound tracks the actual residual norm with reasonable fidelity.

• An absolute criterion is used to group eigenvalues for orthogonalization (see statements 1 and 6). (The routine gsreorthog insures that the vectors within a group are orthogonal to working accuracy.) Unfortunately, the distance between successive groups can be small, resulting in loss of orthogonality between groups. If we use too liberal a grouping criterion, we will perform a lot of unnecessary orthogonalizations; if the criterion for belonging to a group is too small, we risk losing orthogonality.

It should be stressed that Algorithm 2.6 involves a number of ad hoc decisions, e.g., the choice of a starting vector, the clustering criterion, when to orthogonalize, the convergence criterion. There is a certain common sense to our choices, but the same can be said of other choices. Even as it stands, Algorithm 2.6 can be expected to work only because we have aimed low. It is intended for use on matrices with balanced eigenvalues and uses absolute criteria to determine when quantities can be ignored (e.g., see statements 1 and 2). It will not work on graded matrices, and it is not clear that it can be modified to do so. For example, the algorithm is not a reliable way of computing a complete set of eigenvectors of a small perturbation of the identity matrix. Moreover, it can and does fail on groups of eigenvalues that are separated by up to 10^3 eps_M. See the notes and references.

2.4. NOTES AND REFERENCES

Updating the spectral decomposition

This updating algorithm has a long history, beginning with a survey of eigenvalue updating problems by Golub [87, 1973]. Bunch, Nielsen, and Sorensen [30, 1978] formulated the algorithm treated here and noted that there were problems with maintaining orthogonality among the eigenvectors. Kahan, in unpublished communications, suggested that the algorithm can be made to work provided extended precision is used at certain critical points and noted that for IEEE standard arithmetic the extended precision could be simulated. Sorensen and Tang [247, 1991] showed that more was required and gave an algorithm that worked with simulated extended precision on all IEEE compliant machines. Finally, Gu and Eisenstat [100, 1994] noted that by recomputing the z_i so that the lambda_i are exact eigenvalues of the corresponding matrix, the orthogonality could be maintained in the eigenvectors without resorting to higher precision arithmetic.

For an analysis of methods for solving the secular equation see [164]. The stopping criterion in the secular equation solver insures that the procedure is backward stable. For an error analysis see [13].

The divide-and-conquer algorithm

The divide-and-conquer algorithm is due to Cuppen [49]. It was later modified by Dongarra and Sorensen [66]. The algorithm has been implemented in the LAPACK routine SLAED0. Gu and Eisenstat [101] propose a different divide-and-conquer algorithm in which the critical step is computing the spectral decomposition of an arrowhead matrix, that is, a matrix that is diagonal except for its last (or first) row and column. An implementation of this method can be found in the LAPACK code xLAED1.

Reduction to tridiagonal form

The technique of reducing a band matrix to tridiagonal form is due to Schwarz [235].

Inertia and bisection

The fact that the inertia of a matrix is preserved by congruence was discovered by Sylvester [272, 1853] and independently by Jacobi [129, 1857, posthumous]. Givens [83, 1954] was the first to find eigenvalues of tridiagonal matrices by bisection. He did not use Algorithm 2.4 to count eigenvalues. Instead he used closely related Sturm sequences.

Specifically, if T - lambda*I has the LU decomposition LU, then the product p_k(lambda) = u_11 ... u_kk is the determinant of the leading principal minor of T - lambda*I. The polynomials p_k(lambda) are called a Sturm sequence (see [122] for definitions and properties). Clearly, the p_k(lambda) change sign when the u_kk do. Unfortunately, they are prone to overflow and underflow.

The Handbook program bisect [15] appears to be the first to use the diagonal elements of U to count eigenvalues, although the authors justified their use by an appeal to Sturm sequences. This program also saves information from previous counts to speed up the convergence of the bisection procedure. There are bisection codes in EISPACK and LAPACK.

Eigenvectors by the inverse power method

Algorithm 2.6 is included to illustrate the problems of implementing a power method, not as a complete algorithm in itself. For working code see the Handbook routine tristurm [305, II/18] or the LAPACK routine SSTEIN. In his Ph.D. thesis Dhillon [65] has described an O(n^2) algorithm to compute all the eigenvalues and eigenvectors of a tridiagonal matrix. The thesis also contains a review and critique of previous attempts to use the inverse power method to compute eigenvectors and an extensive bibliography.

Jacobi's method

Jacobi's method for the solution of the symmetric eigenvalue problem uses rotations to annihilate off-diagonal elements of the matrix in question. Specifically, given a current iterate A_k and a pivot element a^(k)_{i_k j_k}, a rotation R_k in the (i_k, j_k)-plane is determined so that the (i_k, j_k)-element of A_{k+1} = R_k^T A_k R_k is zero. If we let tau_k^2 be the sum of squares of the diagonal elements of A_k and sigma_k^2 be the sum of squares of the off-diagonal elements, then

    tau_{k+1}^2 = tau_k^2 + 2(a^(k)_{i_k j_k})^2  and  sigma_{k+1}^2 = sigma_k^2 - 2(a^(k)_{i_k j_k})^2.

Hence, if we choose (i_k, j_k) so that |a^(k)_{i_k j_k}| is maximal, then it is easy to show that tau_k^2 -> ||A||_F^2; that is, A_k approaches a diagonal matrix.

Jacobi [127, 1845] first proposed this method as a technique for reducing the normal equations from least squares problems to diagonally dominant form, after which they could be readily solved by what is now called the point Jacobi iteration [285]. In 1846 [128], he applied the same idea to reduce the symmetric eigenproblem to a diagonally dominant one, after which the eigenvectors could be found by an iteration based on a perturbation expansion. Thus Jacobi regarded the algorithm now named for him as a preprocessing step.
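For concreteness, here is a minimal sketch of one sweep of the cyclic variant discussed below (jacobi_sweep is our name; the rotation formulas are the standard ones for the symmetric 2x2 Schur problem, see, e.g., Golub and Van Loan).

    import numpy as np

    def jacobi_sweep(A):
        """One cyclic sweep of Jacobi rotations over the strict upper
        triangle of the symmetric matrix A (overwritten in place)."""
        n = A.shape[0]
        for i in range(n - 1):
            for j in range(i + 1, n):
                if A[i, j] == 0.0:
                    continue
                tau = (A[j, j] - A[i, i]) / (2.0 * A[i, j])
                t = (np.sign(tau) / (abs(tau) + np.hypot(1.0, tau))
                     if tau != 0.0 else 1.0)
                c = 1.0 / np.hypot(1.0, t)
                s = t * c
                R = np.eye(n)                # rotation zeroing A[i, j]
                R[i, i] = R[j, j] = c
                R[i, j], R[j, i] = s, -s
                A[:] = R.T @ A @ R
        return A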

According to Goldstine and Horowitz [84], the method was rediscovered by Goldstine, von Neumann, and Murray in 1949 (also see [85]). Unfortunately, a search for a maximal element is too expensive on a digital computer. Consequently, it was proposed to sweep through the elements of the matrix in a systematic order. This cyclic Jacobi method considerably complicates the convergence proofs. The subsequent literature was dominated by two aspects of the convergence of the method. First, if the matrix has no multiple eigenvalues, its asymptotic convergence is at least quadratic. Second, the situation with multiple eigenvalues and clusters of eigenvalues is considerably more complicated. For more on these topics see [75, 107, 110, 298]. An interesting alternative to the cyclic method is the threshold method [212], in which all off-diagonal elements greater than a given threshold are annihilated until there are none left above the threshold, after which the process is repeated with a smaller threshold.

Until it was superseded by the QR algorithm, Jacobi's method was the algorithm of choice for the dense symmetric eigenvalue problem. (For an elegant implementation by Rutishauser, always a name to be reckoned with, see the Handbook code [226].) But since then it has not fared well, like other algorithms that proceed by rotations. It has great natural parallelism, and at one time it and its variants were considered candidates for systolic arrays, a class of computers that never realized their promise. Jacobi's method works well with graded matrices and matrices with small, well-determined eigenvalues, at least when the matrix is positive definite. For more on this topic see [59, 171, 172]. But no one can say for sure what the future holds for this bewitching algorithm.

3. THE SINGULAR VALUE DECOMPOSITION

In this section we will treat methods for computing the singular values and vectors of a general nxp matrix X. For definiteness we will assume that n >= p and that X is real.

The singular value decomposition is closely allied to the spectral decomposition. In particular, in §3.2 we will show how the singular value decomposition can be reduced to the solution of a symmetric eigenvalue problem. This method, though it may be the algorithm of choice in special cases, has limitations that render it unsuitable as a general-purpose routine. Fortunately, there is an analogue of the QR algorithm to solve the singular value problem. The main purpose of this section is to describe this algorithm. In §3.3 we will begin our development of the QR algorithm by showing how to reduce a general matrix to bidiagonal form. The QR algorithm itself is treated in §3.4.

3.1. BACKGROUND

In this subsection we will present the background necessary for this chapter, including the perturbation of singular values and vectors. Much of this material has been treated in detail in §1:1 of this series, and many results will be presented without proof. We will begin with the mathematical background.

Existence and nomenclature

The following theorem gives the particulars of the singular value decomposition.

Theorem 3.1. Let X in R^{nxp}, where n >= p. Then there are orthogonal matrices U and V such that

    U^T X V = ( Sigma )
              (   0   ),                                          (3.1)

where

    Sigma = diag(sigma_1, ..., sigma_p)

with

    sigma_1 >= sigma_2 >= ... >= sigma_p >= 0.                    (3.2)

Here are some comments on this theorem.

• The numbers sigma_i are called the singular values of X. The nonincreasing order in (3.2) is conventional and will be assumed unless otherwise stated. We will use the notation sigma_i(X) to denote the ith singular value of X.

• The columns of

    U = (u_1 ... u_n) and V = (v_1 ... v_p)

are called left and right singular vectors of X. They satisfy

    X v_i = sigma_i u_i and X^T u_i = sigma_i v_i,  i = 1, ..., p.

• It often happens that n >> p, in which case maintaining the nxn matrix U can be burdensome. As an alternative we can set

    U_p = (u_1 ... u_p)

and write

    X = U_p Sigma V^T.                                            (3.3)

This form of the decomposition is sometimes called the singular value factorization.

• Since the 2-norm is unitarily invariant, ||X||_2 = ||Sigma||_2. From this it is easily established that:

The 2-norm of a matrix is its largest singular value.

From the definition of the 2-norm it follows that

    ||X||_2 = sigma_1(X).

• Later it will be convenient to have a notation for the smallest singular value of X. We will denote it by sigma_min(X).

• Since the Frobenius norm is unitarily invariant, it follows that

    ||X||_F^2 = sigma_1^2 + ... + sigma_p^2.

In other words: The square of the Frobenius norm of a matrix is the sum of squares of its singular values.

Relation to the spectral decomposition

From (3.1) and the fact that U is orthogonal, it follows that

    X^T X = V Sigma^2 V^T.                                        (3.4)

This shows that: The squares of the singular values of X are the eigenvalues of the CROSS-PRODUCT MATRIX X^T X. The right singular vectors are the eigenvectors of X^T X. An analogous result holds for XX^T. However, when n > p, the matrix XX^T has n-p additional zero eigenvalues. Sometimes they cannot be ignored [see (3.17)], and we will call them honorary singular values.

Uniqueness

The singular value decomposition is essentially unique. The singular values themselves are unique. The right singular vectors corresponding to a simple singular value (one distinct from the others) are unique up to a factor of absolute value one. The right singular vectors corresponding to a multiple singular value are not unique, but the space they span is. Once the right singular vectors have been specified, the left singular vectors corresponding to nonzero singular values are uniquely determined by the relation X v_i = sigma_i u_i.
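The relation between the two decompositions can be checked numerically in a few lines (a sketch):

    import numpy as np

    X = np.random.randn(8, 5)
    sing = np.linalg.svd(X, compute_uv=False)          # singular values of X
    eig = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]   # eigenvalues of X^T X
    print(np.allclose(sing**2, eig))                   # True: sigma_i^2 = lambda_i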

206 CHAPTER 3. are perfectly conditioned. See Example 3. Theorem 3.9. Let <TI ^ 0 be a simple singular value of the nxp matrix X with normalized left and right singular vectors u\ and v\. like the eigenvalues of a symmetric matrix. Let the singular values ofX be ordered in descending order: Then and An immediate consequence of this characterization is the following perturbation theorem for singular values.5. (Here we have discarded the conventional downward ordering of the singular values. Theorem 33. First-order perturbation expansions The perturbation of singular vectors is best approached via first-order perturbation theory. However. this does not mean that perturbations of small singular values have high relative accuracy. Chapter 1. Chapter 1) has an analogue for singular values. Let the singular values ofX be arranged in descending order: Let the singular values ofX = X + E also be ordered in descending order Then Thus singular values.2. let us partition the matrices of left and right singular vectors in the forms .) Assuming that n > p. THE SYMMETRIC EIGENVALUE PROBLEM Characterization and perturbation of singular values Given the relation of the singular value and spectral decompositions it is not surprising that the min-max characterization for eigenvalues (Theorem 3.

. We will seek the singular value of UTXV in the form a\ + 0i. Then we can write where (p\ = u^Eui. where E is presumed small. Now let X — X + E. We will seek the corresponding left and right singular vectors of t/T^V in the forms (1 g^ p3 > ) T and(l /iJ)T. etc. 3. we get The second row of (3.7) we have Ignoring higher-order terms. eJ2 = u^EV2.SEC. as above. their product must eventually become negligible. THE SINGULAR VALUE DECOMPOSITION where U^ and Vi have p—l columns..7) gives us VolII: Eigensystems . The singular vectors corresponding to a\ in the matrix U^XV are both ei (of appropriate dimensions). Then 207 where £2 = diag(<j2. where 0i is assumed to go to zero along with E. and hi are also assumed to go to zero along with E.where the g-2. Now consider the relation From the first row of this relation. crp) contains the singular values of X other than a\. . and we can write From the third row of (3..gs. we have Since both f^2 and /i22 go to zero with E.

</3.10) can be readily solved for gi and hi to give the approximations and A simple singular value and its normalized singular vectors are differentiable functions of the elements of its matrix. and the products we have ignored in deriving our approximations are all 0(||^||2). and h^ are all 0(||E||). Theorem 3. . and vi be the corresponding quantities for the perturbed matrix X = X + E. Then and . Let a\. we have the following theorem. Hence the quantities Q\.4. <rp).But from the second row of the relation we get The approximations (3.ui.208 CHAPTER 3.9) and (3.. g-z.U?XV2 = diag(<7 2 . Let a\ > 0 be a simple singular value of X £ Rnxp (n > p) with normalized left and right singular vectors u\ and v\. Let the matrices of left and right singular vectors be partitioned in the form where Ui and V<2 have p— 1 columns and let S2 .. THE SYMMETRIC EIGENVALUE PROBLEM or This expression is not enough to determine gi and hi.. If we transform these approximations back to the original system.

5) and (3. the singular value of a scalar is its absolute value. We will begin with the right singular vector. Since we will be concerned with angles between vectors and orthogonal transformations do not change angles. and /2i. The condition of singular vectors We can use the results of Theorem 3.5.It follows that Vol II: Eigensystems .11) for h^.12. Chapter 1. |0i2|}. the sine of the angle between this vector and ei is Thus to get a bound on the angle between ei and the vector (3.We can rewrite this equation in the form Since a\ and a<2 are positive.6). • The expression (3. For example.12) can be written in the form 209 In this form it is seen to be an analogue of the Rayleigh quotient for an eigenvalue (Definition 1. For more on the behavior of zero singular values. Now the normalized perturbed right singular vector is By Lemma 3. THE SINGULAR VALUE DECOMPOSITION Here are two comments on this theorem. we may work with the transformed matrices in (3.13)—or equivalently between v\ and v\ — we must obtain a bound on ||/&2 \\2Consider the approximation (3. The problem is that singular values are not differentiable at zero. and <foi represent corresponding components of /*2. • The hypothesis that 0\ be nonzero is necessary. 3. /i2. and its absolute value is bounded by max{|</>2i|. and 02 is the corresponding diagonal element of £2.4 to derive a condition number for a singular vector.SEC. the fraction in this expression is a weighted average of <j>i2 and -<(>2i • Hence it must lie between fa and -<foi. see the notes and references. <£i2. Because £2 is diagonal we may write it out component by component: Here 77. Chapter 2).
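The first-order expansion of the singular value is easy to test numerically. In the sketch below the perturbed largest singular value is compared with the prediction sigma_1 + u_1^T E v_1; the observed error should be of order ||E||^2.

    import numpy as np

    X = np.random.randn(7, 4)
    U, s, Vt = np.linalg.svd(X)
    E = 1e-6 * np.random.randn(7, 4)
    s_tilde = np.linalg.svd(X + E, compute_uv=False)
    prediction = s[0] + U[:, 0] @ E @ Vt[0, :]
    print(abs(s_tilde[0] - prediction))    # typically of order 1e-12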

The condition of singular vectors

We can use the results of Theorem 3.4 to derive a condition number for a singular vector. We will begin with the right singular vector. Since we will be concerned with angles between vectors and orthogonal transformations do not change angles, we may work with the transformed matrices in (3.5) and (3.6). Now the normalized perturbed right singular vector is

    (1 + ||h_2||_2^2)^{-1/2} (1; h_2).                            (3.13)

By Lemma 3.12, Chapter 1, the sine of the angle between this vector and e_1 is ||h_2||_2/(1 + ||h_2||_2^2)^{1/2}. Thus to get a bound on the angle between e_1 and the vector (3.13), or equivalently between v_1 and v~_1, we must obtain a bound on ||h_2||_2.

Consider the approximation (3.11) for h_2. Because Sigma_2 is diagonal we may write it out component by component:

    eta = (sigma_1 phi_12 + sigma_2 phi_21)/(sigma_1^2 - sigma_2^2).

Here eta, phi_12, and phi_21 represent corresponding components of h_2, f_12, and f_21, and sigma_2 is the corresponding diagonal element of Sigma_2. We can rewrite this equation in the form

    eta = [(sigma_1 phi_12 + sigma_2 phi_21)/(sigma_1 + sigma_2)] * 1/(sigma_1 - sigma_2).

Since sigma_1 and sigma_2 are positive, the fraction in brackets is a weighted average of phi_12 and phi_21, and its absolute value is bounded by max{|phi_12|, |phi_21|}. It follows that

    ||h_2||_2 <= max{||f_12||_2, ||f_21||_2}/delta,

where

    delta = min over i >= 2 of |sigma_1 - sigma_i|.               (3.14)

Since U_2, V_2, u_1, and v_1 are orthonormal, ||f_12||_2 <= ||E||_2 and ||f_21||_2 <= ||E||_2. Specifically:

    Let delta be defined by (3.14). Then, up to terms of order ||E||_2^2,
    the sine of the angle between v_1 and v~_1 is bounded by ||E||_2/delta.

Thus delta^{-1} is a condition number for the singular vectors corresponding to the singular value sigma_1. The number delta is the separation of sigma_1 from its companions, and it plays a role analogous to sep in the perturbation of eigenvectors of Hermitian matrices [see (3.28), Chapter 1].

Turning now to the left singular vector, which we seek in the form (1 g_2^T g_3^T)^T, we can bound ||g_2||_2 as above. However, if n > p, we must take into account g_3 defined by (3.8). Now up to terms of order ||E||_2^2, the quantity ||h_2||_2 is the sine of the angle between v_1 and v~_1. Similarly, we see that the sine of the angle between u_1 and u~_1 is essentially sqrt(||g_2||_2^2 + ||g_3||_2^2). We can account for g_3 by adjusting delta to take into account the honorary zero singular values, i.e., the additional zero eigenvalues of XX^T. Specifically:

    Let

        delta~ = min{delta, sigma_1},                             (3.15)

    where delta is defined by (3.14). Then, up to terms of order ||E||_2^2,
    the sine of the angle between u_1 and u~_1 is bounded by ||E||_2/delta~.

3.2. THE CROSS-PRODUCT ALGORITHM

In (3.4) we have seen that the eigenvalues of the cross-product matrix A = X^T X are the squares of the singular values of X and that the columns of the matrix V of the eigenvectors of A are the right singular vectors of X. Moreover, if we know V, we can recover the first p columns of U via the formula XV = U_p Sigma.

Otherwise put, we can obtain U_p by normalizing the columns of XV. All this suggests Algorithm 3.1, which for definiteness we will call the cross-product algorithm.

    Let X in R^{nxp} (n >= p). This algorithm computes the singular values,
    the right singular vectors, and the first p left singular vectors of X.

        1. Form A = X^T*X
        2. Compute the spectral decomposition A = V*Lambda*V^T
        3. Sigma = Lambda^{1/2}
        4. U_p = X*V
        5. Normalize the columns of U_p

    Algorithm 3.1: The cross-product algorithm

This algorithm has some attractive features. We already have algorithms to calculate the spectral decomposition. The formation of A is cheaper than the reduction to bidiagonal form that we will propose later, much cheaper if X is sparse. The computation of U_p involves only matrix multiplication, which is cheaper than accumulating transformations. Unfortunately, the algorithm has severe limitations, and it is important to understand what they are.

Inaccurate singular values

The first problem is that we lose information about the singular values in passing from X to A. Let us first consider what happens to the singular values when we round X. This amounts to replacing X by X~ = X + E, where

    |e_ij| <= |x_ij| eps_M.

Here eps_M is the rounding unit for the machine in question (see Chapter 1). This perturbation can change the singular values of X by as much as sigma_1 eps_M. Specifically, by Theorem 3.3, if sigma~_i is the ith singular value of X~, then the relative error in sigma_i satisfies

    |sigma~_i - sigma_i|/sigma_i <= kappa_i eps_M,  where kappa_i = sigma_1/sigma_i.   (3.18)

As sigma_i gets smaller, kappa_i becomes larger and sigma~_i has fewer and fewer accurate digits. By the time kappa_i is of the order eps_M^{-1}, sigma~_i will be totally inaccurate. Thus we must have

    kappa_i < eps_M^{-1}

to expect any accuracy in sigma_i.

Turning now to the cross-product method, let us now assume that we compute A = X^T X exactly and then round it to get A~ (we could hardly do better than that in statement 1 of Algorithm 3.1).
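In floating point the algorithm is only a few lines. This sketch mirrors the five statements of Algorithm 3.1 (the function name is ours); the guard against a negative trailing eigenvalue anticipates a failure mode discussed below.

    import numpy as np

    def cross_product_svd(X):
        """Cross-product algorithm: singular values, right singular
        vectors, and the first p left singular vectors of X (n >= p)."""
        A = X.T @ X                              # 1. cross-product matrix
        lam, V = np.linalg.eigh(A)               # 2. spectral decomposition
        order = np.argsort(lam)[::-1]            # descending order
        lam, V = lam[order], V[:, order]
        sigma = np.sqrt(np.maximum(lam, 0.0))    # 3. Sigma = Lambda^{1/2}
        Up = X @ V                               # 4. U_p = X*V
        Up /= np.linalg.norm(Up, axis=0)         # 5. normalize the columns
        return Up, sigma, V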

We can show as above that the eigenvalues of A and A~ satisfy

    |lambda~_i - lambda_i|/lambda_i <= (lambda_1/lambda_i) eps_M.

This bound has the same interpretation as (3.18). In particular we must have lambda_1/lambda_i < eps_M^{-1} to expect any accuracy in lambda_i. We now note that lambda_i = sigma_i^2. Consequently the condition lambda_1/lambda_i < eps_M^{-1} is equivalent to

    kappa_i < eps_M^{-1/2}.

This says that the range of singular values that the cross-product algorithm can compute with any accuracy has been severely restricted. For example, if eps_M = 10^-16 then any singular value satisfying kappa_i > 10^8 will have no significant figures when represented as an eigenvalue of A~, even though it may have as many as eight significant figures as a singular value of X~.

An example will illustrate the above considerations.

Example 3.5. A matrix X was generated in the form X = UDV^T, where U and V are random orthogonal matrices and

    D = diag(1, 10^-2, 10^-4, 10^-6, 10^-8).

The singular values of X and the square roots of the eigenvalues of A = X^T X were computed with the following results:

        sigma                    sqrt(lambda)
        1.000000000000000e+00    1.000000000000000e+00
        1.000000000000072e-02    1.000000000000001e-02
        1.000000000000487e-04    1.000000001992989e-04
        1.000000000014129e-06    1.000009260909809e-06
        9.999999970046386e-09    1.177076334615305e-08

Since eps_M = 10^-16, it is seen that the errors increase as predicted by the analysis above. Although it is not illustrated by this example, it is possible for lambda_p to be negative, so that the cross-product algorithm produces an imaginary singular value.

Inaccurate right singular vectors

We can perform a similar analysis to show that the right singular vectors of X can be less accurately represented in A. Specifically, consider the effects of rounding X on the right singular vector v_i, and let sigma_j be the nearest neighbor of sigma_i. If v~_i denotes the singular vector of the rounded matrix, then, since ||E||_2 <= sigma_1 eps_M, we have

    sin angle(v_i, v~_i) <= sigma_1 eps_M / |sigma_i - sigma_j|   (3.19)

up to higher-order terms. A similar analysis for A~ shows that

    sin angle(v_i, v~_i) <= lambda_1 eps_M / |lambda_i - lambda_j|,

where v~_i is now the ith eigenvector of A~. Noting once again that lambda_i = sigma_i^2, we can rewrite this bound in the form

    sin angle(v_i, v~_i) <= sigma_1^2 eps_M / |sigma_i^2 - sigma_j^2|.   (3.20)

If we compare (3.19) and (3.20), we see that the perturbation of v_i depends on the separation of sigma_i from sigma_j. When kappa_i and kappa_j are large, (3.20) has the smaller denominator and hence the larger bound: The singular vector in A~ is more sensitive to rounding errors than the same vector in X.

It is a curious and useful fact that the singular vector corresponding to the smallest singular value can be quite accurate. Specifically, suppose that sigma_1 = sigma_{p-1} while sigma_p is small. Then with i = p and j = p-1, we find that kappa_p is much greater than kappa_{p-1}. Since kappa_{p-1} = 1, the right-hand sides of the bounds (3.19) and (3.20) are both approximately eps_M. Thus, although the small singular value sigma_p is ill determined, the vector v_p is well determined by both algorithms. The common sense of these bounds is that the subspace corresponding to the large singular values is well determined and hence its orthogonal complement, the space spanned by v_p, is also well determined. Note, however, that as sigma_{p-1} drifts away from sigma_1, the bound for the cross-product algorithm deteriorates more rapidly, since (3.20) has the smaller denominator.

Inaccurate left singular vectors

The left singular vectors computed by our algorithm will also be less accurate, although the cause of the loss of accuracy is different. The cross-product algorithm computes the ith left singular vector by forming w_i = X v_i and normalizing it. Let v~_i be the ith right singular vector computed by the cross-product algorithm, and assume that

    v~_i = v_i + e_i,  where ||e_i||_2 <= epsilon.

Now an elementary rounding error analysis shows that the computed w_i, call it w~_i, satisfies

    w~_i = X v~_i + E v_i = w_i + X e_i + E v_i,

where

    ||E||_2 <= gamma eps_M ||X||_2.

Here gamma is a constant that depends on the dimensions of X. Since w_i = sigma_i u_i, we have ||w_i||_2 = sigma_i. Hence the normwise relative error in w~_i satisfies

    ||w~_i - w_i||_2/||w_i||_2 <= kappa_i (epsilon + gamma eps_M).   (3.21)

Thus w_i will be computed inaccurately if sigma_i is small, and normalizing it will not restore the lost accuracy.

We have not specified the size of epsilon in our analysis because the technique for computing left singular vectors is sometimes used independently with accurate v~_i (epsilon near eps_M). However, if v~_i is computed from the cross-product algorithm, epsilon can be quite large and have a devastating effect, as we have seen. For example, if eps_M = 10^-16 and kappa_i = 10^6, then epsilon in (3.21) will be about 10^-4, and the computed w_i will have no accuracy at all.

Assessment of the algorithm

The above analysis shows that the cross-product algorithm is not suitable as a general-purpose algorithm. However, the bounds from the analysis suggest when the algorithm can be used. For example, if kappa_p is not too large, then the computed singular value sigma~_p may be satisfactory. Even when kappa_p is large, the right singular vector v_p can, as we have seen, be very accurate. Perhaps more important, the formation of the cross-product matrix may be far cheaper than the alternatives, and the bounds tell us when we can take advantage of this. For more see the notes and references.

3.3. REDUCTION TO BIDIAGONAL FORM

The classical algorithm for computing the singular value decomposition is a variant of the QR algorithm. Like its counterparts for general and symmetric matrices, its efficient implementation requires that the matrix be reduced to a simpler form. In this case the form is an upper bidiagonal form, which we will write as

    B = ( d_1  e_1                      )
        (      d_2  e_2                 )
        (           ...   ...           )
        (             d_{p-1}  e_{p-1}  )
        (                      d_p      ).                        (3.22)

The reduction

As usual let X be an nxp matrix with n >= p. The reduction to bidiagonal form proceeds by alternately pre- and postmultiplying X by Householder transformations. For the first step, we determine a Householder transformation H to introduce zeros in the hatted elements in the following Wilkinson diagram:

    ( x  x  x  x )
    ( x^ x  x  x )
    ( x^ x  x  x )
    ( x^ x  x  x )
    ( x^ x  x  x )
    ( x^ x  x  x )

Then HX has the form

    ( x  x  x^ x^ )
    ( 0  x  x  x  )
    ( 0  x  x  x  )
    ( 0  x  x  x  )
    ( 0  x  x  x  )
    ( 0  x  x  x  ).                                              (3.23)

We next choose a Householder transformation K such that HXK has zeros in the hatted elements of (3.23) to give a matrix of the form

    ( x  x  0  0 )
    ( 0  x  x  x )
    ( 0  x  x  x )
    ( 0  x  x  x )
    ( 0  x  x  x )
    ( 0  x  x  x ).

The reduction proceeds recursively on the matrix X[2:n, 2:p]. Algorithm 3.2 implements this reduction.

Given an nxp matrix X with n >= p, BiReduce reduces X to the bidiagonal form (3.22). On return, the array X contains the generators of the Householder transformations used in the reduction. The transformations from the left are accumulated in the nxp matrix U, and the transformations from the right in the pxp matrix V.

     1. BiReduce(X, d, e, U, V)
        Reduce X
     2.    for k = 1 to p
     3.       if (k < p or n > p)
     4.          housegen(X[k:n,k], a, d[k])
     5.          X[k:n,k] = a
     6.          b^T = a^T*X[k:n,k+1:p]
     7.          X[k:n,k+1:p] = X[k:n,k+1:p] - a*b^T
     8.       end if
     9.       if (k < p-1)
    10.          housegen(X[k,k+1:p], b, e[k])
    11.          X[k,k+1:p] = b^T
    12.          a = X[k+1:n,k+1:p]*b
    13.          X[k+1:n,k+1:p] = X[k+1:n,k+1:p] - a*b^T
    14.       end if
    15.    end for k
    16.    e[p-1] = X[p-1,p]
    17.    if (n = p) d[p] = X[p,p] fi
        Accumulate U
    18.    U = ( I_p )
               (  0  )
    19.    for k = min{p, n-1} to 1 by -1
    20.       a = X[k:n,k]
    21.       b^T = a^T*U[k:n,k:p]
    22.       U[k:n,k:p] = U[k:n,k:p] - a*b^T
    23.    end for k
        Accumulate V
    24.    V = I_p
    25.    for k = p-2 to 1 by -1
    26.       b^T = X[k,k+1:p]
    27.       a^T = b^T*V[k+1:p,k+1:p]
    28.       V[k+1:p,k+1:p] = V[k+1:p,k+1:p] - b*a^T
    29.    end for k
    30. end BiReduce

Algorithm 3.2: Reduction to bidiagonal form
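A compact, unoptimized NumPy sketch of the same reduction may make the pattern clearer. It forms the Householder transformations explicitly instead of storing their generators, and so trades away the economies of Algorithm 3.2 (bireduce_dense and householder are our names).

    import numpy as np

    def householder(x):
        """Return unit w with (I - 2 w w^T) x proportional to e_1."""
        w = x.astype(float).copy()
        w[0] += np.copysign(np.linalg.norm(x), x[0])
        nw = np.linalg.norm(w)
        return w / nw if nw > 0 else w

    def bireduce_dense(X):
        """Reduce X (n x p, n >= p) to upper bidiagonal B with
        X_original = U @ B @ V.T (a sketch)."""
        X = X.astype(float).copy()
        n, p = X.shape
        U, V = np.eye(n), np.eye(p)
        for k in range(p):
            if k < p - 1 or n > p:            # zero X[k+1:, k]
                w = householder(X[k:, k])
                X[k:, :] -= 2.0 * np.outer(w, w @ X[k:, :])
                U[:, k:] -= 2.0 * np.outer(U[:, k:] @ w, w)
            if k < p - 2:                     # zero X[k, k+2:]
                w = householder(X[k, k+1:])
                X[:, k+1:] -= 2.0 * np.outer(X[:, k+1:] @ w, w)
                V[:, k+1:] -= 2.0 * np.outer(V[:, k+1:] @ w, w)
        return U, X, V        # X now holds the bidiagonal matrix B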

Here are some comments on Algorithm 3.2.

• In deriving an operation count for the algorithm, it is convenient to divide the labor. Algorithm 3.2 requires

    2np^2 - (2/3)p^3 flam for the reduction of X,
    2np^2 - p^3 flam for the accumulation of U,
    (2/3)p^3 flam for the accumulation of V.

Thus if n >> p the total work is about 4np^2 flam. If n = p, it is about 3p^3 flam.

• By accumulating only the first p columns of the transformations from the left, the algorithm sets the stage for the subsequent computation of the singular value factorization (3.3). If we accumulate the entire U, the operation count for the accumulation rises to 2n^2p - np^2, and when n >> p this part of the computation becomes overwhelmingly dominant.

• If we require the effect of the full matrix U, we can save the transformations that are generated and apply them whenever we need to form a product of the form Ux.

• As in the reduction algorithms of Chapter 2, the algorithm does not accumulate the transformations as they are generated. Instead it saves the generating vectors in X and accumulates the transformations after the reduction, beginning with the last transformations. This saves some operations.

• The first column of the matrix V is e_1. This fact will prove important in §3.4.

• The algorithm is backward stable in the usual sense.

Real bidiagonal form

When X is complex, the bidiagonal form will also be complex. Our subsequent manipulations of this form will use plane rotations, and the operation count will be reduced if the rotations are real. We can insure this by transforming the complex bidiagonal form to a real form. Algorithm 3.3 performs the reduction to real form. It alternately scales the rows and columns of the bidiagonal matrix, the row scalings making the d_i real and the column scalings making the e_i real.

This algorithm takes the output of Algorithm 3.2 and transforms the bidiagonal matrix to real bidiagonal form. The transformations are accumulated in U and V.

Algorithm 3.3: Complex bidiagonal to real bidiagonal form

3.4. THE QR ALGORITHM

In this subsection we are going to show how the QR algorithm can be used to compute the singular values of a bidiagonal matrix B. What distinguishes the bidiagonal singular value problem from the other problems we have considered is that small relative changes in the elements of B make only small relative changes in the singular values of B. This means that we must take care that our convergence and deflation procedures do not introduce large relative errors.

We begin with the basic QR step and then show how to compute the shift. We then treat the problem of when the superdiagonal elements of B can be considered negligible. The solution provides us with the wherewithal to describe a back search in the spirit of the one in Chapter 2.

Throughout this subsection B will be an upper bidiagonal matrix represented in the form (3.24), with diagonal elements d_1, ..., d_p and superdiagonal elements e_1, ..., e_{p-1}.

The bidiagonal QR step

The matrix C = B^T B is easily seen to be symmetric and tridiagonal. Thus, in principle, we could apply the tridiagonal QR algorithm to B^T B to find the squares of the singular values of B. Unfortunately, this procedure is a variant of the cross-product algorithm, which we have shown to have substantial drawbacks. However, we can mimic a QR step on B^T B while working with B. The procedure goes as follows.

    1. Given a shift sigma, determine a plane rotation P_12 such that the
       (1,2)-element of (C - sigma^2 I)P_12 is zero.
    2. Determine an orthogonal matrix U and an orthogonal matrix V whose
       first column is a multiple of e_1 such that B^ = U^T B P_12 V is
       bidiagonal.

Since

    C^ = B^T B^ = V^T P_12^T B^T B P_12 V = V^T P_12^T C P_12 V,

the reasoning of Chapter 2 shows that C^ is the matrix that would have resulted from applying one step of the QR algorithm with shift sigma^2 to C. Since the repeated application of this algorithm (with suitable shifts) will cause c_{p-1,p} to converge to zero, we may expect to see e_{p-1} also converge to zero, in which case the problem deflates.

To implement this procedure, we need the first column of C. Since B is bidiagonal, this column is easily computed:

    C e_1 = (d_1^2, d_1 e_1, 0, ..., 0)^T.

Thus the cosine and sine of the initial rotation P_12 are determined by the numbers d_1^2 - sigma^2 and d_1 e_1. Since these numbers are composed of products, some scaling should be done to prevent overflow and underflow.

. the fact that B is bidiagonal can be used to simplify the algorithm.1. at least when p is large.3)-element. We can continue in this manner until the bulge is chased out the bottom of the matrix. the matrix BPi2 has the form Note that BP bulges out of its bidiagonal form at the hatted (2. In practice the QR step will not be applied to the entire matrix.2) element. We can remove this bulge by premultiplying by a plane rotation U\i in the (1. Specifically: Algorithm 3.2)-plane to get the matrix The bulge has now moved to the (1. However. Here are some comments. THE SYMMETRIC EIGENVALUE PROBLEM We could use by Algorithm 3. • The accumulation of the transformations accounts for most of the work. Specifically. We can eliminate it by postmultiplying by a rotation ¥23 in the (2.3)-plane: The bulge is now at the (3.2 to reduce PuB to bidiagonal form. Algorithm 3.220 CHAPTER 3. The entire procedure is illustrated in Figure 3.4 implements a QR step between rows il and i'2. l)-element.4 requires The step is backward stable in the usual sense.

Algorithm 3. THE SINGULAR VALUE DECOMPOSITION 221 Given an upperbidiagonal matrix represented as in (3. The transformations are accumulated in U and V.4: Bidiagonal QR step Vol II: Eigensystems .SEC. BiQR performs a QR step between rows il and i2. 3.24) and a shift cr.

Figure 3.1: A bidiagonal QR step

Computing the shift

The shift sigma in Algorithm 3.4 is generally computed from the trailing principal submatrix B[i2-1:i2, i2-1:i2]. A reasonable choice is to take sigma to be the smallest singular value of this matrix. The following theorem gives an explicit formula for this singular value.

Theorem 3.6. Let

    T = ( p  q )
        ( 0  r ).

Then the largest singular value of T is given by the formula

    sigma_max = sqrt( (p^2 + q^2 + r^2 + sqrt((p^2 + q^2 + r^2)^2 - 4p^2r^2)) / 2 ).   (3.25)

If T != 0, then the smallest singular value is given by

    sigma_min = |pr|/sigma_max.                                   (3.26)

Proof. The eigenvalues of the matrix

    T^T T = ( p^2       pq      )
            ( pq    q^2 + r^2   )

are the squares of the singular values of T. The characteristic equation of this matrix is

    lambda^2 - (p^2 + q^2 + r^2) lambda + p^2 r^2 = 0.            (3.27)

If we set

    s = p^2 + q^2 + r^2,

then it is easily verified that

    (s + sqrt(s^2 - 4p^2r^2))/2 and (s - sqrt(s^2 - 4p^2r^2))/2

are roots of (3.27), and hence sigma_max and sigma_min as given above are the largest and smallest singular values of T. The formula (3.26) follows from the fact that sigma_max * sigma_min = |pr|.

Here are three comments on this theorem.

• In practice the formulas must be scaled and rearranged to avoid overflows and harmful underflows. For more see the notes and references.

• If at least one of p, q, and r is zero, the singular values of T can be computed by inspection.

• The formulas not only produce singular values to high relative accuracy, but the singular values are themselves insensitive to small relative perturbations in the quantities p, q, and r. This is a special case of an important fact about the singular values of bidiagonal matrices:

    Small relative perturbations in the elements of a bidiagonal matrix
    cause only small relative perturbations in its singular values.

Negligible superdiagonal elements

In testing convergence, we must decide when we can set a superdiagonal element to zero. Since this is an absolute change in the element, there is a real danger of compromising the relative accuracy of the singular values. Fortunately, the following result can be used to tell when we can safely ignore superdiagonals. Let the quantities tau_j be defined as follows:

    tau_1 = |d_1|,
    tau_j = |d_j| * tau_{j-1}/(tau_{j-1} + |e_{j-1}|),  j = 2, ..., p.   (3.28)

It can be shown that the tau_j are related to the smallest singular values of the leading principal submatrices of B. However, the real significance of the number tau_j is contained in the following theorem.

Theorem 3.7 (Demmel-Kahan). Let B be a bidiagonal matrix in the form (3.24), and for fixed j let B~ be the matrix obtained by setting e_j to zero. Let sigma_i and sigma~_i denote the singular values of B and B~ arranged in descending order. Let

    epsilon = |e_j|/tau_j,

and consider the intervals

    I_k = [e^{-epsilon} sigma~_k, e^{epsilon} sigma~_k],  k = 1, ..., p.

If sigma_i lies in the connected union of m intervals I_k that are disjoint from the other intervals, then

    |ln(sigma_i/sigma~_i)| <= m*epsilon.                          (3.29)

To see what (3.29) means, take the exponential to get

    e^{-m*epsilon} <= sigma_i/sigma~_i <= e^{m*epsilon}.

If m*epsilon is small, then e^{+-m*epsilon} = 1 +- m*epsilon approximately. It follows that

    |sigma~_i - sigma_i| is roughly bounded by m*epsilon*sigma_i.   (3.30)

Thus the result of setting e_j to zero can introduce a relative error of no greater than m*epsilon in any singular value of B. If the singular value in question is isolated and epsilon is sufficiently small, then m = 1, and the relative error is bounded by epsilon.

Back searching

We can combine (3.28) and the results of Theorem 3.7 to get Algorithm 3.5, which searches back for a block in which the problem deflates. It generates the tau_i from Theorem 3.7 as it proceeds backward through the matrix. By (3.30) the parameter tol is approximately the relative error we can expect from setting the current superdiagonal to zero. A reasonable value is the rounding unit. Note that whenever the problem deflates (i.e., il = i2), the sequence of tau_i is restarted.

Final comments

The algorithmic pieces we have created in this and the previous section can be assembled to produce a reliable algorithm for computing the singular values of a general matrix. Although no theorems can be proved, it works well on graded matrices, provided the grading is downward. Nonetheless, there are further embellishments that can improve the algorithm. Some of these are discussed in the notes and references.

This algorithm takes a bidiagonal matrix B represented as in (3.24), a tolerance tol, and an index i (1 ≤ i ≤ n), and determines indices i1 ≤ i2 ≤ i such that one of the following conditions holds: either i1 = i2, so that the problem deflates, or the block B[i1:i2, i1:i2] has nonnegligible superdiagonal elements and is bounded by negligible superdiagonals.

Algorithm 3.5: Back searching a bidiagonal matrix
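The following Python sketch shows the shape of such a back search. The negligibility test used here, |e_j| ≤ tol·τ_j with the τ_j generated by a damping recurrence, is an assumption made for illustration; Algorithm 3.5 states its own conditions.

```python
def back_search(d, e, i, tol):
    """Search back from index i for a block in which the problem deflates.

    d: diagonal of B (length n); e: superdiagonal (length n-1).
    Returns (i1, i2) with i1 <= i2 <= i.  The test |e[j]| <= tol * tau
    with the recurrence below is an assumed stand-in for the criterion
    derived from Theorem 3.7.
    """
    i2 = i
    tau = abs(d[i2])
    # deflate trailing rows whose superdiagonal is negligible
    while i2 > 0 and abs(e[i2 - 1]) <= tol * tau:
        i2 -= 1
        tau = abs(d[i2])
    if i2 == 0:
        return 0, 0                     # fully deflated
    # search back for the start of the working block
    i1, tau = i2, abs(d[i2])
    while i1 > 0 and abs(e[i1 - 1]) > tol * tau:
        # the recurrence damps tau when superdiagonals are large
        tau = abs(d[i1 - 1]) * (tau / (tau + abs(e[i1 - 1])))
        i1 -= 1
    return i1, i2
```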

3.4. A HYBRID QRD-SVD ALGORITHM

The algorithm sketched in the last subsection is reasonably efficient when n is not much greater than p. When n > p, however, one pays a high price for the accumulation of plane rotations in U. An alternative is to first compute a QR factorization of X, compute the singular value decomposition of R, and multiply the left singular vectors into Q.

Specifically, let X ∈ R^{n×p}, where n > p. Algorithm 3.6 implements this scheme; it computes the singular value factorization X = UΣVᵀ. Here are three comments.

• When n ≫ p, the QR decomposition of X can be computed in 2np² flam, while the generation of U requires np² flam, bringing the total operation count to 3np² flam. Statement 2, which requires O(p³) operations, contributes only negligibly to the total operation count, which is O(np²).

• The computations can be arranged so that Q and then U overwrites X.

• The matrix Q will generally be computed as the product of the Householder transformations used to reduce X to triangular form (§1:4.2). If instead we store the transformations themselves and retain the matrix W of left singular vectors of R, we save np² flam and in the bargain have an implicit basis for both the column space of X and its orthogonal complement. The price to be paid is that one must access that basis through subprograms that manipulate the Householder transformations.

Algorithm 3.6: Hybrid QRD-SVD algorithm
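A dense-matrix sketch of the scheme, using LAPACK-backed NumPy routines rather than the flam-counted kernels of the text:

```python
import numpy as np

def hybrid_qrd_svd(X):
    """SVD of a tall matrix X (n >= p) via a QR factorization.

    A sketch of the idea behind Algorithm 3.6: factor X = QR, take the
    small SVD R = W @ diag(s) @ Vt, and multiply W into Q so that
    X = (Q @ W) @ diag(s) @ Vt.
    """
    Q, R = np.linalg.qr(X)          # reduced QR: Q is n x p, R is p x p
    W, s, Vt = np.linalg.svd(R)     # the O(p^3) step
    U = Q @ W                       # left singular vectors of X
    return U, s, Vt

# usage: U, s, Vt = hybrid_qrd_svd(np.random.randn(1000, 20))
# then U * s @ Vt reproduces X up to rounding error
```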

3.5. NOTES AND REFERENCES

The singular value decomposition

The singular value decomposition was established independently and almost simultaneously by Beltrami [19, 1873] and Jordan [137, 1874]. For more historical material see [263] and §1:1. Missing proofs in §3.1 can be found in §1:1.

Perturbation theory

Parts of Theorem 3.4 were presented in [269], where the expansion of the singular value is carried out to the second order. For ordinary singular values the second-order term makes no essential difference, but it shows that small singular values tend to increase in size under perturbation. The derivation of the first-order bounds given here is new.

The cross-product algorithm

The cross-product algorithm is widely and unknowingly used by statisticians and data analysts. For example, given a data matrix X, a standard transformation is to subtract column means, normalize, and form a cross-product matrix C called the correlation matrix. Subsequent analyses are cast in terms of C, and whenever such an analysis involves the eigenvalues and eigenvectors of C, it amounts to an implicit use of the cross-product algorithm. Fortunately, much data is noisy enough that the errors introduced by passing to C are insignificant, at least when computations are performed in double precision.

The analysis of the cross-product algorithm appears to be new, although partial analyses and instructive examples may be found in most texts on numerical linear algebra, often with warnings never to form the cross-product matrix. The lesson of the analysis given here is that one should not condemn but discern.

Reduction to bidiagonal form and the QR algorithm

The reduction to bidiagonal form is due to Golub and Kahan [88, 1965]. It is a curious fact that the QR algorithm for the singular values of bidiagonal matrices was first derived by Golub [86, 1968] without reference to the QR algorithm. The derivation given here may be found in the Handbook contribution [91].

Much of the material on the implementation of the QR algorithm for bidiagonal matrices is taken from an important paper by Demmel and Kahan [58]. Owing to limitations of space we cannot do the paper full justice in this chapter. But here are some additional details.

• Additional tests for negligibility. The authors present a forward recurrence that computes the quantity ‖B[k:n, k:n]⁻¹‖⁻¹. From this, it is possible to compute a lower bound σ̲ for the smallest singular value. This permits a cheap test for negligibility of the e_i: namely, e_i is negligible if |e_i| ≤ tol·σ̲, where tol is the relative error required in the singular values.

• Zero shifted QR. If the shift in the QR algorithm is zero, the computations can be arranged so that all the singular values retain high relative accuracy. This algorithm is used whenever the relative accuracy tol·σ̲ required of the smallest singular value is less than the error σ̄·ε_M that can be expected from the shifted QR algorithm (here σ̄ is an estimate of the largest singular value).

• Bottom to top QR. Our implementation of the shifted bidiagonal QR algorithm proceeds from the top of the matrix to the bottom, which is fine for matrices graded downward. For matrices that are graded upward there are variants of both the shifted and unshifted algorithms that proceed from bottom to top. If the current block extends from il to i2, a natural, though not foolproof, method for choosing among these methods is to use the top-down method if d[il] > d[i2] and otherwise to use the bottom-up method.

Code based on these ideas may be found in the LAPACK routine xBDSQR.
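Returning to the cross-product algorithm: the loss of small singular values in passing to C = XᵀX is easy to demonstrate. A sketch (the test matrix is a hypothetical example, not one from the text):

```python
import numpy as np

# Illustrates the cross-product algorithm's loss of relative accuracy.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((200, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
X = U @ np.diag([1.0, 1e-4, 1e-9]) @ V.T        # one tiny singular value

sigma_direct = np.linalg.svd(X, compute_uv=False)
C = X.T @ X                                      # cross-product matrix
sigma_cross = np.sqrt(np.linalg.eigvalsh(C)[::-1].clip(min=0))

print(sigma_direct)   # all three values to high relative accuracy
print(sigma_cross)    # the 1e-9 value is lost: it corresponds to an
                      # eigenvalue ~1e-18 of C, below eps * ||C||
```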

The differential qd algorithm

When only singular values are desired, Fernando and Parlett [73] have devised a new variant of the shifted qd algorithm that calculates them efficiently to high relative accuracy. The algorithm has been implemented in the LAPACK routine SLASQ1.

The QRD-SVD algorithm

The idea of saving work by computing a singular value decomposition from a QR decomposition is due to Chan [34, 35]. Higham [115] has shown that row and column pivoting in the computation of the QR decomposition improves the accuracy of the singular values.

Divide-and-conquer algorithms

Gu and Eisenstat [101] give a divide-and-conquer algorithm for the bidiagonal singular value problem. Their paper also contains a summary of other divide-and-conquer algorithms, with extensive references as well as references to previous work.

Downdating a singular value decomposition

The singular value decomposition is useful in applications where X is a data matrix with each row representing an item of data. An interesting problem is to update the singular value decomposition after a row has been added or removed. The latter problem is called downdating and comes in several varieties, depending on what parts of the singular value decomposition we have at hand. Gu and Eisenstat [102] give three algorithms for the downdating problem.

Jacobi methods

Jacobi methods (see p. 202) can be generalized in two ways to find the singular value decomposition of a matrix X. The two-sided method, due to Kogbetliantz [149, 1955], works with square X. A typical iteration step goes as follows. For some i < j, let U and V be plane rotations in the (i, j)-plane such that the (i, j)- and (j, i)-elements of UᵀXV are zero; i.e., solve the singular value problem for the embedded 2x2 matrix, applying the transformations to rows and columns i and j of X. As with the Jacobi algorithm, this transformation increases the sum of squares of the diagonal elements of X and decreases the sum of squares of the off-diagonal elements by the same amount.
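A sketch of one such two-sided step, using a full 2x2 SVD for the embedded subproblem (an illustration, not an implementation from the text; the SVD factors may be reflections rather than rotations, which does not affect the sums of squares):

```python
import numpy as np

def kogbetliantz_step(X, i, j):
    """One two-sided Jacobi (Kogbetliantz) step on rows/columns i < j of X.

    Solves the embedded 2x2 singular value problem and applies the
    orthogonal transformations to rows i, j and columns i, j.  After the
    step the (i, j)- and (j, i)-elements of X are zero.
    """
    T = X[np.ix_([i, j], [i, j])]
    U2, _, V2t = np.linalg.svd(T)          # T = U2 @ diag(s) @ V2t
    X[[i, j], :] = U2.T @ X[[i, j], :]     # left transformation
    X[:, [i, j]] = X[:, [i, j]] @ V2t.T    # right transformation
    return X
```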

An interesting and work-saving observation about this method is that if X is upper triangular and elements are eliminated rowwise, then the result of a complete sweep is a lower triangular matrix.

The one-sided method does not require X to be square. Choose two columns i and j of X and determine a rotation V such that the ith and jth columns of XV are orthogonal. It is easy to see that the rotation V is the same as the rotation that would be obtained by applying a Jacobi step to XᵀX with pivot indices i and j. There is an analogous method in which rotations are instead applied from the left. The techniques that are used to analyze the Jacobi method generally carry over to the variants for the singular value decomposition. For more see [8, 37, 67, 75, 107, 108, 109, 203].

4. THE SYMMETRIC POSITIVE DEFINITE (S/PD) GENERALIZED EIGENVALUE PROBLEM

A generalized eigenvalue problem Ax = λBx is said to be symmetric positive definite (S/PD) if A is symmetric and B is positive definite. The slash in S/PD is there to stress that two matrices are involved: the symmetric matrix A and the positive definite matrix B (which is by definition symmetric). The problem arises naturally in many applications, and the purpose of this section is to describe the problem and its solution. Mathematically it is a generalization of the symmetric eigenvalue problem and can, in fact, be reduced to an equivalent symmetric eigenvalue problem. Although this reduction, which is due to Wilkinson, does not yield an unconditionally stable algorithm, it is the best we have.

We begin by developing the theory of the S/PD generalized eigenvalue problem. In §4.2 we present Wilkinson's algorithm, including some modifications to handle the case where B is ill conditioned with respect to inversion. As usual in this chapter, all matrices are real.

4.1. THEORY

Here we establish the basic theory of S/PD pencils.

The fundamental theorem

We begin with a definition.

Definition 4.1. Let A be symmetric and B be positive definite. Then the pencil (A, B) is said to be SYMMETRIC POSITIVE DEFINITE, abbreviated S/PD.

In transforming S/PD pencils it is natural to use symmetry-preserving congruences, i.e., transformations of the form (XᵀAX, XᵀBX), where X is nonsingular. The following theorem shows that we can diagonalize A and B by such transformations.

Theorem 4.2. Let (A, B) be an S/PD pencil. Then there is a nonsingular matrix X such that

    XᵀAX = diag(α_1, ..., α_n)  and  XᵀBX = diag(β_1, ..., β_n),

where the α_i are real and the β_i are positive.

Proof. Let Z be any matrix of order n satisfying

    ZᵀZ = B.

For example, Z could be the Cholesky factor of B. Since B is positive definite and hence nonsingular, Z is also nonsingular. The matrix Z^{-T}AZ^{-1} is symmetric. Let

    Z^{-T}AZ^{-1} = U D Uᵀ

be its spectral decomposition. Set

    X = Z^{-1}U.

Then by construction XᵀAX = D is diagonal. Moreover,

    XᵀBX = UᵀZ^{-T}(ZᵀZ)Z^{-1}U = UᵀU = I

is also diagonal. ∎

There are four comments to be made about this theorem.

• Since XᵀBX = I in the proof, it is natural to say that X is B-orthogonal. In this language, we have proved that A can be diagonalized by a B-orthogonal transformation. When B = I, the theorem says A can be diagonalized by an orthogonal transformation. In this sense the S/PD eigenvalue problem is a generalization of the symmetric eigenvalue problem.

• If we write AX = X^{-T}D_A and denote the ith column of X by x_i, then Ax_i = α_i X^{-T}e_i. Similarly, Bx_i = β_i X^{-T}e_i. It follows that

    A x_i = (α_i/β_i) B x_i,

so that λ_i = α_i/β_i is an eigenvalue of (A, B) and x_i is the corresponding eigenvector. Thus our theorem states that the S/PD pencil (A, B) has real eigenvalues and a complete set of B-orthonormal eigenvectors.

• Whatever the normalization, the numbers β_i are positive. This implies that an S/PD pencil cannot have infinite eigenvalues.

• The proof of the theorem gives eigenvectors x_i normalized so that x_iᵀBx_i = 1. But other normalizations are possible.

Condition of eigenvalues

Let (λ, x) be a simple eigenpair of the S/PD pencil (A, B) with ‖x‖₂ = 1. Let α = xᵀAx and β = xᵀBx, so that (α, β) is a projective representation of λ. If (A+E, B+F) is a perturbed pencil, then by the perturbation theory of Chapter 2, for E and F sufficiently small there is an eigenvalue λ̃ of the perturbed pencil such that (4.1) holds. Thus (α² + β²)^{-1/2} is a condition number for the eigenvalue λ in the chordal metric.

The bound (4.1) shows that the eigenvalues of an S/PD problem, unlike the eigenvalues of a Hermitian matrix, can be ill conditioned. This happens when both α_i and β_i are small. For example, in the problem

    A = diag(2, δ),  B = diag(1, δ),

with δ small, the eigenvalue 2 is generally insensitive to small perturbations, while perturbations that are large relative to δ can move the eigenvalue 1 anywhere in the interval [0, ∞). (However, see Example 4.13, Chapter 2.)

Moreover, the well-conditioned eigenvalues of an S/PD pencil are not necessarily determined to high relative accuracy. This is generally true of the small eigenvalues, as Example 3.9, Chapter 2, shows. However, well-conditioned large eigenvalues can also be ill determined in the relative sense. For example, consider the problem

    A = diag(1, 1),  B = diag(1, 10⁻⁵),

whose eigenvalues are 1 and 10⁵. If the (2,2)-element of B is perturbed to 2·10⁻⁵, the eigenvalue 10⁵ becomes ½·10⁵, a complete loss of relative accuracy from a perturbation of size 10⁻⁵.

4.2. WILKINSON'S ALGORITHM

In this subsection we consider an algorithm due to Wilkinson for solving the S/PD eigenvalue problem. We begin with the algorithm itself and then consider modifications to improve its performance in difficult cases.

The algorithm

Wilkinson's algorithm is based on the proof of Theorem 4.2. Let (A, B) be an S/PD pencil and let

    B = RᵀR

be the Cholesky decomposition of B; i.e., R is upper triangular with positive diagonals (see §A.5). Let

    R^{-T}AR^{-1} = U Λ Uᵀ

be the spectral decomposition of R^{-T}AR^{-1}. Then the matrix

    X = R^{-1}U

diagonalizes the pencil. Algorithm 4.1 implements this scheme: given an S/PD pencil (A, B) of order n, it produces a matrix X such that XᵀAX = Λ is diagonal and XᵀBX = I.

Algorithm 4.1: Solution of the S/PD generalized eigenvalue problem

Here are some comments on this algorithm.

• Only the upper triangles of A and B need be stored. If B is not needed later, its upper triangle may be overwritten by R. If A is not needed later, its upper triangle may be overwritten by the upper triangle of C.

• The Cholesky factor R can be computed by a variant of Gaussian elimination (see page 159).

• The exact operation count for the algorithm depends on how long the method used to compute the spectral decomposition of C takes. If the QR algorithm (§1) or the divide-and-conquer algorithm (§2.2) is used, the operation count is O(n³) with a modest order constant.

• The columns x_j of X may be formed from the columns u_j of U by solving the triangular systems

    R x_j = u_j.

The calculations can be arranged so that X overwrites U [cf. (1:2.7)].

• We will treat the computation of C in a moment; in fact, there is more than one algorithm.
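A dense-matrix sketch of the overall scheme, using SciPy routines in place of the storage-efficient kernels of the text (the variable names are mine):

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def wilkinson_spd(A, B):
    """Solve A x = lambda B x (A symmetric, B positive definite).

    A sketch of Algorithm 4.1: B = R^T R, C = R^{-T} A R^{-1},
    C = U Lam U^T, X = R^{-1} U.
    """
    R = cholesky(B)                             # upper triangular, B = R^T R
    Y = solve_triangular(R, A, trans='T')       # Y = R^{-T} A
    C = solve_triangular(R, Y.T, trans='T').T   # C = Y R^{-1}
    lam, U = eigh(C)                            # spectral decomposition of C
    X = solve_triangular(R, U)                  # X = R^{-1} U
    return lam, X                               # X^T A X = diag(lam), X^T B X = I
```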

Computation of R^{-T}AR^{-1}

The computation of C = R^{-T}AR^{-1} is tricky enough to deserve separate treatment. The algorithm given here proceeds by bordering. Specifically, consider the partitioned product (4.2). Assuming that we already know the leading principal submatrix C of R^{-T}AR^{-1}, we will show how to compute the bordering quantities c and γ. This will enable us to solve our problem by successively computing the leading principal submatrices of R^{-T}AR^{-1} of order 1, 2, ..., n.

We begin by rewriting (4.2) in the form (4.3). Then by direct computation we find expressions for γ and c, and Algorithm 4.2 implements the resulting scheme. Three comments.

• The bulk of the work in this algorithm is performed in computing the product of the current principal submatrix with a vector. For a scheme in which the bulk of the work is contained in a rank-one update, see the LAPACK routine xSYGS2.

• The algorithm takes O(n³) flam.

• The arrangement of the computations in (4.3) may seem artificial, but it is designed to allow Algorithm 4.2 to overwrite A by C: simply replace all references to C by references to A.

Ill-conditioned B

Wilkinson's algorithm is effectively a way of working with B^{-1}A while preserving symmetry. A defect of this approach is that if B is ill conditioned with respect to inversion, i.e., if ‖B‖‖B^{-1}‖ is large (see §1:3.3), then B^{-1}A will be calculated inaccurately. There is no complete cure for this problem, since the large eigenvalues of (A, B) are ill determined. A partial cure, one which keeps the errors confined to the larger eigenvalues, is to arrange the calculations so that the final matrix C is

graded downward.

Given a symmetric matrix A and a nonsingular upper triangular matrix R, this algorithm computes C = R^{-T}AR^{-1}.

Algorithm 4.2: The computation of C = R^{-T}AR^{-1}

One way to produce a graded structure is to compute the spectral decomposition

    B = V M Vᵀ,

where the diagonal elements of M are ordered so that μ₁ ≤ μ₂ ≤ ... ≤ μₙ. If we define C = M^{-1/2}VᵀAVM^{-1/2} and compute its spectral decomposition C = UΛUᵀ, then the diagonal elements of Λ are eigenvalues of (A, B), and the corresponding eigenvectors are the columns of X = VM^{-1/2}U.

A less expensive alternative is to use a pivoted Cholesky decomposition of B (see §A.3). This produces a decomposition of the form

    PᵀBP = RᵀR,

where P is a permutation matrix and R is graded downward. If we set

    C = Ĵ R^{-T}(PᵀAP)R^{-1} Ĵ,

where Ĵ is the cross matrix, then C will be graded downward. If we then compute the spectral decomposition C = UΛUᵀ, the diagonal elements of Λ are eigenvalues of (A, B), and the corresponding eigenvectors are the columns of X = PR^{-1}ĴU.

As we have observed (p. 108ff), the reduction to tridiagonal form and the QR algorithm tend to perform well on this kind of problem.
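A sketch of the spectral-decomposition variant (eigh returns the eigenvalues of B in ascending order, which here produces the downward grading directly):

```python
import numpy as np
from scipy.linalg import eigh

def wilkinson_graded(A, B):
    """S/PD solve via the spectral decomposition of B.

    A sketch: B = V M V^T with mu ascending, so that
    C = M^{-1/2} V^T A V M^{-1/2} is graded downward; then
    C = U Lam U^T and X = V M^{-1/2} U.
    """
    mu, V = eigh(B)              # ascending: mu[0] <= ... <= mu[-1]
    S = V / np.sqrt(mu)          # S = V M^{-1/2} (columns scaled)
    C = S.T @ A @ S              # graded downward when B is ill conditioned
    lam, U = eigh(C)
    X = S @ U                    # X^T A X = diag(lam), X^T B X = I
    return lam, X
```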

4.3. NOTES AND REFERENCES

Perturbation theory

We have derived the perturbation theory for S/PD pencils from the perturbation theory for general pencils. As might be expected by analogy with the Hermitian eigenproblem, much more can be said about the eigenvalues and eigenvectors of S/PD pencils. This theory is surveyed in [269, §IV.3]. Recently, Li and Mathias [163] have revisited this theory and, starting from a different perspective, have given significantly improved bounds.

Wilkinson's algorithm

Wilkinson presented his algorithm in The Algebraic Eigenvalue Problem [300, Ch. 5, §§68-72]. Wilkinson discusses the problem of ill-conditioned B and suggests that using the spectral decomposition of B instead of its Cholesky decomposition to reduce the problem will tend to preserve the small eigenvalues, although he does not mention the necessary sorting of the eigenvalues.

Chandrasekaran [36] has noted that the large eigenvalues and corresponding eigenvectors of C (obtained from the spectral decomposition of B) are well determined. He shows how to deflate these well-determined eigenpairs from the original pencil, giving a smaller pencil containing the remaining eigenpairs. Since each deflation step requires the computation of a spectral decomposition, the algorithm potentially requires O(n⁴) operations. It seems, however, that only a very few deflation steps are required.

The LAPACK routines xxxGST reduce the S/PD eigenproblem Ax = λBx to standard form. They also reduce the eigenproblem BAx = λx, where A is symmetric and B is positive definite. Wilkinson mentions, but does not derive, an algorithm for problems of the form ABx = λx, where A is symmetric and B is positive definite. Code for both problems may be found in the Handbook [170].

Band matrices

Wilkinson's method results in a full symmetric eigenvalue problem, even when A and B are banded. To take advantage of the banded structure, Wilkinson notes that Gaussian elimination on A − λB gives a count of the eigenvalues less than λ. Thus the eigenvalues of (A, B) can be found by interval bisection, just as in §2. Eigenvectors can then be computed by the inverse power method, suitably generalized.
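The eigenvalue count rests on inertia: since B is positive definite, A − λB is congruent to B^{1/2}(B^{-1/2}AB^{-1/2} − λI)B^{1/2}, so its number of negative eigenvalues equals the number of pencil eigenvalues less than λ. A dense sketch follows (a banded implementation would use band Gaussian elimination; the block-walking logic is my reading of SciPy's LDLᵀ output):

```python
import numpy as np
from scipy.linalg import ldl

def count_eigs_below(A, B, lam):
    """Number of eigenvalues of A x = mu B x with mu < lam.

    Sylvester's law of inertia: count the negative eigenvalues of
    A - lam*B via a symmetric indefinite (LDL^T) factorization.
    """
    _, D, _ = ldl(A - lam * B)
    count, i, n = 0, 0, D.shape[0]
    while i < n:                          # walk the 1x1 / 2x2 blocks of D
        if i + 1 < n and D[i, i + 1] != 0.0:
            if np.linalg.det(D[i:i + 2, i:i + 2]) < 0:
                count += 1                # indefinite 2x2 block: one negative
            elif D[i, i] < 0:
                count += 2                # positive det, negative trace: two
            i += 2
        else:
            count += 1 if D[i, i] < 0 else 0
            i += 1
    return count
```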

The definite generalized eigenvalue problem

A Hermitian pencil (A, B) is said to be definite if

    γ(A, B) = min_{‖x‖₂=1} |xᴴ(A + iB)x| > 0.    (4.4)

A definite pencil can in principle be reduced to an S/PD pencil. Specifically, consider the set

    F(A, B) = { xᴴ(A + iB)x : ‖x‖₂ = 1 }.

The set F(A, B) is called the field of values of A + iB, and it can be shown to be convex. Now if γ(A, B) > 0, then F(A, B) does not contain the origin, and since it is convex, it can be rotated to lie in the upper half of the complex plane. If we define

    A_θ + iB_θ = e^{iθ}(A + iB),

then the field of values F_θ of A_θ + iB_θ is just the set F rotated by the angle θ, and F_θ lying in the upper half plane is equivalent to B_θ being positive definite. [It is a curious fact that the above argument does not work for n = 2 and real symmetric A and B, when the vector x in (4.4) ranges only over Rⁿ.]

The above argument is nonconstructive in that it does not give a means for computing a suitable angle θ. Crawford and Moon [44] describe an iterative algorithm for computing a θ such that B_θ is positive definite (provided, of course, that the original pencil was definite). Each step requires a partial Cholesky factorization of the current B_θ, and the algorithm can be O(n⁴). For an implementation see [43]. For another approach see [40]. For more on definite pairs see [269, §VI.3].

The generalized singular value and CS decompositions

Let X ∈ R^{n×p} and Y ∈ R^{m×p}. Then there are orthogonal matrices U_X and U_Y and a nonsingular matrix W such that

    X = U_X Σ_X W  and  Y = U_Y Σ_Y W,    (4.5)

where Σ_X and Σ_Y are diagonal. It is easily seen that

    XᵀX = WᵀΣ_X²W  and  YᵀY = WᵀΣ_Y²W,

so that the generalized singular value decomposition is related to the generalized eigenvalue problem in the cross-product matrices just as the singular value decomposition is related to the eigenvalue problem for the cross-product matrix. For much the same reasons as given in §3, a computational approach to the generalized singular value decomposition through cross-product matrices is deprecated. This generalized singular value decomposition was proposed by Van Loan [283, 1976] and in the above form by Paige and Saunders [202, 1981], with W^{-1} on the right-hand sides.

The generalized singular value decomposition is related to the CS decomposition of an orthonormal matrix (see [202, 258] or §1:1). Specifically, let

    ( X )   ( Q_X )
    ( Y ) = ( Q_Y ) R

be a QR factorization of (Xᵀ Yᵀ)ᵀ, where m, n ≥ p. Then the CS decomposition exhibits orthogonal matrices U_X, U_Y, and V such that

    Q_X = U_X Σ_X Vᵀ    (4.6)

and

    Q_Y = U_Y Σ_Y Vᵀ,    (4.7)

where Σ_X and Σ_Y are diagonal matrices with Σ_X² + Σ_Y² = I. (This last relation implies that the diagonal elements of Σ_X and Σ_Y can be interpreted as cosines and sines, whence the name CS decomposition.) If we combine (4.6) and (4.7) and set Wᵀ = VᵀR, the result is the generalized singular value decomposition (4.5).

Algorithms for computing the CS decomposition, and hence the generalized singular value decomposition, have been proposed by Stewart [259, 260] and Van Loan [284]. Paige [200] has given a method that works directly through X and Y. The LAPACK routine xGGSVD uses Van Loan's approach.
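A sketch of the QR-plus-CS route in NumPy. The CS step below is realized through an ordinary SVD of Q_X, with the sines recovered from Q_Y; it assumes n ≥ p and that no sine is exactly zero.

```python
import numpy as np

def gsvd(X, Y):
    """Generalized SVD of (X, Y) via a QR factorization and a CS step.

    A sketch: X = UX @ diag(cx) @ W and Y = UY @ diag(cy) @ W with
    cx**2 + cy**2 = 1 (cosines and sines).
    """
    n, p = X.shape
    Q, R = np.linalg.qr(np.vstack([X, Y]))               # (X; Y) = Q R
    QX, QY = Q[:n], Q[n:]
    UX, cx, Vt = np.linalg.svd(QX, full_matrices=False)  # QX = UX diag(cx) Vt
    T = QY @ Vt.T                    # columns are orthogonal with norms = sines
    cy = np.linalg.norm(T, axis=0)
    UY = T / cy                      # orthonormal left factor for Y
    W = Vt @ R                       # the nonsingular factor
    return UX, cx, UY, cy, W
```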


4 EIGENSPACES AND THEIR APPROXIMATION

The first half of this book has been concerned with dense matrices whose elements can be stored in the main memory of a computer. We now shift focus to problems that are so large that storage limitations become a problem. This problem has two aspects.

First, if the order of the matrix in question is sufficiently large, we will not be able to store all its elements. Many large matrices arising in applications are sparse; that is, their zero elements greatly outnumber their nonzero elements. With appropriate data structures we can get away with storing only the latter. Second, the eigenvectors of a sparse matrix are in general not sparse. It follows that if the matrix is large enough we cannot afford to store its entire eigensystem. Instead we must be satisfied with computing some subset of its eigenpairs.

Unfortunately, these data structures restrict the operations we can perform on the matrix. Consequently, most algorithms for large sparse eigenvalue problems confine themselves to matrix-vector multiplications or perhaps the computation of an LU factorization. This limitation is not as burdensome as it might appear, for most algorithms for large problems proceed by generating a sequence of subspaces that contain increasingly accurate approximations to the desired eigenvectors.

The space spanned by a set of eigenvectors is an example of an eigenspace, and it turns out that algorithms for large matrices are best explained in terms of eigenspaces. Because eigenspaces, like eigenvectors, cannot be computed directly, this chapter is devoted to the basic facts about eigenspaces and their approximations. We begin with the basic algebraic theory of eigenspaces, and in §2 we present the analytic theory: residual analysis and perturbation bounds. In §3 we treat one of the most popular ways of generating subspaces, the Krylov sequence, and in §4 we analyze the Rayleigh-Ritz method for extracting individual eigenvectors from a subspace.

Although all of the results of this chapter apply to matrices of any size, our chief concern is with large eigenproblems. It will often help to think of n as being very large. Therefore:

Unless otherwise specified, A will denote a matrix of order n.

1. EIGENSPACES

In this subsection we are going to develop the theory of subspaces that remain invariant when acted on by a matrix. We have already noted that these invariant subspaces, or eigenspaces as we will call them, are useful in the derivation and analysis of algorithms for large eigenvalue problems. They are also important because one can use them to isolate ill-behaved groups of eigenvalues. For example, a matrix that does not have a complete system of eigenvectors may instead have a complete system of eigenspaces of low dimension corresponding to groups of defective eigenvalues. In the first subsection we will present the basic definitions and properties. In §1.2 we will consider simple eigenspaces, the natural generalization of a simple eigenvector.

1.1. DEFINITIONS

We have already observed (p. 2) that the subspace spanned by an eigenvector is invariant under multiplication by its matrix. The notion of an eigenspace is a generalization of this observation.

Definition 1.1. Let A be of order n and let X be a subspace of Cⁿ. Then X is an EIGENSPACE or INVARIANT SUBSPACE of A if

    A X ⊂ X.

The space spanned by an eigenvector is a trivial but important example of an eigenspace. A less trivial example occurs in the proof of the existence of real Schur decompositions (Theorem 3.1, Chapter 2). There we showed that if (λ, y + iz) is a complex eigenpair of a real matrix A, then

    A (y z) = (y z) L    (1.1)

for some real 2x2 matrix L. Moreover, the eigenvalues of L are λ and λ̄. Since the column space of (y z)L is a subset of the column space of (y z), the space X = R[(y z)] is an eigenspace of A. This construction illustrates the utility of eigenspaces: the information contained in the set of complex eigenpairs (λ, x) and (λ̄, x̄) is captured more economically by the relation (1.1), which consists entirely of real quantities. Eigenspaces can be used to bring together combinations of eigenvalues and eigenvectors that would be difficult to treat separately.

Eigenbases

The definition of eigenspace was cast in terms of a subspace of Cⁿ. In computational practice we will always have to work with a basis. Note that the definition concerns the subspace in question, while the expression (1.1) concerns a specific basis for that subspace. The following theorem summarizes the properties of a basis for an eigenspace.

e. In particular L has a complete set of eigenvectors if and only if the eigenvectors of A lying in X span X. for some unique vector t{. Hence and the relation Lu = Xu follows on multiplying by X1.e. then so that (A. u = Xlx. for the important case of a simple eigenspace (Definition 1. 1. u] is an eigenpairofL. #) be an eigenpair of A with x G X. Vol II: Eigensystems . the relation between the eigensystem of A and that of L can be very complicated.2. as required. As we will see. Xu) is an eigenpair of A. If L does not have a complete system of eigenvectors. however. Xlx) is an eigenpairofL.SEC. Let X be an eigenspace of A and letX be a basis forX. Xu) is an eigenpairofA. If (A. i. Then there is a unique matrix L such that The matrix L is given by where X1 is a left inverse ofX —i. then (A. the relation is straightforward. then (A. Conversely. Since y\ 6 X and X is a basis for X. Conversely. Then there is a unique vector u such that x — Xu. If X1 is a left inverse of X. However. The existence of a unique representation L associated with a basis X for an eigenspace X justifies the following extension of the notion of eigenpair..7). If we set L — (i\ • • • ^). • We say that the matrix L is the representation of A on the eigenspace X with respect to the basis X. x) is an eigenpairofA with x £ X. then AX = XL. if (A. yi can be expressed uniquely as a linear combination of the columns of X. The theorem shows that L is independent of the choice of left inverse in the formula L = X1AX. if Lu = Xu. Let be partitioned by columns. ElGENSPACES 241 Theorem 1. Proof. then Now let (A. a matrix satisfying X1X — I.. The theorem also shows that there is a one-one correspondence between eigenvectors lying in X and eigenvectors of L.

Definition 1.3. For X ∈ C^{n×k} and L ∈ C^{k×k}, we say that (L, X) is an EIGENPAIR OF ORDER k or RIGHT EIGENPAIR OF ORDER k of A if

    AX = XL.

We call the matrix X an EIGENBASIS and the matrix L an EIGENBLOCK. If X is orthonormal, we say that the eigenpair (L, X) is orthonormal. If Y ∈ C^{n×k} has linearly independent columns and YᴴA = LYᴴ, we say that (L, Y) is a LEFT EIGENPAIR of A.

Note that if (L, Y) is a left eigenpair of A, then Y = R(Y) is a left eigenspace of A in the sense that

    Aᴴ Y ⊂ Y.

The following theorem shows how eigenpairs transform under change of basis and similarities. Its proof, which is simple and instructive, is left as an exercise.

Theorem 1.4. Let (L, X) be an eigenpair of A. If U is nonsingular, then the pair (U⁻¹LU, XU) is also an eigenpair of A. If W is nonsingular, then (L, W⁻¹X) is an eigenpair of W⁻¹AW.

An important consequence of this theorem is that the eigenvalues of the representation of an eigenspace with respect to a basis are independent of the choice of the basis. Consequently, we can speak of the eigenvalues of A associated with an eigenspace.

Existence of eigenspaces

Given that every eigenspace has a unique set of eigenvalues, it is natural to ask if any set of eigenvalues has a corresponding eigenspace. The following theorem shows that indeed it has.

Theorem 1.5. Let A be of order n, and let L = {λ₁, ..., λ_k} ⊂ Λ(A) be a multisubset of the eigenvalues of A. Then there is an eigenspace X of A whose eigenvalues are λ₁, ..., λ_k.

Proof. Let

    A(U₁ U₂) = (U₁ U₂) ( T₁₁  T₁₂ )
                       ( 0    T₂₂ )

be a partitioned Schur decomposition of A in which T₁₁ is of order k and has the members of L on its diagonal. Computing the first block column of this relation, we get

    A U₁ = U₁ T₁₁.

Vol II: Eigensystems . Let(i. • The similarity transformation (1.2 is not the most general way to deflate an eigenspace. it is preferable to use unitary transformations to deflate an eigenspace. We call y the COMPLEMENTARY LEFT EIGENSPACE OF X. then for some matrices L.1. M. which. • The construction of Theorem 1. Deflation In establishing the existence of the real Schur form (Theorem 3. For numerical reasons.6. however. Y) is a left eigenpair of A. and H. See the discussion on page 10. Otherwise put: If X is an eigenspace of A and y is the orthogonal complement of X.2) of A deflates the problem in the usual sense. The spectrum of A is the union of spectra of L and M. can be treated as separate eigenvalue problems.2) implies that Thus (M. Specifically. X) bean orthonormaleigenpairofA andlet(X Y) be unitary. then y is a left eigenspace of A corresponding to the eigenvalues not associated with X. the members of C.SEC. Chapter 2) we essentially used the eigenspace corresponding to a pair of complex conjugate eigenvalues to deflate the matrix in question.e. ElGENSPACES 243 Hence the column space of U\ is an eigenspace of A whose eigenvalues are the diagonal elements of the triangular matrix TH —i. This is a special case of a general technique. 1. We will return to this point in the next subsection. if X is any basis for an eigenspace of A and (X Y)is nonsingular.. if A has multiple eigenvalues. for computational purposes. The proof of the following theorem is left as an exercise. • The relation (1. the eigenspaces of A are not as easily described. Then where Here are three comments on this theorem. Unfortunately. • Thus a matrix A of order n with distinct eigenvalues has 2n distinct eigenspaces — one corresponding to each subset of A( A). Theorem 1.

1.2. SIMPLE EIGENSPACES

Definition

We have seen (Theorem 1.5) that if L is a subset of the spectrum of A, then there is an eigenspace X of A whose eigenvalues are the members of L. Unfortunately, this eigenspace need not be unique. For example, if λ is a double eigenvalue of A that is not defective, then it has two linearly independent eigenvectors, and any nonzero vector in the plane spanned by these two vectors spans a one-dimensional eigenspace corresponding to λ. Computationally, it is impossible to pin down an object that is not unique. The above example suggests that the nonuniqueness in eigenspaces results from their not having all the copies of a multiple eigenvalue. Hence we make the following definition.

Definition 1.7. Let X be an eigenspace of A with eigenvalues L. Then X is a SIMPLE EIGENSPACE of A if its eigenvalues are, counting multiplicities, disjoint from the other eigenvalues of A.

Block diagonalization and spectral representations

We turn now to the existence of left eigenspaces corresponding to a right eigenspace. The problem is not as simple as the existence of left eigenvectors, since we cannot use determinantal arguments as in Chapter 1. We will approach the problem by way of block diagonalization.

Theorem 1.8. Let (L₁, X₁) be a simple orthonormal eigenpair of A and let (X₁ Y₂) be unitary, so that (Theorem 1.6)

    (X₁ Y₂)ᴴ A (X₁ Y₂) = ( L₁  H  )
                         ( 0   L₂ ).

Then there is a matrix Q satisfying the Sylvester equation

    Q L₂ − L₁ Q = H    (1.4)

such that if we set

    X₂ = Y₂ + X₁Q  and  Y₁ = X₁ − Y₂Qᴴ,    (1.5)

then (X₁ X₂) is nonsingular with inverse (Y₁ Y₂)ᴴ and

    (Y₁ Y₂)ᴴ A (X₁ X₂) = diag(L₁, L₂).    (1.6)

Accumulating the transformations in the reduction gives the formula for X₂ in (1.5). ∎

Here are some observations on this theorem.

• The formulas (1.4) and (1.5) result from accumulating the transformations in the successive reductions to block triangular form and block diagonal form.

• The eigenspace R(X₂) is complementary to the eigenspace R(X₁) in Cⁿ, and we call it the complementary right eigenspace.

• If we write the second equation in (1.6) in the form

    A X₂ = X₂ L₂,

we see that (L₂, X₂) is an eigenpair of A. Likewise (L₁, Y₁) is a left eigenpair of A, so that R(Y₁) is the left eigenspace corresponding to R(X₁).

• If (λ, x) is a simple eigenpair of A, then there is a left eigenvector y corresponding to x, which can be normalized so that yᴴx = 1 [see (3.18), Chapter 1]. The fact that the eigenbasis Y₁ for the left eigenspace corresponding to X₁ is normalized so that Y₁ᴴX₁ = I generalizes this result. We say that the bases X₁ and Y₁ are biorthogonal. We will return to this point later when we consider canonical angles between subspaces (see Corollary 2.5).

Combining these facts with the observation that the orthogonal complement of R(X₁) is the complementary left eigenspace of A, we have the following corollary.

Corollary 1.9. Let X be a simple eigenspace of A. Then X has a corresponding left eigenspace and a complementary right eigenspace. The orthogonal complement of X is the complementary left eigenspace.

From (1.6) we can write

    A = X₁L₁Y₁ᴴ + X₂L₂Y₂ᴴ.    (1.7)

We will call this formula a spectral representation of A. In establishing the spectral representation we assumed that X₁ and Y₂ were orthonormal; we will call any such spectral representation a standard representation. The eigenbases in a spectral representation are not unique. If, for example, S is nonsingular, then we may replace X₁ by X₁S, Y₁ by Y₁S^{-H}, and L₁ by S⁻¹L₁S. Saying that (1.7) is a spectral representation of A is a convenient shorthand for (1.6).

Theorem 1.8 uses a single simple eigenspace to reduce a matrix to 2x2 block diagonal form. However, an obvious induction shows that if we have k simple eigenspaces then we can reduce A to (k+1)x(k+1) block diagonal form, in which the first k blocks are eigenblocks of the eigenspaces and the last block is the eigenblock of the complement of their union. In analogy with (1.7),

there is a corresponding spectral representation of the form

    A = X₁L₁Y₁ᴴ + ... + X_{k+1}L_{k+1}Y_{k+1}ᴴ.

Uniqueness of simple eigenspaces

We turn now to the question of whether a simple eigenspace is uniquely determined by its eigenvalues. Although it seems obvious that it is, the result is not easy to prove. It will be sufficient to prove the uniqueness for a simple eigenspace corresponding to a single eigenvalue. For if a simple eigenspace has more than one eigenvalue, it can be decomposed by diagonalization into eigenspaces corresponding to its individual eigenvalues. If the latter are unique, then so is the original space.

Let X be a simple eigenspace corresponding to a single eigenvalue, which without loss of generality we may assume to be zero. Then we can diagonalize A as in Theorem 1.8 to get the matrix diag(L, M), where L has only zero eigenvalues and M is nonsingular. Since L has only zero eigenvalues, it is nilpotent (see Chapter 1). Now let X̃ be a second eigenspace corresponding to the eigenvalue zero. We can then use this eigenspace to diagonalize A to get diag(L̃, M̃), where again L̃ is nilpotent and M̃ is nonsingular. Since diag(L, M) and diag(L̃, M̃) are both similar to A, they are similar to each other. Let Y = X⁻¹, and write the similarity transformation in the partitioned form

    ( L  0 ) ( X₁₁  X₁₂ )   ( X₁₁  X₁₂ ) ( L̃  0 )
    ( 0  M ) ( X₂₁  X₂₂ ) = ( X₂₁  X₂₂ ) ( 0   M̃ ).    (1.8)

Let k be an integer such that Lᵏ = L̃ᵏ = 0. Computing the (2,1)-block of (1.8), we get MX₂₁ = X₂₁L̃, whence

    Mᵏ X₂₁ = X₂₁ L̃ᵏ = 0.

Since M cannot have zero eigenvalues, Mᵏ is nonsingular, and it follows that X₂₁ = 0. Similarly Y₂₁ = 0. Since X and Y are nonsingular and block upper triangular, so are X₂₂ and Y₂₂. Now compute the (1,2)-block of (1.8): LX₁₂ = X₁₂M̃, whence LᵏX₁₂ = X₁₂M̃ᵏ = 0, and since M̃ᵏ is nonsingular, X₁₂ = 0. Thus X is block diagonal, and the spaces X and X̃ are the same.

1.3. NOTES AND REFERENCES

Eigenspaces

Although the term "eigenspace" can be found in Kato's survey of perturbation theory [146], such spaces are usually called invariant subspaces. A trivial reason for preferring "eigenspace" over "invariant subspace" is that the latter need not be truly invariant under multiplication by A; an extreme example is A = 0, which collapses any subspace into {0}. A more substantial reason is that we can use the same terminology for generalized eigenvalue problems, where there is no one matrix to operate on the subspace. The generalization of the notion of eigenpair as well as the terms "eigenbasis" and "eigenblock" (Definition 1.3) are new.

Spectral projections

From the spectral representation we can form the matrices P_i = X_iY_iᴴ. It is easily verified that P_i is idempotent, i.e., P_i² = P_i. Such matrices project a vector onto their column space along their null space, and for that reason the P_i are called spectral projectors. Spectral projectors can be used to decompose a matrix without transforming it. For example, it is easy to verify that

    A = Σ_i P_i A P_i,

an equation that corresponds to our reduction to block diagonal form. For more on spectral projections see [146].

The resolvent

Our approach to eigenspaces through block triangularization and diagonalization reflects the way we compute with eigenspaces. An alternative approach is to use the resolvent, defined by

    R(λ) = (λI − A)⁻¹.

For example, it can be shown that the spectral projection corresponding to a set L of eigenvalues is given by the complex integral

    P_L = (1/2πi) ∮_{Γ_L} (λI − A)⁻¹ dλ,

where Γ_L is a closed path in the complex plane that contains L and no other eigenvalues. For a systematic development of this approach see [146].

Uniqueness of simple eigenspaces

That a simple eigenspace is determined by its eigenvalues seems to be taken for granted in the literature on matrix computations. The proof given here is new.

2. PERTURBATION THEORY

In this subsection we will treat the perturbation of eigenspaces and their associated eigenblocks. Much of this theory parallels the perturbation theory in §3, Chapter 1. Because of the important role played by residuals in algorithms for large problems, we will first present the analysis of residuals and then go on to perturbation theory proper. We begin, in §2.1, with preliminary material on the canonical angles between subspaces. In §2.2 we turn to the analysis of residuals. In §2.3 we use these residual bounds to derive a perturbation theory for simple eigenpairs. Finally, in §2.4 we present some residual bounds for Hermitian matrices.

2.1. CANONICAL ANGLES

We have defined the angle between two nonzero vectors as the angle whose cosine is |xᴴy|/(‖x‖₂‖y‖₂) [see (3.11), Chapter 1]. In this subsection we extend this definition to subspaces of Cⁿ. For a more complete development of this subject see §1:4.

Definitions

Let X and Y be subspaces of the same dimension. To avoid certain degenerate cases, we will restrict the dimensions of our subspaces to be not greater than n/2. Let X and Y be orthonormal bases for X and Y, and consider the matrix C = YᴴX. We have

    ‖C‖₂ ≤ ‖Y‖₂‖X‖₂ = 1.

Hence all the singular values of C lie in the interval [0, 1] and can be regarded as cosines of angles. These angles are independent of the choice of bases for X and Y. For if X̃ and Ỹ are other orthonormal bases, we can write X̃ = XV and Ỹ = YW, where V = XᴴX̃ and W = YᴴỸ are unitary. Hence C̃ = ỸᴴX̃ = WᴴCV, so that C̃ and C are unitarily equivalent and have the same singular values. This justifies the following definition.

Definition 2.1. Let X and Y be subspaces of Cⁿ of dimension p, and let X and Y be orthonormal bases for X and Y. Then the CANONICAL ANGLES between X and Y are the numbers

    θ_i(X, Y) = cos⁻¹ γ_i,

where the γ_i are the singular values of YᴴX. We write θ_i(X, Y) for the ith canonical angle in descending order and define

    Θ(X, Y) = diag[θ₁(X, Y), ..., θ_p(X, Y)].

We will also write θ_i(X, Y) and Θ(X, Y), with bases as arguments, for the canonical angles between R(X) and R(Y).

Geometrically, the largest canonical angle has the following characterization:

    θ₁(X, Y) = max_{0≠x∈X} min_{0≠y∈Y} ∠(x, y).

Thus for each x in X we look for the vector y in Y that makes the smallest angle θ(x) with x; the largest such angle θ(x) is the largest canonical angle.

Computing canonical angles

In principle we could compute canonical angles from the characterization in Definition 2.1. However, if the canonical angle is small, this procedure will give inaccurate results. Specifically, for small θ we have cos θ ≈ 1 − ½θ². If θ ≤ 10⁻⁸, then cos θ will evaluate to 1 in IEEE double-precision arithmetic, and we will conclude that θ = 0. A cure for this problem is to compute the sines of the canonical angles. The following theorem, which is a generalization of Lemma 3.12, Chapter 1, provides the wherewithal.

Theorem 2.2. Let X and Y be orthonormal bases for X and Y, and let Y_⊥ be an orthonormal basis for the orthogonal complement of Y. Then the singular values of Y_⊥ᴴX are the sines of the canonical angles between X and Y.

Proof. Let

    ( Yᴴ   )       ( C )
    ( Y_⊥ᴴ ) X  =  ( S ).

Since this matrix is orthonormal,

    CᴴC + SᴴS = I.

Let Vᴴ(CᴴC)V = Γ² be the spectral decomposition of CᴴC. Since the eigenvalues of CᴴC are the squares of the singular values of C, the diagonal elements of Γ are the cosines of the canonical angles between X and Y. But

    Vᴴ(SᴴS)V = I − Γ².

It follows that the squares of the singular values of S are the numbers 1 − γ_i², the squares of the sines of the canonical angles. Hence the singular values of S = Y_⊥ᴴX are the sines of the canonical angles between X and Y. ∎

In this chapter we will use Theorem 2.2 as a mathematical device for bounding canonical angles. But it also provides the means to compute the sines of canonical angles numerically. For details, see the notes and references.

Subspaces of unequal dimensions

We will occasionally need to compare subspaces X and Y of differing dimensions. The statement that X is near Y makes sense only when dim(X) ≤ dim(Y), for otherwise there would be a vector x in X that is orthogonal to every vector in Y, and the angle between that vector and Y could only be π/2. When dim(X) ≤ dim(Y), we can define canonical angles as in Definition 2.1. In this case, the number of canonical angles will be equal to the dimension of X, and these canonical angles can be calculated as in Theorem 2.2.
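A numerical sketch of the sine-based computation, which can be checked against SciPy's subspace_angles (that routine implements a more careful combined sine-cosine algorithm):

```python
import numpy as np
from scipy.linalg import null_space, subspace_angles

def canonical_angle_sines(X, Y):
    """Sines of the canonical angles between range(X) and range(Y).

    A sketch of Theorem 2.2: orthonormalize, form S = Y_perp^H X, and
    take its singular values.  Accurate for small angles, unlike the
    cosine formulation of Definition 2.1.
    """
    Qx, _ = np.linalg.qr(X)              # orthonormal basis for range(X)
    Qy, _ = np.linalg.qr(Y)
    Yperp = null_space(Qy.conj().T)      # orthonormal basis for range(Y)^perp
    s = np.linalg.svd(Yperp.conj().T @ Qx, compute_uv=False)
    return np.sort(s)[::-1]              # descending, like Theta(X, Y)

# np.arcsin(canonical_angle_sines(X, Y)) should agree with
# subspace_angles(X, Y) up to rounding error
```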

The following theorem contains a useful characterization of the angle between a vector and a subspace.

Theorem 2.3. Let x be a vector with ‖x‖₂ = 1 and let Y be a subspace. Then

    min_{y∈Y} ‖x − y‖₂ = sin ∠(x, Y).

Proof. Let (Y Y_⊥) be unitary with R(Y) = Y. Let y ∈ Y and consider the transformed vector

    (Y Y_⊥)ᴴ (x − y) = ( Yᴴx − Yᴴy )
                       ( Y_⊥ᴴx     ).

(In this coordinate system the component of y along Y_⊥ is zero.) Hence

    ‖x − y‖₂² = ‖Yᴴx − Yᴴy‖₂² + ‖Y_⊥ᴴx‖₂².

This norm is clearly minimized when Yᴴy = Yᴴx, and the norm at the minimum is ‖Y_⊥ᴴx‖₂, which, by Theorem 2.2, is sin ∠(x, Y). ∎

Combinations of subspaces

In Theorem 1.8 we expressed the left eigenspace corresponding to a simple eigenspace in the form X + YQ. We will meet such expressions later, where we will have to know the relations between the singular values of Q and X + YQ. The following theorem does the job.

Theorem 2.4. Let X and Y be orthonormal matrices with XᴴY = 0, and let Z = X + YQ. Let σ_i and τ_i denote the nonzero singular values of Z and Q arranged in descending order, and let θ_i denote the nonzero canonical angles between R(X) and R(Z), also arranged in descending order. Then

    σ_i = √(1 + τ_i²)    (2.1)

and

    tan θ_i = τ_i,  so that  σ_i = sec θ_i.

Proof. The squares of the nonzero singular values of Q are the nonzero eigenvalues of QᴴQ, and the squares of the nonzero singular values of Z are the nonzero eigenvalues of

    ZᴴZ = (X + YQ)ᴴ(X + YQ) = I + QᴴQ.

Hence (2.1). By definition the cosines of the canonical angles between R(X) and R(Z) are the singular values of XᴴZ̄, where Z̄ is any orthonormal basis for R(Z). One such basis is

    Z̄ = Z(I + QᴴQ)^{-1/2},

where (I + QᴴQ)^{-1/2} is the inverse of the positive definite square root of I + QᴴQ. Thus the cosines of the canonical angles are the singular values of

    XᴴZ(I + QᴴQ)^{-1/2} = (I + QᴴQ)^{-1/2},

which by the above observation are the reciprocals of the singular values of Z. Hence cos θ_i = σ_i⁻¹, or σ_i = sec θ_i. The relation tan θ_i = τ_i now follows from (2.1). ∎

If we combine this theorem with Theorem 1.8, we obtain the following corollary.

Corollary 2.5. Let X be an orthonormal basis for a simple eigenspace X of A, and let Y be a basis for the corresponding left eigenspace Y of A normalized so that YᴴX = I. Then the singular values of Y are the secants of the canonical angles between X and Y.

2.2. RESIDUAL ANALYSIS

Most methods for computing eigenspaces of a matrix A produce a sequence X_k of bases whose spans, it is hoped, converge to an eigenspace of A. In practice, the only way we have for determining convergence is to look at the residual R_k = AX_k − X_kL_k, where L_k is suitably chosen. If R_k is zero, then (L_k, X_k) is an eigenpair of A. It therefore seems reasonable to stop when R_k is sufficiently small. In this subsection we will consider the implications of a small residual. First we will show how to compute an optimal block L. We will then show that a small residual implies a small backward error, a generalization of Theorem 1.3, Chapter 2. Finally, we will develop residual bounds for the error in (L, X) as an approximate eigenpair. We will actually prove a little more.

Optimal residuals

Most of our algorithms will furnish an approximate eigenbasis X but will allow us the freedom to choose the eigenblock. It is natural to choose this block to make the residual R = AX − XL as small as possible. The following theorem shows how to make this choice.

Theorem 2.6. Let (X X_⊥) be unitary, and let R = AX − XL and Sᴴ = XᴴA − LXᴴ. Then in any unitarily invariant norm, ‖R‖ and ‖S‖ are minimized when

    L = XᴴAX,

in which case

    R = (I − XXᴴ)AX  and  Sᴴ = XᴴA(I − XXᴴ).

Proof. We will prove the result for R, the proof for S being similar. Set

    G = X_⊥ᴴAX.

Then

    (X X_⊥)ᴴ R = ( XᴴAX − L )
                 ( G        ),

and since the norm is unitarily invariant, ‖R‖ is minimized when L = XᴴAX, in which case ‖R‖ = ‖G‖. ∎

The Rayleigh quotient

The quantity L = XᴴAX is a generalization of the scalar Rayleigh quotient of Chapter 2, which suggests the following definition.

Definition 2.7. Let X be of full column rank and let Xᴵ be a left inverse of X. Then the quantity XᴵAX is a RAYLEIGH QUOTIENT of A.

It may seem odd to have a quotientless Rayleigh quotient, but in fact the division is implicit in Xᴵ. The matrix XᴴAX is not the most general matrix that will reproduce an eigenblock when X is an eigenbasis. To make this explicit, let X be of full rank and let YᴴX be nonsingular. Then (YᴴX)⁻¹Yᴴ is a left inverse of X, and (YᴴX)⁻¹YᴴAX is a Rayleigh quotient.

If X is an eigenbasis of A, then the Rayleigh quotient is an eigenblock. Consequently, if X is near an eigenbasis of A, we may expect the Rayleigh quotient to contain good approximations to the eigenvalues of A. Moreover, by Theorem 1.2 we may expect that if u is an eigenvector of L, then Xu will be near an eigenvector of A. We will give substance to these expectations in §4, where we consider the Rayleigh-Ritz method.
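A sketch of the optimal choice in Theorem 2.6:

```python
import numpy as np

def optimal_block_residual(A, X):
    """Optimal eigenblock and residual for an orthonormal basis X.

    A sketch of Theorem 2.6: L = X^H A X minimizes ||A X - X L|| in any
    unitarily invariant norm; the minimal residual is (I - X X^H) A X.
    """
    L = X.conj().T @ A @ X          # the Rayleigh quotient
    R = A @ X - X @ L               # optimal residual
    return L, R

# check: for any perturbation D of L,
# np.linalg.norm(A @ X - X @ (L + D)) >= np.linalg.norm(R)
```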

Backward error

In Theorem 1.3, Chapter 2, we saw that if r = Ax − λx, then (λ, x) is an exact eigenpair of a perturbed matrix A + E, where E is of a size with r. The following theorem generalizes this result.

Theorem 2.8. Let X be orthonormal and let

    R = AX − XL.

Then there is a matrix E such that

    (A + E)X = XL    (2.4)

and

    ‖E‖₂ = ‖R‖₂,  ‖E‖_F = ‖R‖_F.    (2.5)

If A is Hermitian and

    L = XᴴAX,

then there is a Hermitian matrix E satisfying (2.4) with

    ‖E‖₂ = ‖R‖₂,  ‖E‖_F = √2 ‖R‖_F.    (2.6)

Proof. The proof is a generalization of the proof of Theorem 1.3, Chapter 2. The only tricky point is the verification of (2.6), which can be effected by observing that in the notation of the proof of Theorem 2.6 the Hermitian perturbation is

    E = −X_⊥GXᴴ − XGᴴX_⊥ᴴ.

The singular values of this matrix are ±σ_i, where the σ_i are the singular values of G. Since the singular values of G and R are the same, the 2-norm of this matrix is the largest singular value of R, and the square of the Frobenius norm is twice the sum of squares of the singular values of R. ∎

The import of this theorem is discussed just after Theorem 1.3, Chapter 2. Briefly, if the residual is small, then (L, X) is an exact eigenpair of a slightly perturbed matrix A + E. This does not guarantee that (L, X) is a good approximation to an eigenpair of A, but for it not to be, (L, X) must be ill conditioned, that is, sensitive to small changes in A + E. Thus basing convergence on the size of the residual guarantees that well-conditioned eigenpairs will be determined accurately.

Residual bounds for eigenvalues of Hermitian matrices

By combining Theorems 3.8 and 3.10, Chapter 1, with Theorem 2.8, we can obtain residual bounds on the eigenvalues of a Hermitian matrix. Since Λ(L) ⊂ Λ(A + E), and since the eigenvalues of A + E and A cannot be separated by more than ‖E‖₂, we have the following corollary.

Corollary 2.9. Let A be Hermitian, let X be orthonormal with L = XᴴAX, and let ℓ₁, ..., ℓ_k be the eigenvalues of L. Then there are eigenvalues λ_{i₁}, ..., λ_{i_k} of A such that

    |λ_{i_j} − ℓ_j| ≤ √2 ‖R‖₂,  j = 1, ..., k,    (2.7)

and

    [ Σ_j (λ_{i_j} − ℓ_j)² ]^{1/2} ≤ √2 ‖R‖_F.

The factor √2 in (2.7) can be eliminated (see the notes and references). We will see in §2.4 that if we know more about the eigensystem of A then we can get tighter residual bounds.

Block deflation

Although the small backward error associated with a small residual is a powerful reason for using the residual norm as a convergence criterion, the question remains of how accurate the approximate eigenspace is. We will derive such bounds by a process of block deflation. Let (X X_⊥) be unitary and let

    B = (X X_⊥)ᴴ A (X X_⊥) = ( L  H )
                             ( G  M ).    (2.8)

Then by Theorem 2.6 we know that in any unitarily invariant norm ‖R‖ = ‖AX − XL‖ = ‖G‖. Consequently, if R is small, the right-hand side of (2.8) is almost block triangular. We will derive bounds by nudging it into full block triangularity. Specifically, consider the similarity transformation

    ( I   0 ) ( L  H ) ( I  0 )   ( L + HP                 H      )
    ( −P  I ) ( G  M ) ( P  I ) = ( G + MP − PL − PHP      M − PH ).

For the (2,1)-block of the right-hand side to be zero, we must have PL − MP = G − PHP,

which can be written in the form

    S(P) = G − PHP,    (2.9)

where S is the Sylvester operator

    S: P ↦ PL − MP.    (2.10)

Equation (2.9) is nonlinear in P and has no closed-form solution. However, under appropriate conditions its solution can be shown to exist and can be bounded.

Theorem 2.10. Let ‖·‖ denote a consistent norm. In the matrix

    B = ( L  H )
        ( G  M ),

let

    δ = sep(L, M),

where sep is defined through the Sylvester operator (2.10). If

    ‖G‖ ‖H‖ / δ² < 1/4,

then there is a unique matrix P satisfying PL − MP = G − PHP and such that

    ‖P‖ ≤ 2‖G‖/δ.    (2.11)

The spectra of L + HP and M − PH are disjoint.

For the proof of this theorem see the notes and references. We will first discuss this theorem in terms of the matrix B and then, later, in terms of the original matrix A in (2.8). Let

    X̂ = ( I )
         ( P ).

Then (L + HP, X̂) is a simple eigenpair of B with complementary eigenblock M − PH. On the other hand, (L, E₁), where E₁ consists of the first k columns of the identity, is an approximate eigenpair of B with residual norm ‖G‖. By Theorem 2.4 the singular values of P are the tangents of the canonical angles between X = R(E₁) and X̂ = R(X̂). Consequently, if ‖·‖ is the spectral norm, the largest canonical angle θ₁ between X and X̂ satisfies

    tan θ₁ ≤ 2‖G‖/δ.

Thus the poorer the separation, the larger the canonical angle between X and X̂ can be.

The quantity sep

The quantity sep(L, M), which appears in (2.11), is a generalization of the quantity sep(λ, N) introduced in Chapter 1. The quantity sep has a number of useful properties, which are stated without proof in the following theorem.

Theorem 2.11. Let sep be defined by a consistent norm ‖·‖, and let S be the Sylvester operator defined by (2.10). Then the following statements are true.

1. sep(L, M) = ‖S⁻¹‖⁻¹.

2. |sep(L + E, M + F) − sep(L, M)| ≤ ‖E‖ + ‖F‖.

3. If the norm ‖·‖ is absolute (i.e., ‖A‖ = ‖ |A| ‖) and L = diag(L₁, ..., L_l) and M = diag(M₁, ..., M_m), then

       sep(L, M) = min_{i,j} sep(L_i, M_j).

4. If L and M are Hermitian, then in the Frobenius norm

       sep(L, M) = min{ |λ − μ| : λ ∈ Λ(L), μ ∈ Λ(M) }.

5. For nonsingular V and W,

       sep(V⁻¹LV, W⁻¹MW) ≥ sep(L, M) / (‖V‖‖V⁻¹‖‖W‖‖W⁻¹‖).

It is a (not entirely trivial) exercise in the manipulation of norms to establish these properties. Item 1 provides an alternative definition of sep; since the operator S is nonsingular if and only if the spectra of L and M are disjoint (Chapter 1), sep(L, M) > 0 if and only if the spectra of L and M are distinct. Item 2 is a continuity property: it says that small changes in L and M make like changes in sep. Items 3 and 4 exhibit sep for matrices of special form. Item 4 shows that for Hermitian matrices sep is precisely the physical separation δ of the two spectra; in general sep is a lower bound for the physical separation and, as an example in Chapter 1 shows, it can be much smaller. This fact will be important later, when we derive perturbation bounds for eigenspaces. Finally, item 5 shows how far sep may be degraded under similarity transformations; it is useful in assessing the separation of diagonalizable matrices.
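In the Frobenius norm, item 1 gives a direct (if expensive) way to compute sep via the Kronecker form of the Sylvester operator. A sketch:

```python
import numpy as np

def sep_frobenius(L, M):
    """sep(L, M) in the Frobenius norm.

    A sketch of item 1 of Theorem 2.11: vec(P L - M P) =
    (L^T kron I - I kron M) vec(P), so sep is the smallest singular
    value of that Kronecker matrix.  Costs O((k*m)^3), so for large
    matrices one estimates sep instead of computing it.
    """
    k, m = L.shape[0], M.shape[0]
    S = np.kron(L.T, np.eye(m)) - np.kron(np.eye(k), M)
    return np.linalg.svd(S, compute_uv=False).min()
```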

Residual bounds

We now return to our original matrix A and derive residual bounds for approximate eigenspaces.

Theorem 2.12. Let X be an approximate eigenbasis for A and let (X X_⊥) be unitary. Let L = XᴴAX and M = X_⊥ᴴAX_⊥ be the Rayleigh quotients corresponding to X and X_⊥, and let

    R = AX − XL  and  Sᴴ = XᴴA − LXᴴ.

If in any unitarily invariant norm

    ‖R‖ ‖S‖ / sep(L, M)² < 1/4,    (2.12)

then there is a simple eigenpair (L̂, X̂) of A satisfying

    ‖L̂ − L‖ ≤ 2‖R‖‖S‖ / sep(L, M)    (2.13)

and

    ‖tan Θ[R(X), R(X̂)]‖ ≤ 2‖R‖ / sep(L, M).    (2.14)

Proof. Write

    (X X_⊥)ᴴ A (X X_⊥) = ( L  H )
                         ( G  M ).

Then by Theorem 2.6, ‖R‖ = ‖G‖ and ‖S‖ = ‖H‖. If we now apply Theorem 2.10, accumulating transformations, we see that if (2.12) is satisfied then there is an eigenblock of A of the form (L + HP, X + X_⊥P). The bound (2.13) follows on taking the norm of the difference of L and L + HP, and the bound (2.14) follows from Theorem 2.4. ∎

From a mathematical standpoint, this theorem does exactly what we need: it provides conditions under which a purported eigenpair (L, X) approximates an actual eigenpair of the matrix in question. But when it comes to eigenspaces of very large matrices, the size n of the matrix will be very much greater than the number k of columns of the matrix X. Although we can compute L, R, and S easily enough, we cannot compute M, which is large and generally dense, or the quantity sep(L, M). In practice, one must have additional information about the disposition of the eigenvalues to use the theorem. We shall see later that we can approximate a condition number for an eigenblock, provided we have an approximation to the corresponding left eigenbasis.

2.3. PERTURBATION THEORY

In the last subsection we considered the problem of bounding the error in an approximate eigenspace. In this subsection we will treat the problem of bounding the perturbation of an exact eigenspace when the matrix changes.

The perturbation theorem

Given a simple eigenspace of a matrix A, we wish to determine when there is a nearby eigenspace of a perturbation Ã = A + E of A and to bound the angles between the two spaces. In principle, we can do this by regarding the exact eigenspace of A as an approximate eigenspace of Ã and applying Theorem 2.12. In practice, we obtain better bounds by proceeding indirectly through a spectral decomposition of A.

Theorem 2.13. Let A have the spectral representation

    A = X₁LY₁ᴴ + X₂MY₂ᴴ.

Let Ã = A + E and set

    F_ij = Y_iᴴEX_j,  i, j = 1, 2,

so that

    (Y₁ Y₂)ᴴ Ã (X₁ X₂) = ( L + F₁₁   F₁₂     )
                         ( F₂₁       M + F₂₂ ).    (2.16)

Let ‖·‖ be a consistent family of norms and set

    δ = sep(L, M) − ‖F₁₁‖ − ‖F₂₂‖.

If

    ‖F₂₁‖ ‖F₁₂‖ / δ² < 1/4,    (2.17)

then there is a matrix P satisfying

    ‖P‖ ≤ 2‖F₂₁‖/δ    (2.18)

such that

    X̃₁ = X₁ + X₂P    (2.19)

and

    Ỹ₂ = Y₂ − Y₁Pᴴ    (2.20)

define complementary, simple, right and left eigenpairs of Ã. The spectra of the corresponding eigenblocks L̃ and M̃ are disjoint.

Proof. By item 2 of Theorem 2.11, the quantity δ is a lower bound on sep(L + F₁₁, M + F₂₂). Hence if (2.17) is satisfied, Theorem 2.10 guarantees the existence of a matrix P satisfying (2.18) that block triangularizes the matrix (2.16). The expressions (2.19) and (2.20) follow upon accumulating the transformations that reduce Ã to block triangular form. ∎

We will now examine the consequences of this theorem.

Choice of subspaces

Theorem 2.13 actually represents a continuum of results, since spectral representations are only partially unique (see p. 245). The variants do not say the same things, since the matrices L, M, and F_ij change with the representation. There is, unfortunately, no single choice of spectral representation that is suitable for all applications, but the one that comes closest is the standard representation, in which the matrices X_1 and Y_2 are assumed to be orthonormal. In the remainder of this subsection we will therefore make that assumption. In consequence (Theorem 2.10) there is a matrix Q such that

    X_2 = X_1 Q + Y_2  and  Y_1 = X_1 - Y_2 Q^H.     (2.21)

The singular values of Q are the tangents of the canonical angles between X_1 = R(X_1) and Y_1 = R(Y_1) (Theorem 2.4). We will set

    θ_xy = the largest canonical angle between R(X_1) and R(Y_1),

so that

    ||Q||_2 = tan θ_xy  and  ||Y_1||_2 = ||X_2||_2 = sec θ_xy.     (2.22)

Bounds in terms of E

The bounds of Theorem 2.13 are cast in terms of the matrices F_ij because they are the primary quantities in the proof of the theorem. In practice, we will seldom have access to these matrices. What we are likely to have is a bound on some norm of E. In this case we must weaken the bounds by replacing ||F_ij|| by ||Y_i^H|| ||E|| ||X_j||. In the 2-norm, these bounds can be conveniently written in terms of θ_xy defined in (2.21) and (2.22): since X_1 and Y_2 are orthonormal,

    ||F_21||_2 <= ||E||_2,  ||F_11||_2, ||F_22||_2 <= sec θ_xy ||E||_2,  ||F_12||_2 <= sec^2 θ_xy ||E||_2.

With these bounds, we may redefine

    δ~ = sep(L, M) - 2 sec θ_xy ||E||_2

and replace the condition (2.17) by

    sec^2 θ_xy ||E||_2^2 / δ~^2 < 1/4,

so that the bound (2.18) becomes

    ||P||_2 <= 2 ||E||_2 / δ~.     (2.23)

Thus how X_1 and Y_1 are situated with respect to one another affects the condition for the existence of the perturbed eigenspace, but it does not affect the final bound (2.23) on ||P||_2.

The Rayleigh quotient and the condition of L

Let

    L^ = Y_1^H A~ X_1 = L + F_11.     (2.24)

Since Y_1^H X_1 = I, the matrix L^ is a Rayleigh quotient in the sense of Definition 2.7. This Rayleigh quotient is an especially good approximation to the eigenblock L~ of A~. In fact, from the above bounds,

    ||L~ - L^||_2 = ||F_12 P||_2 <= 2 sec^2 θ_xy ||E||_2^2 / δ~.

Thus the difference between the Rayleigh quotient and the exact representation goes to zero as the square of ||E||_2. We also have

    ||L~ - L||_2 <= ||F_11||_2 + ||F_12 P||_2.

Hence as E -> 0,

    ||L~ - L||_2 <~ sec θ_xy ||E||_2.

Thus the secant of the largest canonical angle between X_1 and Y_1 is a condition number for the eigenblock L that determines how the error in E propagates to the representation L. Thus we have the following generalization of the assertion of Chapter 1 that the condition of a simple eigenvalue is the secant of the angle between its left and right eigenvectors:

    The condition of the eigenblock L is the secant of the largest canonical angle between the corresponding left and right eigenspaces.

Angles between the subspaces

In the relation X~_1 = X_1 + X_2 P, the size of P is a measure of the deviation of X~_1 = R(X~_1) from X_1. It is natural then to ask how this measure relates to the canonical angles between the two spaces. If X_2 were orthonormal and orthogonal to X_1, we could apply Theorem 2.4 to obtain this relation. Unfortunately, neither condition holds. But with a little manipulation we can find a basis for X~_1 to which Theorem 2.4 does apply. From (2.21) we have

    X~_1 = X_1 + X_2 P = X_1 (I + QP) + Y_2 P.

Since

    ||QP||_2 <= ||Q||_2 ||P||_2 = tan θ_xy ||P||_2 < 1,

the last inequality following from (2.22) and (2.23) (remember sec^2 θ_xy = 1 + tan^2 θ_xy), it follows that I + QP is nonsingular and

    ||(I + QP)^{-1}||_2 <= 1/(1 - tan θ_xy ||P||_2).

We can therefore write

    X~_1 (I + QP)^{-1} = X_1 + Y_2 P (I + QP)^{-1}.

Now the subspace spanned by X~_1 (I + QP)^{-1} is still X~_1. Since X_1 and Y_2 are orthonormal and Y_2^H X_1 = 0, it follows from Theorem 2.4 that the tangents of the canonical angles between X_1 and X~_1 are the singular values of

    P (I + QP)^{-1}.     (2.27)

As E -> 0 the matrix P (I + QP)^{-1} -> P. Consequently, for E small enough the singular values of P are approximations to the tangents of the canonical angles between X_1 and X~_1. Moreover, since for small θ we have θ =~ tan θ, the singular values approximate the angles themselves. In summary:

    Let A have the standard spectral representation A = X_1 L Y_1^H + X_2 M Y_2^H. Then for E sufficiently small, the singular values of P approximate the canonical angles between the unperturbed eigenspace R(X_1) and its perturbation R(X_1 + X_2 P).

On taking norms and using (2.23) and (2.27), we have the following corollary.

Corollary 2.14. Let θ_xy be the largest canonical angle between X_1 and Y_1, and let θ(X_1, X~_1) be the largest canonical angle between X_1 and X~_1. Then

    tan θ(X_1, X~_1) <= 2 ||E||_2 / (δ~ - 2 tan θ_xy ||E||_2),     (2.28)

where δ~ = sep(L, M) - 2 sec θ_xy ||E||_2. Here are two comments on this corollary.

* The denominator of (2.28), which is necessarily positive, approaches δ~ as E -> 0, and δ~ itself approaches sep(L, M). Thus sep(L, M)^{-1} is a condition number for the eigenspace R(X_1): when E is small the bound is effectively proportional to ||E||_2/sep(L, M).

* If we are willing to replace sep(L, M) with sep(L~, M~), we can generalize the proof of the corresponding theorem of Chapter 1 to obtain the bound

    tan θ(X_1, X~_1) <= ||E||_2 / sep(L~, M~).     (2.29)

However, this approach has the disadvantage that it assumes rather than proves the existence of X~_1.
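Canonical angles themselves are easily computed from orthonormal bases. A small numpy sketch (ours, not the book's; as the notes at the end of this section observe, very small angles are more accurately obtained from their sines):

    import numpy as np

    def canonical_angles(X, Y):
        # Canonical angles between R(X) and R(Y): the singular values
        # of Qx^H Qy are the cosines of the angles.
        Qx, _ = np.linalg.qr(X)
        Qy, _ = np.linalg.qr(Y)
        c = np.linalg.svd(Qx.conj().T @ Qy, compute_uv=False)
        return np.arccos(np.clip(c, -1.0, 1.0))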

2.4. RESIDUAL BOUNDS FOR HERMITIAN MATRICES

The residual bound theory of §2.2 addresses two logically distinct problems: the existence of certain eigenspaces of A and the bounding of the error in those eigenspaces. There we required that the spectra of two Rayleigh quotients corresponding to the approximate subspaces be disjoint. For Hermitian matrices we can often say a priori that the required eigenspaces exist. For example, suppose that the Hermitian matrix A has the form

    A = [ L    G^H ]
        [ G    M   ]

and that

    δ = min |λ(L) - λ(M)| > 0.

If ||G||_2 is less than δ/2, then by the perturbation theory for Hermitian matrices (§3.1, Chapter 1) the eigenvalues of A must divide into two groups: those that are within ||G||_2 of λ(L) and those that are within ||G||_2 of λ(M). The eigenspaces corresponding to these two groups of eigenvalues are candidates for bounding. The fact that we know these eigenspaces exist before we start enables us to get direct bounds that are simpler and more useful than the ones developed above. The purpose of this subsection is to state theorems containing such direct residual bounds. The proofs, which are not needed in this work, will be omitted. As usual, Θ(X, Y) will denote the diagonal matrix of canonical angles between the subspaces X and Y.

Residual bounds for eigenspaces

We begin with the simpler of the two results on eigenspaces treated here.

Theorem 2.15. Let the Hermitian matrix A have the spectral representation

    A = (X Y) [ L  0 ] (X Y)^H,
              [ 0  M ]

where (X Y) is unitary. Let the orthonormal matrix Z be of the same dimensions as X, and let

    R = A Z - Z N,

where N is Hermitian. If

    δ = min |λ(M) - λ(N)| > 0,     (2.30)

then in the Frobenius norm

    ||sin Θ[R(X), R(Z)]||_F <= ||R||_F / δ.

The separation hypothesis (2.30) is different from that required by the residual bounds of Theorem 2.12. Here the separation is between certain eigenvalues of A (those of M) and eigenvalues associated with the approximating subspace (those of N).

The theorem reflects an important tradeoff in bounds of this kind. The condition (2.30) requires only that the eigenvalues of M and N be decently separated; otherwise, they are free to interlace, with eigenvalues of M lying between those of N and vice versa. The price we pay is that we can only bound the Frobenius norm of sin Θ[R(X), R(Z)], i.e., the square root of the sum of squares of the sines of the canonical angles. For individual angles the bound will generally be an overestimate. We can resolve this problem by restricting how the eigenvalues of M and N lie with respect to one another.

Theorem 2.16. In the notation of Theorem 2.15, suppose that

    λ(N) is contained in an interval [α, β],     (2.31)

and that for some δ > 0 no eigenvalue of M lies in (α - δ, β + δ). Then for any unitarily invariant norm

    ||sin Θ[R(X), R(Z)]|| <= ||R|| / δ.     (2.32)

The conditions on the spectra of N and M say that the spectrum of N lies in an interval that is surrounded at a distance of δ by the spectrum of M. The bound is now in an arbitrary unitarily invariant norm. The natural norm to use is the 2-norm, in which case (2.32) bounds the largest canonical angle between the two spaces. Compared with Theorem 2.12, the separation criterion of Theorems 2.15 and 2.16 is less stringent and the final bound is smaller: the benefits of being Hermitian.

Residual bounds for eigenvalues

In Corollary 2.9 we showed that if R = AX - XL is small then there are eigenvalues of A lying near those of L. The bound is of order ||R||. The following theorem shows that if we can specify a separation between the eigenvalues of L and a complementary set of eigenvalues of A, then we can reduce the bound to be of order ||R||^2.

Theorem 2.17. Let (Z Z_perp) be unitary, where Z has k columns, and let

    (Z Z_perp)^H A (Z Z_perp) = [ N     R^H ]
                                [ R     M^  ],

so that ||R|| = ||AZ - ZN|| for the Rayleigh quotient N = Z^H A Z. Let

    δ = min |λ(N) - λ(M^)|.

If ||R||_2 < δ, then the eigenvalues ν_1, ..., ν_k of N can be paired with eigenvalues λ_1', ..., λ_k' of A so that

    |ν_i - λ_i'| <= ||R||_2^2 / δ.     (2.33)
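The quadratic effect is easy to observe numerically. A small illustration with synthetic data (entirely ours, not from the text):

    import numpy as np

    # For Hermitian A and a slightly perturbed eigenbasis Z, the
    # eigenvalues of N = Z^H A Z match eigenvalues of A to roughly
    # ||R||^2/delta, much better than ||R|| itself.
    rng = np.random.default_rng(0)
    n, k = 40, 4
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    lam, V = np.linalg.eigh(A)
    Z, _ = np.linalg.qr(V[:, :k] + 1e-4 * rng.standard_normal((n, k)))
    N = Z.T @ A @ Z
    R = A @ Z - Z @ N
    nu = np.linalg.eigvalsh(N)
    print(np.linalg.norm(R, 2), np.abs(nu - lam[:k]))

The printed eigenvalue errors should be far smaller than the residual norm, in line with (2.33).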

The theorem basically says that if A has a set L of eigenvalues that are close to λ(N) and that are removed from the rest of the spectrum of A by δ, then for R small enough the eigenvalues L and λ(N) can be paired so that they are within O(||R||^2) of each other. This theorem has the great advantage that we do not have to put restrictions such as (2.31) on the spectra. However, in sparse applications it has the disadvantage that we do not know M^, and hence cannot compute δ. But if we know that (in the notation of the last paragraph) there is a complementary set L_perp of eigenvalues of A with min |λ(L_perp) - λ(N)| = δ, then we can write

    min |λ(M^) - λ(N)| >= δ - 2||R||_2.

Hence the condition ||R||_2 < δ can be replaced by the condition 3||R||_2 < δ, and the right-hand side of (2.33) by ||R||_2^2/(δ - 2||R||_2).

2.5. NOTES AND REFERENCES

General references

For general references on perturbation theory see §3.3, Chapter 1. For a more complete treatment of this subject see Volume I of this series or [269].

Canonical angle

The idea of canonical angles between subspaces goes back to Jordan [138, 1875], and it has often been rediscovered. The definitive paper on computing canonical angles is by Björck and Golub [26], who recommend computing the sines of the angles. In our applications n will be very large compared with the common dimension k of the subspaces. In this case Y_perp is very large, and it is impossible to compute Y_perp^H X directly. However, given a basis for Y one can compute an orthogonal matrix whose first k columns span Y as a product of k Householder transformations (Algorithm 1:4.2). These transformations can then be applied to X to give Y_perp^H X in the last n-k rows of the result.

Optimal residuals and the Rayleigh quotient

Theorem 2.6 for Hermitian matrices is due to Kahan [142, 1967] (his proof works for non-Hermitian matrices). Earlier, Wilkinson [300, 1965, p. 264] proved the result for vectors. Stewart [252, 1973] stated the result explicitly for non-Hermitian matrices and gave the formula (2.2) for the optimal residual R.

Backward error

For Hermitian matrices in the vector case, a result of Dennis and Moré [63, 1977] shows that (2.5) is optimal (also see [64]). The theorems stated here all have to do with approximate right eigenspaces, and of course they have trivial variants that deal with left eigenspaces. If we have small right and left residuals R = AX - XL and S^H = Y^H A - L Y^H, it is natural to ask if there is a perturbation that makes X a right eigenbasis and Y a left eigenbasis. Kahan, Parlett, and Jiang [144] produce perturbations that are optimal in the spectral and Frobenius norms.

It is worth noting that we can go the other way: given a residual, we can generate a backward error and apply a perturbation theorem to get a residual bound.

Residual eigenvalue bounds for Hermitian matrices

Corollary 2.9 is due to Kahan [142, 1967], who showed that the factor sqrt(2) can be removed from its right-hand side. The result can be generalized to unitarily invariant norms [269, Theorem IV.4.9]. For more on the history of such bounds see [269, p. 244].

Block deflation and residual bounds

The use of block triangularization to derive residual bounds is due to Stewart [250, 1971], who also introduced the quantity sep. The approach taken here, in which the transforming matrix is triangular, is simpler than the original, which used an orthogonal transformation. A treatment of other theorems of this kind may be found in [206].

Perturbation theory

The passage from residual bounds to perturbation bounds was given in [254]. The fact that the Rayleigh quotient (2.24) is accurate up to terms of order ||E||^2 shows that not all Rayleigh quotients are equal. The Rayleigh quotient X_1^H A~ X_1, for example, is accurate only up to terms of order ||E||. The greater accuracy of Y_1^H A~ X_1 is due to the fact that we use approximations to both the left and right eigenbases in its definition. Alternatively, we can regard the greater accuracy as due to the fact that the matrix (2.16) is an off-diagonal perturbation of the block diagonal matrix consisting of the two Rayleigh quotients Y_1^H A~ X_1 and Y_2^H A~ X_2. The bound (2.29) is due to Ipsen [126].

Residual bounds for eigenspaces and eigenvalues of Hermitian matrices

The residual bounds for eigenspaces are due to Davis and Kahan [54, 1970], who prove much more in their ground-breaking paper. The residual bounds for eigenvalues are due to Mathias [174] and improve earlier bounds by Stewart [261].

3. KRYLOV SUBSPACES

Most algorithms for solving large eigenproblems proceed in two stages. The first stage generates a subspace containing approximations to the desired eigenvectors or eigenspaces. The second stage extracts approximations to the eigenvectors from the subspace generated in the first stage. If the approximations are not satisfactory, they may be used to restart the first stage.

In this section we will be concerned with the most widely used method for computing subspaces containing eigenvectors: the method of Krylov sequences. In the first subsection we will introduce Krylov sequences and their associated subspaces. We will then consider the accuracy of the approximates produced by the method in §3.2. In §3.3 we will introduce block Krylov sequences. Finally, §3.4 contains notes and references.

3.1. KRYLOV SEQUENCES AND KRYLOV SPACES

Introduction and definition

The power method (§1.1, Chapter 2) is based on the observation that if A has a dominant eigenpair then under mild restrictions on u the vectors A^k u produce increasingly accurate approximations to the dominant eigenvector. However, at each step the power method considers only the single vector A^k u, which amounts to throwing away the information contained in the previously generated vectors u, Au, ..., A^{k-1}u. It turns out that this information is valuable, as the following example shows.

Example 3.1. A diagonal matrix A of order 100 was generated with eigenvalues 1, 0.95, 0.95^2, 0.95^3, .... Starting with a random vector u_1, the vectors u_1, Au_1, ..., A^k u_1 were generated. The solid line in Figure 3.1 shows the tangent of the angle between A^k u_1 and the dominant eigenvector of A. It is seen that the convergence is slow, as it must be, since the ratio of the subdominant eigenvalue to the dominant eigenvalue is 0.95. The dashed line gives the tangent of the angle between the dominant eigenvector and the space spanned by u_1, Au_1, ..., A^k u_1. The convergence is much more rapid, reducing the error by eight orders of magnitude. (The dash-dot line shows the convergence to the subdominant eigenvector, of which more later.)

The swift convergence of an approximation to the dominant eigenvector in the subspace spanned by u_1, Au_1, ..., A^{k-1}u_1 justifies studying such subspaces.
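The experiment is easy to reproduce in outline. The following numpy sketch is our own rough reenactment, not the book's code (the seed and the number of steps are arbitrary); in practice one would orthogonalize as one goes, as in the Lanczos and Arnoldi algorithms:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    A = np.diag(0.95 ** np.arange(n))    # eigenvalues 1, 0.95, 0.95^2, ...
    x1 = np.eye(n)[:, 0]                 # dominant eigenvector e_1
    u = rng.standard_normal(n)

    def tan_angle(S, x):
        # tangent of the angle between x and the subspace spanned by S
        Q, _ = np.linalg.qr(S)
        c = np.linalg.norm(Q.T @ x)              # cosine
        s = np.linalg.norm(x - Q @ (Q.T @ x))    # sine
        return s / c

    K = u[:, None].copy()
    for k in range(1, 30):
        K = np.hstack([K, A @ K[:, -1:]])        # append A^k u
    # power method uses only the last column; Krylov uses them all
    print(tan_angle(K[:, -1:], x1), tan_angle(K, x1))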

[Figure 3.1: Power method and Krylov approximations.]

We begin with a definition.

Definition 3.2. Let A be of order n and let u != 0 be an n-vector. Then the sequence

    u, Au, A^2 u, ...

is a KRYLOV SEQUENCE BASED ON A AND u. We call the matrix

    K_k(A, u) = (u  Au  A^2 u  ...  A^{k-1} u)

the kth KRYLOV MATRIX. The space

    K_k(A, u) = R[K_k(A, u)]

is called the kth KRYLOV SUBSPACE.

When A or u is known from context, we will drop them in our notation for Krylov matrices and subspaces. Thus we may write K_k(u) or simply K_k.

Elementary properties

The basic properties of Krylov subspaces are summarized in the following theorem. The proof is left as an exercise.

Theorem 3.3. Let A and u != 0 be given. Then:

1. The sequence of Krylov subspaces satisfies K_1 c K_2 c ... and A K_k c K_{k+1}.
2. If σ, τ != 0, then K_k(σA, τu) = K_k(A, u).
3. K_k(A - κI, u) = K_k(A, u).
4. If W is nonsingular, then K_k(W^{-1}AW, W^{-1}u) = W^{-1} K_k(A, u).

The second item in the above theorem states that a Krylov subspace remains unchanged when either A or u is scaled. The third says that it is invariant under shifting. Finally, the fourth says that the space behaves in a predictable way under similarity transformations. In particular, if A is Hermitian or nondefective, we may diagonalize it, a strategy that sometimes simplifies the analysis of Krylov sequences.

The polynomial connection

Krylov subspaces have an important characterization in terms of matrix polynomials. By definition any vector v in K_k(A, u) can be written in the form

    v = γ_1 u + γ_2 Au + ... + γ_k A^{k-1} u.

Hence if we define the polynomial

    p(λ) = γ_1 + γ_2 λ + ... + γ_k λ^{k-1},

then v = p(A)u. Conversely, if v = p(A)u for any polynomial of degree not exceeding k-1, then v is in K_k(A, u). Thus we have proved the following theorem.

Theorem 3.4. The space K_k(A, u) can be written in the form

    K_k(A, u) = { p(A)u : deg(p) <= k-1 }.

Termination

If (λ, u) is an eigenpair of A, then A^k u = λ^k u, so that every Krylov subspace based on u is simply span(u). More generally, we will say that a Krylov sequence terminates at l if l is the smallest integer such that

    K_{l+1}(A, u) = K_l(A, u).     (3.1)

In other words, the Krylov vectors after the first l provide no new information. The following theorem states the basic facts about termination.

. 3. On the other hand. Au. then so that K.. If u is in an invariant subspace X of dimension ra. Xi) (i = 1. Let u be a starting vector for a Krylov sequence. the sequence terminates at I. that the Krylov subspace converges to the vector.£ is an invariant subspace of A. is the smallest integer for which If the Krylov sequence terminates at I...1 that a sequence of Krylov subspaces can contain increasingly accurate approximations to the certain eigenvectors of the matrix in question. Experience has shown that these eigenvectors have eigenvalues that tend to lie on the periphery of the spectrum of the matrix. KRYLOV SUBSPACES 269 Theorem 3.1). Now if the sequence terminates at I > m it follows that m = dim(/Q) > dim(/Cm) = dim(A'). ifu lies in an eigenspace of dimension m. Am~l u. If the sequence terminates at I. (3. then for some I < m. A Krylov sequence terminates based on A and u at I if and only iff. In this subsection we will give mathematical substance to this observation—at least for the Hermitian case. • Termination is a two-edged sword. where p(X) is a polynomial of degree not greater than k. As we shall see.. Equation (3. then /Q is an eigenspace of A of dimension t. n).1) follows from the fact that £/ C /Q+i.2) is satisfied. The general approach In what follows. On the one hand.2. and write u in the form where a. Since p(A)x{ = p(\i)x{. In this case we will say.5. it contains exact eigenvectors of A. we will assume that A is Hermitian and has an orthonormal system of eigenpairs (A.. = x^u. CONVERGENCE We have seen in Example 3. On the other hand if (3. Proof.4 any member of fCk can be written in the form p(A)u. if a Krylov sequence terminates.SEC. somewhat illogically. then so are the members of the sequence w..2) follows from (3. we have Vol II: Eigensystems . the sword is especially difficult to wield in the presence of rounding error.. On the other hand. By Theorem 3. which is impossible.. it stops furnishing information about other eigenvectors. 3.

    p(A)u = a_1 p(λ_1) x_1 + a_2 p(λ_2) x_2 + ... + a_n p(λ_n) x_n.

Thus if we can find a polynomial such that p(λ_i) is large compared with p(λ_j) (j != i), then p(A)u will be a good approximation to x_i. The following theorem quantifies this observation.

Theorem 3.6. In the above notation, if x_i^H u != 0 and p(λ_i) != 0, then

    tan angle[p(A)u, x_i] <= max_{j != i} |p(λ_j)|/|p(λ_i)| . tan angle(u, x_i).     (3.4)

Proof. We begin by observing that if u has the expansion (3.3), then |a_i|/||u||_2 is the cosine of the angle θ between u and x_i, and sqrt(sum_{j != i} |a_j|^2)/||u||_2 is the sine of θ. Hence

    tan angle(u, x_i) = sqrt(sum_{j != i} |a_j|^2) / |a_i|.

Applying the same formula to the expansion of p(A)u, we get

    tan angle[p(A)u, x_i] = sqrt(sum_{j != i} |a_j p(λ_j)|^2) / |a_i p(λ_i)|,

and (3.4) follows on bounding each |p(λ_j)| by the maximum.

If deg(p) <= k-1, then p(A)u is in K_k, and tan angle[p(A)u, x_i] is an upper bound for tan angle(x_i, K_k). It follows from (3.4) that

    tan angle(x_i, K_k) <= min_{deg(p) <= k-1} max_{j != i} |p(λ_j)|/|p(λ_i)| . tan angle(u, x_i).     (3.5)

Thus if we can compute the min-max in (3.5) as a function of k, we will have a bound on the accuracy of the approximations to x_i in the sequence of Krylov subspaces. To apply this theorem, note that the right-hand side of (3.5) does not depend on the scaling of p. We can therefore assume that p(λ_i) = 1 and write

    tan angle(x_i, K_k) <= min_{deg(p) <= k-1, p(λ_i) = 1} max_{j != i} |p(λ_j)| . tan angle(u, x_i).     (3.6)

As it stands, this problem is too difficult. To simplify it, assume that the eigenvalues are ordered so that λ_1 > λ_2 >= ... >= λ_n

and that our interest is in the eigenvector x_1. If we replace the maximum over the eigenvalues λ_2, ..., λ_n by the maximum over the interval [λ_n, λ_2], then

    σ_k = min_{deg(p) <= k-1, p(λ_1) = 1} max_{λ in [λ_n, λ_2]} |p(λ)|     (3.7)

is an upper bound for the min-max in (3.6). Thus we are led to the problem of calculating the bound (3.7). This is a well-understood problem that has an elegant solution in terms of Chebyshev polynomials, to which we now turn.

Chebyshev polynomials

The Chebyshev polynomials are defined by

    c_k(t) = cos(k cos^{-1} t)   for |t| <= 1,
    c_k(t) = cosh(k cosh^{-1} t) for t >= 1.     (3.8)

Chebyshev polynomials are widely used in numerical analysis and are treated in many sources. Rather than rederive the basic results, we collect them in the following theorem.

Theorem 3.7. Let c_k(t) be defined by (3.8). Then the following assertions are true.

1. c_k is a polynomial of degree k satisfying the recurrence c_{k+1}(t) = 2t c_k(t) - c_{k-1}(t), with c_0(t) = 1 and c_1(t) = t.
2. |c_k(t)| <= 1 for t in [-1, 1].
3. For |t| >= 1,

    c_k(t) = (1/2) [ (t + sqrt(t^2 - 1))^k + (t + sqrt(t^2 - 1))^{-k} ].     (3.9)

4. For any μ > 1, the solution of the problem

    min { max_{t in [-1, 1]} |p(t)| : deg(p) <= k, p(μ) = 1 }

is p(t) = c_k(t)/c_k(μ), and the minimum value is 1/c_k(μ).     (3.10)

Convergence redux

We now return to the problem of bounding tan angle(x_1, K_k). In view of (3.7) we can apply (3.10) to compute σ_k, provided we change variables to transform the interval [λ_n, λ_2] to [-1, 1]. It is easily verified that the change of variables is

    t = (2λ - λ_2 - λ_n) / (λ_2 - λ_n).

The value of t at λ_1 is

    μ = 1 + 2η,  where  η = (λ_1 - λ_2)/(λ_2 - λ_n).

Hence the value of σ_k is 1/c_{k-1}(1 + 2η), and we have the following theorem.

Theorem 3.8. Let the Hermitian matrix A have an orthonormal system of eigenpairs (λ_i, x_i), with its eigenvalues ordered so that

    λ_1 > λ_2 >= ... >= λ_n.     (3.11)

Let

    η = (λ_1 - λ_2)/(λ_2 - λ_n).

Then

    tan angle(x_1, K_k) <= tan angle(x_1, u) / c_{k-1}(1 + 2η).

Three comments.

* The bound appears to be rather complicated, but it simplifies, at least asymptotically. By (3.9),

    c_{k-1}(1 + 2η) = (1/2) [ (1 + 2η + 2 sqrt(η + η^2))^{k-1} + (1 + 2η + 2 sqrt(η + η^2))^{1-k} ].

For k large we can ignore the term with the exponent 1-k and write

    tan angle(x_1, K_k) <~ 2 tan angle(x_1, u) . (1 + 2η + 2 sqrt(η + η^2))^{-(k-1)}.

* As with the power method, a nearby subdominant eigenvalue will retard convergence. For if η is small, the bound decreases by a factor of roughly 1/(1 + 2 sqrt(η)) at each step. However, in this case the bound gives us a bonus, since η enters through its square root.
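For definiteness, here is a small numpy sketch (our own; the function names are hypothetical) that evaluates the bound of Theorem 3.8:

    import numpy as np

    def cheb(k, t):
        # Chebyshev polynomial c_k(t) for t >= 1, via the cosh formula (3.8).
        return np.cosh(k * np.arccosh(t))

    def krylov_tan_bound(k, lam1, lam2, lamn, tan0):
        # Theorem 3.8: tan angle(x1, K_k) <= tan0 / c_{k-1}(1 + 2*eta).
        eta = (lam1 - lam2) / (lam2 - lamn)
        return tan0 / cheb(k - 1, 1 + 2 * eta)

With the eigenvalues of Example 3.1 (lam1 = 1, lam2 = 0.95, lamn near 0) one gets eta of about 0.053, matching the discussion that follows.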

The square root increases the influence of η on the bound. For example, consider the matrix in Example 3.1. There η = 0.05/0.95 = 0.053. The square root of this number is about 0.23, so that the bound on the convergence ratio is about 1/(1 + 2 . 0.23) = 0.69. Thus the square-root effect gives a great improvement over the rate of 0.95 for the power method.

* By item 2 in Theorem 3.3, the Krylov subspaces of A remain unchanged when A is replaced by -A. Hence Theorem 3.8 can be used to bound convergence to the eigenvector corresponding to the smallest eigenvalue of A. However, it is important to keep in mind that the smallest eigenvalues of a matrix, particularly a positive definite matrix, often tend to cluster together, so that the bound will be unfavorable. For example, for the matrix of Example 3.1 the value of η corresponding to the smallest eigenvalue is about 3.3.10^{-4}, so that the bound on the convergence ratio is about 0.97.

Bounds for the other eigenvectors

The general approach can be used to derive bounds for the remaining eigenvectors. For example, to derive a bound for x_2, let

    p(t) = [(λ_1 - t)/(λ_1 - λ_2)] q(t),

where q(t) is a polynomial of degree not greater than k-2 satisfying q(λ_2) = 1. Note that p(λ_1) = 0 and p(λ_2) = 1. It then follows from (3.5) that

    tan angle(x_2, K_k) <= [(λ_1 - λ_n)/(λ_1 - λ_2)] . tan angle(x_2, u) / c_{k-2}(1 + 2η_2),     (3.12)

where

    η_2 = (λ_2 - λ_3)/(λ_3 - λ_n).

Bounds for λ_i (i > 2) can be derived by considering a polynomial of the form

    p(t) = l(t) q(t),

where l is the Lagrange polynomial that is zero at λ_1, ..., λ_{i-1} and one at λ_i.

Multiple eigenvalues

The hypothesis λ_1 > λ_2 in Theorem 3.8 can be relaxed. For example, suppose that λ_1 = λ_2 > λ_3. Let u be given, and expand u in the form

    u = a_1 x_1 + a_2 x_2 + a_3 x_3 + ... + a_n x_n,

where the x_i are the normalized eigenvectors corresponding to λ_i. Then for any polynomial p of degree not greater than k-1,

    p(A)u = (a_1 x_1 + a_2 x_2) p(λ_1) + sum_{i > 2} a_i p(λ_i) x_i.     (3.13)

This shows that of all the vectors in the two-dimensional invariant subspace that is spanned by x_1 and x_2, the spaces K_k(A, u) contain only approximations to a_1 x_1 + a_2 x_2. It is just as though (λ_1, a_1 x_1 + a_2 x_2) were a simple eigenpair of A. Thus Theorem 3.8 continues to hold when λ_1 is multiple, provided we adopt the convention that only one copy of λ_1 is listed in (3.11). Similarly, the bound (3.12) continues to hold when λ_1 and λ_2 are multiple, if we retain only one copy of λ_1 and λ_2.

Assessment of the bounds

It is instructive to apply the bounds to the Krylov sequence generated in Example 3.1. The problem is that the theory predicts a convergence ratio of 0.69, while a glance at the graph in Figure 3.1 shows that the convergence ratios are steadily decreasing. Immediately after the first step they range from 0.55 to 0.75, but then they drop off; the last three convergence ratios are 0.31, 0.26, and 0.22. This phenomenon is not accounted for by our theory. We will return to this point in a moment.

In fact, the graphs show that any exponentially decreasing bound of the form α ρ^k must be weak, which does not reflect the true behavior of the Krylov approximations. Simply take a ruler and draw a line from the common initial point that is tangent to the dashed line (at about k = 12): the resulting optimal bound of this form ends orders of magnitude above the true final value of about 5.10^{-7}. The same procedure applied to the dot-dash line yields an optimal bound of about 6.10^{-3}, whereas its value at k = 25 is about 7.10^{-4}: once again, a poor prediction.

To summarize, the bounds provide support for the empirical observation that Krylov sequences generate good approximations to exterior eigenvalues, but they can underestimate how good the approximations actually are, sometimes by orders of magnitude. Part of the reason is that the bounds must take into account the worst-case distribution of the eigenvalues; part, as we have just seen, is the mathematical form of the bounds themselves.

Non-Hermitian matrices

The theory of Krylov subspaces for non-Hermitian matrices is less well developed. There are two reasons. First, the geometry of the eigenvalues is generally more complicated, since they can lie anywhere in the complex plane. Second, if the matrix is nonnormal, we may not have a complete system of eigenvectors to aid the analysis. The following example shows that a change in geometry can have a major effect on the rates of convergence.

Example 3.9. A normal (actually diagonal) matrix was constructed whose eigenvalues were those of a random complex matrix. The largest eigenvalue was recomputed

[Figure 3.2: Power method and Krylov approximations.]

so that the absolute value of its ratio to the second largest eigenvalue was 0.95 (as in Example 3.1). The solid line in Figure 3.2 shows the tangent of the angle between the kth Krylov subspace and the dominant eigenvector. The convergence is not much better than that of the power method in Example 3.1. To vary the geometry, the eigenvalues were altered so that the ratio of the absolute values of successive eigenvalues became 0.95 while their arguments remained the same. The convergence, illustrated by the dashed line, is intermediate. In a third experiment, the eigenvalues from the first experiment were replaced by their absolute values, so that the matrix is now Hermitian. The convergence, shown by the dash-dot line, is swift, as in Example 3.1.

It would be foolish to conclude too much from these examples. But it is clear that geometry matters, and perhaps that exponentially decaying eigenvalues are a good thing.

The example does have something to say about the convergence bounds from Theorem 3.8. Recall that the bound failed to predict the swift convergence to x_1 for the matrix from Example 3.1. It turns out that the third experiment in Example 3.9, in which the matrix is Hermitian, has exactly the same bounds. Since it converges more slowly, the bounds cannot predict the swifter convergence of Example 3.1.

There is a polynomial connection for non-Hermitian matrices.

Specifically, we have the following analogue of Theorem 3.6.

Theorem 3.10. Let λ be a simple eigenvalue of A and let

    A = λ x y^H + X L Y^H

be a spectral representation, so that λ is not an eigenvalue of L. Let

    u = α x + X a,  α != 0.

Then for any polynomial p of degree not greater than k-1 with p(λ) = 1,

    sin angle(x, K_k) <= (||X||_2 ||a||_2 / |α|) . ||p(L)||.     (3.14)

Proof. Since Ax = λx and AX = XL, we have p(A)u = α p(λ) x + X p(L) a = α x + X p(L) a, a member of K_k whose distance from αx is at most ||X||_2 ||p(L)|| ||a||_2. Hence for any polynomial p of degree not greater than k-1 with p(λ) = 1, we have (3.14).

This theorem is an analogue of Theorem 3.6. Actually, it is a continuum of theorems, since spectral representations are not unique. For example, if A is nondefective we can take L to be diagonal and the columns of X to have norm one, in which case the bound becomes (after a little manipulation) a constant multiple of

    min_{deg(p) <= k-1, p(λ) = 1} max_{j >= 2} |p(λ_j)|,

where λ_2, ..., λ_n are the eigenvalues of A other than λ.

The theorem suggests that we attempt to choose polynomials p with p(λ) = 1 so that p(L) is small. This is not, in general, an easy task, since the geometry of the eigenvalues can be complicated. The theorem also illustrates the difficulties introduced by defectiveness and nonnormality. Suppose, for example, that μ != λ is an eigenvalue of A with a Jordan block of order m. Then for p(L) to be zero, the polynomial p(τ) must have the factor (τ - μ)^m. In other words, it is not sufficient to simply make p small at μ. This implies that we cannot generalize the approach via Chebyshev polynomials to defective matrices. What results exist suggest that under some conditions the Krylov sequence will produce good approximations to exterior eigenvectors. For more see the notes and references.

3.3. BLOCK KRYLOV SEQUENCES

We saw above [see (3.13)] that when it comes to multiple eigenvalues, a Krylov sequence has tunnel vision. For example, if λ_1 is double and the generating vector is expanded in the form

    u = a_1 x_1 + a_2 x_2 + a_3 x_3 + ...,

the Krylov sequence can only produce the approximation a_1 x_1 + a_2 x_2 to a vector in the eigenspace of λ_1. Of course, what we would like to approximate is the entire invariant subspace corresponding to that eigenvalue. A cure for this problem is to iterate with a second vector, say

    v = b_1 x_1 + b_2 x_2 + b_3 x_3 + ....

Then K_k(A, v) can approximate the linear combination b_1 x_1 + b_2 x_2. Since the sequence in u produces approximations to a_1 x_1 + a_2 x_2, unless we are unlucky the two approximations will span the required two-dimensional invariant subspace.

The foregoing suggests that we generate a block Krylov sequence by starting with a matrix U with linearly independent columns and computing

    U, AU, A^2 U, ....

The space K_k(A, U) spanned by the first k members of this sequence is called the kth block Krylov subspace.

Although it requires more storage and work to compute a block Krylov sequence than an ordinary Krylov sequence, there are two compensating factors (aside from curing the problem of multiple eigenvalues). First, it frequently happens that the computation of the product of a matrix with a block U of, say, m vectors takes less than m times the time for a matrix-vector product. For example, if the matrix is stored on disk, it has to be retrieved only once to form AU. Second, passing to block Krylov sequences improves the convergence bound, as the following theorem shows.

Theorem 3.11. Let A be Hermitian and let (λ_i, x_i) be a complete system of orthonormal eigenpairs of A, with the eigenvalues ordered so that

    λ_1 >= λ_2 >= ... >= λ_n,

and assume that the multiplicity of λ_1 is not greater than m.

.11 for non-Hermitian block Krylov sequences. • If the multiplicity of A! is less than m. m) had been deleted. v) contains no components along #2. Hence JCk(A. We may now apply Theorem 3. Hence the Krylov subpace /Cjt(A. • Three comments. • • • .. Since deflation is vital in other ways to many algorithms. the order constant in the bound istanZ(xi. EIGENSPACES where Proof. However.. However.v) C ICk(A. The two approaches each have their advantages and drawbacks.. Thus Theorem 3. so that the Krylov sequence can converge to a second eigenpair corresponding to a repeated eigenvalue.278 is nonsingular and we set then CHAPTER 4.16) is larger than the value of 77 in Theorem 3.8.15).12).{7). we can get convergence rates for the vectors #2. m). An alternative to block Krylov methods is to deflate converged eigenpairs. xt-) (i = 1. But the reader should keep in mind that there are important block variants of many of the algorithms we discuss here. In general the latter will be smaller.. it is as if we were working with a matrix in which the pairs (A. .U)so that From the definition of v it is easily verified that xfv = 0 (i = 2. • By the same trick used to derive the bound (3. .) not tanZ(xi. then 77 defined by (3.t.-.11 predicts faster convergence for the block method. so that the eigenvalue following AI is A ra+ j. there is no convergence theory analogous to Theorem 3. xm.8 to get (3... we will stress it in this volume. By construction v€K(U). so that initially the bound may be unrealistic. x3-> — The argument concerning multiple eigenvalues suggests that we can use block Krylov sequences to good effect with non-Hermitian matrices.

3.4. NOTES AND REFERENCES

Sources

Three major references for Krylov sequences are The Symmetric Eigenvalue Problem by B. N. Parlett [206], Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms by Y. Saad [233], and Eigenvalues of Matrices by F. Chatelin [39]. In addition to most of the results treated in this subsection, these books contain historical comments and extensive bibliographies.

A word on notation. In much of the literature the eigenvectors of the matrix in question are denoted generically by u (perhaps because u is commonly used for an eigenfunction of a partial differential equation), and Krylov sequences are often based on a vector v. Since this book is concerned with eigensystems, eigenvectors are denoted by x, and, to avoid underusing a good letter, Krylov sequences are based on u.

Krylov sequences

According to Householder [121, §6.1], Krylov [153, 1931] introduced the sequence named for him to compute the characteristic polynomial of a matrix. Specifically, if the Krylov sequence terminates at l [see (3.1)], then there are constants a_0, ..., a_{l-1} such that

    a_0 u + a_1 Au + ... + a_{l-1} A^{l-1} u + A^l u = 0.

It can then be shown that the polynomial

    p(λ) = a_0 + a_1 λ + ... + a_{l-1} λ^{l-1} + λ^l

divides the characteristic polynomial of A. The solution of eigenvalue problems by forming and solving the characteristic equation is a thing of the past. What makes the Krylov sequence important today is that it is the basis of the Lanczos algorithm for eigenpairs and the conjugate gradient method for linear systems, as well as their non-Hermitian variants. Accordingly, the focus of this section has been on the spectral properties of Krylov sequences. The term "Krylov sequence" is due to Householder.

Convergence (Hermitian matrices)

It seems that Krylov had no idea of the approximating power of the subspaces spanned by his sequence. The first hint of the power of Krylov sequences to approximate eigenvectors is found in Lanczos's 1950 paper [157], in which he proposed the algorithm that bears his name (Lanczos himself called it the method of minimized iterations). In the conclusion to his paper, he states:

    Moreover, the good convergence of the method in the case of large dispersion makes it possible to operate with a small number of iterations, obtaining m successive eigenvalues and principal axes by only m+1 iterations.

It is unfortunate that the first convergence results for Krylov spaces were directed toward the Lanczos algorithm, which is a combination of Krylov sequences and the Rayleigh-Ritz method to be treated in the next subsection. Kaniel [145, 1966], who gave the first analysis of the method, started by using Chebyshev polynomials to bound the eigenvalues produced by the Lanczos method and then using the eigenvalue bounds to produce bounds on the eigenvectors. In his important thesis Paige [196, 1971] followed Kaniel's approach, correcting and refining his eigenvector bounds. In 1980 Saad [229] showed that more natural bounds could be derived by first bounding the eigenvector approximations contained in the Krylov sequence itself and then analyzing the Rayleigh-Ritz procedure separately. This is the approach we have taken here; the result is Theorem 3.8.

Convergence (non-Hermitian matrices)

The inequality (3.14) was stated without proof by Saad in [230, 1980], who gave a direct proof later [233, p. 204]. The result requires that A be diagonalizable; Theorem 3.10 removes this restriction. It is known that Chebyshev polynomials defined on a suitable ellipse asymptotically satisfy an inequality like (3.10). Thus when an eigenvalue lies outside an ellipse containing the rest of the spectrum, bounds analogous to those of Theorem 3.8 are possible. These results raise the possibility of using polynomials to get eigenvalue bounds. This observation, incidentally, shows that there is nothing sacrosanct about Chebyshev polynomials. One is free to use them, or not use them, in combination with other polynomials. For more see [229].

Block Krylov sequences

Block Krylov sequences, limited to small powers of A, were introduced by Cullum and Donath [45, 1974] in connection with a restarted Lanczos algorithm. Block Krylov sequences in their full generality were introduced by Underwood in his Ph.D. thesis [280, 1975] to generalize the Lanczos algorithm (also see [92]). Underwood established eigenvalue bounds in the style of Kaniel, using a technique related to the construction in the proof of Theorem 3.11. Theorem 3.11 itself is due to Saad [229].

4. RAYLEIGH-RITZ APPROXIMATION

We have seen how a sequence of Krylov subspaces based on a matrix A can contain increasingly accurate approximations to an eigenspace of A. But it is one thing to know that an approximation exists and another to have it in hand. The purpose of this section is to describe and analyze the most commonly used method for extracting an approximate eigenspace from a larger subspace: the Rayleigh-Ritz method.

It is important to keep in mind that there are two levels of approximation going on here. First there is a best approximation to the desired eigenspace in the larger subspace. The Rayleigh-Ritz method approximates that best approximation.

Consequently, the analysis of the method is somewhat complicated, although its use in practice is straightforward. The analysis shows that unless a certain regularity condition is satisfied, the method may not produce satisfactory approximations; the last two subsections are devoted to ways to circumvent this problem. To keep the treatment simple, we will confine ourselves to the approximation of eigenvectors and will point out where the results generalize to eigenspaces.

4.1. RAYLEIGH-RITZ METHODS

The essential ingredients for the application of the Rayleigh-Ritz method are a matrix A and a subspace U containing an approximate eigenspace of A. The subspace may and often will be a Krylov subspace, but that is irrelevant to what follows; no other assumptions are made about U. We will motivate the method by the following theorem, which treats the case when U contains an exact eigenspace of A.

Theorem 4.1. Let U be a subspace and let U be a basis for U. Let V satisfy V^H U = I (i.e., V^H is a left inverse of U), and set

    B = V^H A U.

If X, a subspace of U, is an eigenspace of A, then there is an eigenpair (L, W) of B such that (L, UW) is an eigenpair of A with R(UW) = X.

Proof. Let (L, X) be an eigenpair of A corresponding to X, and let X = UW. Then from the relation AX = XL we obtain

    B W = V^H A U W = V^H X L = V^H U W L = W L,

so that (L, W) is an eigenpair of B, and (L, UW) = (L, X) is an eigenpair of A with R(UW) = X.

Thus we can find exact eigenspaces contained in U by looking at eigenpairs of the Rayleigh quotient B. By continuity, we might expect that when U contains an approximate eigenspace X of A there would be an eigenpair (L, W) of B such that (L, UW) is an approximate eigenpair of A with R(UW) approximating X. This suggests that to approximate an eigenpair corresponding to X we execute the following procedure:

    1. Choose a basis U for U and a matrix V with V^H U = I
    2. Form the Rayleigh quotient B = V^H A U
    3. Compute a suitable eigenpair (M, W) of B
    4. Return (M, UW) as an approximate eigenpair of A     (4.1)

We will call any procedure of this form a Rayleigh-Ritz procedure. We will call the pair (M, UW) a Ritz pair, M its Ritz block, and UW its Ritz basis. We will call W itself the primitive Ritz basis, and the space R(UW) will be called the Ritz space. When the concern is with eigenvalues and eigenvectors, so that the Ritz pair can be written in the form (λ, Uw), we will call λ a Ritz value, Uw a Ritz vector, and w a primitive Ritz vector.

Two difficulties

The procedure listed in (4.1) has two difficulties. First, we have not specified how to choose the eigenpair (M, W) in statement 3. Second, even when such a choice is made, we have no guarantee that the result approximates the desired eigenpair of A. Let us examine these difficulties in turn.

Recall that in large eigenproblems we cannot expect to compute the entire spectrum. Instead we must specify the part of the spectrum that we are interested in, e.g., a group of eigenvalues with largest real part. It is therefore natural to collect the eigenvalues of B lying in the chosen part of the spectrum and use them to compute (M, W). This procedure, however, will work only if we can insure that there are eigenvalues of B that approximate those of A in the desired part of the spectrum.

The second difficulty has no easy answer, as the following example shows.

Example 4.2. Let A = diag(0, 1, -1) and suppose we are interested in approximating the eigenpair (0, e_1). Assume that by some method we have come up with the basis

    U = [ 1    0       ]
        [ 0    1/sqrt(2)  ]
        [ 0   -1/sqrt(2)  ].

Then U is its own left inverse (U^T U = I), and we may take

    B = U^T A U = 0.

Since B is zero, any nonzero vector p is an eigenvector corresponding to the part of the spectrum near zero. Since B itself gives no clue as to which vector to choose, we might end up with p = (1 1)^T, in which case our approximate eigenvector becomes (1 1/sqrt(2) -1/sqrt(2))^T, which is completely wrong.

Thus the method can fail, even though the space U contains an exact approximation to the desired eigenvector. Of course, in real life we would not expect B to be exactly zero. But nearly zero is bad enough. For example, by perturbing U by a matrix of random normal deviates with standard deviation 10^{-4}, we obtained a matrix

    B = (U^T U)^{-1} U^T A U     (4.2)

whose entries were all of order 10^{-4}.

4). But in the overwhelming majority of applications U is taken to be orthonormal and V is taken equal to U. 4 approximation to GI that we The problem arises because the process of forming the Rayleigh quotient creates a spurious eigenvalue of B near zero. Symbolically: We will flesh out this equation in the next subsection. This form of the method has important applications (e. RAYLEIGH-RITZ APPROXIMATION form 283 The eigenvector corresponding to the smallest eigenvalue ofB (the primitive Ritz vector) is (9. Rayleigh-Ritz and generalized eigenvalue problems The algorithm (4. The orthogonal Rayleigh-Ritz procedure In its full generality the algorithm (4.SEC.3813e-01)T and the corresponding Ritz vector is This comes nowhere near reproducing the order 10 know lies in Tl(U}.g. then and hence Thus the primitive Ritz pairs are the eigenpairs of the generalized eigenvalue problem (4.4110e-01 3. 4. provided of course that VHU is nonsingular.1) is called an oblique Rayleigh-Ritz method because the subspaces spanned by U and V can differ. This suggests that we may have to postulate a separation between the approximations to the eigenvalues we desire and the other eigenvalues of the Rayleigh quotient. It turns out that such a postulate combined with the convergence of the eigenvalues in the Rayleigh quotient allows us to prove that the Ritz spaces converge as the approximations in U converge. In Vol II: Eigensystems . to the bi-Lanczos algorithm).. If (//. w) is a primitive Ritz pair. we can convert it to a left inverse by replacing it with V = V (FH f/)~ H . If it is not.1) presupposes that V is a left inverse of U.

We call this procedure the orthogonal Rayleigh-Ritz method. When A is Hermitian, the orthogonal Rayleigh-Ritz method has the obvious advantage that it preserves symmetry throughout the procedure. In addition, the following theorem shows that for any matrix the orthogonal method also gives minimum residuals.

Theorem 4.3. Let (M, X) be an orthogonal Rayleigh-Ritz pair with respect to U. Then

    ||A X - X M||

is minimal in any unitarily invariant norm.

Proof. By Theorem 2.6 all we need to show is that M = X^H A X. But if W is the (orthonormal) primitive Ritz basis, then M = W^H B W, where B = U^H A U. Since X = UW, we have

    M = W^H U^H A U W = X^H A X.

4.2. CONVERGENCE

This subsection is concerned with the accuracy of the approximations produced by the Rayleigh-Ritz procedure. We will treat the orthogonal variant, and for simplicity we will only consider the case where we are attempting to approximate a simple eigenvalue and its eigenvector, by far the most common use of the method.

Setting the scene

Consider the use of the Rayleigh-Ritz procedure in real life. We want to approximate the eigenpair (λ, x) of A. By some method, call it the U-machine, we obtain an orthonormal basis U_θ for which θ = angle(x, U_θ) is small. We apply the Rayleigh-Ritz process to obtain a Ritz pair (μ_θ, U_θ p_θ). If the pair is unsatisfactory, we return to our U-machine, ask it to grind out another U_θ with a smaller value of θ, and try the Rayleigh-Ritz process again.

To justify this process, we must show that the process in some sense converges as θ -> 0. We will treat this problem in two stages. First we will show that the Rayleigh quotient contains a Ritz value μ_θ that converges to λ. We will then treat the more delicate question of when the Ritz vector converges.
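The orthogonal procedure is short in code. A minimal numpy sketch (ours; the selection rule passed as `which` is a placeholder for "the part of the spectrum of interest"):

    import numpy as np

    def rayleigh_ritz(A, U, which=np.argmin):
        # Orthogonal Rayleigh-Ritz: U is assumed orthonormal (n x k).
        B = U.conj().T @ A @ U          # Rayleigh quotient
        mu, W = np.linalg.eig(B)
        j = which(np.abs(mu))           # e.g., Ritz value of smallest modulus
        return mu[j], U @ W[:, j]       # (Ritz value, Ritz vector)

For Hermitian A one would use np.linalg.eigh, preserving symmetry as the text recommends.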

Convergence of the Ritz value

Our approach to the convergence of Ritz values is through a backward perturbation theorem.

Theorem 4.4. Let B_θ be the Rayleigh quotient

    B_θ = U_θ^H A U_θ.     (4.5)

Then there is a matrix E_θ satisfying

    ||E_θ||_2 <= ||A||_2 tan θ     (4.6)

such that λ is an eigenvalue of B_θ + E_θ.

Proof. Let (U_θ U_perp) be unitary, so that U_perp spans the orthogonal complement of the column space of U_θ, and set

    y = U_θ^H x  and  z = U_perp^H x.

Then ||z||_2 = sin θ, and hence ||y||_2 = cos θ. From Ax = λx it follows that

    B_θ y + (U_θ^H A U_perp) z = λ y.

Now define

    E_θ = -(U_θ^H A U_perp) z y^H / ||y||_2^2.

Then clearly (B_θ + E_θ) y = λ y, and it follows that

    ||E_θ||_2 <= ||A||_2 ||z||_2 / ||y||_2 = ||A||_2 tan θ,

which establishes (4.6).

We can now apply Elsner's theorem (§3.1, Chapter 1) to give the following corollary.

Corollary 4.5. There is an eigenvalue μ_θ of B_θ such that

    |μ_θ - λ| <= (||B_θ||_2 + ||B_θ + E_θ||_2)^{1 - 1/m} ||E_θ||_2^{1/m},     (4.7)

where m is the order of B_θ.

Two comments.

* Assume the number m of columns of U_θ is bounded. Then the corollary assures us that we only have to wait until θ is small enough for a Ritz value to emerge that is near λ. The factor ||E_θ||_2^{1/m} in the bound (4.7) suggests that we may have to wait a long time. But as we shall see, under a certain separation condition the convergence will be O(θ). This has important implications for the Lanczos algorithm, which uses a sequence of Krylov subspaces in which m can become rather large.

* If A is Hermitian, we may replace the right-hand side of (4.7) by ||E_θ||_2 (see §3.1, Chapter 1). In this case we do not have to assume that m is bounded.

Convergence of Ritz vectors

We now turn to the convergence of Ritz vectors. The key is the following theorem, which relates Ritz vectors to the angle θ.

Theorem 4.6. Let (μ_θ, w_θ) be a primitive Ritz pair, and let the matrix (w_θ W_θ) be unitary, so that

    (w_θ W_θ)^H B_θ (w_θ W_θ) = [ μ_θ   h_θ^H ]
                                 [ 0     N_θ   ].

Then

    sin angle(x, U_θ w_θ) <= sin θ . sqrt( 1 + ||h_θ||_2^2 / sep(μ_θ, N_θ)^2 ).     (4.8)

For the proof of the theorem see the notes and references.

To apply the theorem, suppose we have chosen a Ritz pair (whose existence is guaranteed by Corollary 4.5) for which μ_θ -> λ. Since ||h_θ||_2 <= ||A||_2, we have

    sin angle(x, U_θ w_θ) <= sin θ . sqrt( 1 + ||A||_2^2 / sep(μ_θ, N_θ)^2 ).

Moreover, by the continuity of sep (item 2 of Theorem 2.11), if sep(μ_θ, N_θ) remains bounded below, so does sep(λ, N_θ). Consequently, sin angle(x, U_θ w_θ) -> 0 along with θ. Thus we have the following corollary.

Corollary 4.7. Let (μ_θ, U_θ w_θ) be a Ritz pair for which μ_θ -> λ. If there is a constant α > 0 such that

    sep(μ_θ, N_θ) >= α,     (4.9)

then asymptotically

    sin angle(x, U_θ w_θ) <= sqrt(1 + ||A||_2^2/α^2) . sin θ = O(θ).

This corollary justifies the statement (4.3), to the effect that eigenvalue convergence plus separation equals eigenvector convergence. Specifically, it contains the convergence hypothesis that μ_θ approaches λ and the separation hypothesis that the quantity sep(μ_θ, N_θ) is uniformly bounded below. We call the condition (4.9) the uniform separation condition. What is unusual for results of this kind is that it consists entirely of computed quantities and can be monitored. If the uniform separation condition is satisfied, the conclusion is that the sine of the angle between the eigenvector x and the Ritz vector U_θ w_θ approaches zero.

The uniform separation condition is not an artifact of the analysis. Specifically, in the second part of Example 4.2 the separation of the eigenvalues of B is about 1.9.10^{-4}, which accounts for the failure of the Ritz vector to reproduce the eigenvector e_1, even though it is present in U to accuracy of order 10^{-4}.

Convergence of Ritz values revisited

The bound in Corollary 4.5 suggests that the convergence of the Ritz value can be quite slow. We will now use the fact that the Ritz vector converges to derive a more satisfactory result. Let (μ_θ, x_θ) be the Ritz approximation to (λ, x), with ||x_θ||_2 = ||x||_2 = 1. If we write

    x_θ = γ x + σ y,  where y is orthogonal to x and ||y||_2 = 1,

then |γ| = cos angle(x_θ, x) and |σ| = sin angle(x_θ, x). Note that μ_θ = x_θ^H A x_θ is the Rayleigh quotient formed from x_θ and A. By direct computation of this Rayleigh quotient (using y^H A x = λ y^H x = 0) we find that

    |μ_θ - λ| <= ||A||_2 sin angle(x_θ, x) + O[sin^2 angle(x_θ, x)].

Thus the Ritz value converges at least as fast as the eigenvector approximation of x in R(U_θ).

Hermitian matrices

When A is Hermitian, we can say more. In this case x^H A y = λ x^H y = 0 as well, so that

    |μ_θ - λ| = |σ|^2 |y^H A y - λ| <= 2 ||A||_2 sin^2 angle(x_θ, x),

and by Corollary 4.7 we have sin angle(x_θ, x) = O(θ), so that the Ritz value converges as the square of the eigenvector error. The Ritz values also behave systematically in the Hermitian case. Let the eigenvalues of A be ordered so that

    λ_1 >= λ_2 >= ... >= λ_n,

and let the Ritz values, the eigenvalues of the Hermitian matrix U^H A U, be ordered so that

    μ_1 >= μ_2 >= ... >= μ_m.

Then by the interlacing theorem of Chapter 1,

    λ_{i+n-m} <= μ_i <= λ_i.

Consequently, if μ_i converges to λ_i, it must converge from below. Similarly, if μ_{m-i+1} converges to λ_{n-i+1}, it must converge from above.

Convergence of Ritz blocks and bases

Theorem 4.4, Corollary 4.5, Theorem 4.6, and Corollary 4.7 all have natural analogues for Ritz blocks and Ritz bases, and, with the exception of Corollary 4.5, the proofs are essentially the same. Thus, we can use the Rayleigh-Ritz method to approximate eigenspaces with the same assurance that we use it to approximate eigenvectors. For more see the notes and references.

Residuals and convergence

The results of this subsection have been cast in terms of the sine of the angle θ between an eigenvector x and its approximating subspace U. Although these results give us insight into the behavior of the Rayleigh-Ritz method, in the last analysis we cannot know θ and hence cannot compute error bounds. Consequently, we must look to the residual as an indication of convergence. The following residual bound actually applies to any approximate eigenpair.

Theorem 4.8. Let the simple eigenpair (λ, x) of A, with ||x||_2 = 1, have the spectral representation

    (x Y)^H A (x Y) = [ λ   h^H ]
                      [ 0   L   ],     (4.11)

where (x Y) is unitary, so that Y is orthonormal. Let (μ, x~) with ||x~||_2 = 1 be an approximation to (λ, x), and let

    r = A x~ - μ x~.

Then

    sin angle(x, x~) <= ||r||_2 / sep(μ, L) <= ||r||_2 / [sep(λ, L) - |μ - λ|].     (4.12)

Proof. Because (4.11) is a spectral decomposition, Y is an orthonormal basis for the orthogonal complement of the space spanned by x, and sin angle(x, x~) = ||Y^H x~||_2. From (4.11) we have Y^H A = L Y^H. Hence

    Y^H r = Y^H A x~ - μ Y^H x~ = (L - μI) Y^H x~.
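The bound (4.12) is computable for Hermitian matrices, where sep(μ, L) is just the distance from μ to the remaining eigenvalues. A small numpy illustration with synthetic data (entirely ours):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    lam, X = np.linalg.eigh(A)
    x = X[:, 0]                                   # target eigenvector
    xt = x + 1e-3 * rng.standard_normal(n)        # approximation
    xt /= np.linalg.norm(xt)
    mu = xt @ A @ xt                              # Rayleigh quotient
    r = A @ xt - mu * xt
    sep = np.min(np.abs(mu - lam[1:]))            # Hermitian sep(mu, L)
    c = abs(x @ xt)
    sin_angle = np.sqrt(max(0.0, 1 - c**2))
    print(sin_angle, np.linalg.norm(r) / sep)     # the bound dominates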

The following theorem gives quantitative substance to this assertion.X\ (see Theorem 2.fJ>0%9 approaches zero. • Since A is assumed to be simple. RAYLEIGH-RlTZ APPROXIMATION 289 It follows that from which the first inequality in (4. A REFINED RITZ VECTOR is a solution of the problem We will worry about the computation of refined Ritz vectors later. that residual must approach zero. if we solve the problem using the smallest eigenvalue of B in (4. Convergence of refined Ritz vectors We now turn to the convergence of refined Ritz vectors.11). Vol II: Eigensystems . We therefore turn to two alternatives to Ritz vectors. there is a Ritz value pe that is approaching A.9.5.\^i . This suggests the following definition. x).3. L) > sep(A. spurious Ritz values can cause the residual to fail to converge. It follows that the residual Axe .Ve&d has the smallest norm.2. By Corollary 4. 4. The second inequality follows from the first and the fact that sep(/z. We have argued informally that the convergence of the refined Ritz value u-o to the simple eigenvalue A is sufficient to insure the convergence of the refined Ritz vector.2) we get the vector which is an excellent approximation to the original eigenvector ei of A. REFINED RITZ VECTORS Definition By hypothesis. as Example 4. Let U. Definition 4.2 shows. which showed how the Rayleigh-Ritz method could fail to deliver an accurate approximation to an accurate eigenvector contained in a subspace. First let us return to Example 4. 4.Q be a Ritz value associated with U$. this theorem says that: Unfortunately. Hence if we define xg to be the vector whose residual Axe .SEC.12) follows. Specifically. the space U$ contains increasingly accurate approximations XQ to the vector in the simple eigenpair (A. L) .

Theorem 4.10. Let A have the spectral representation (4.11), where ||x||_2 = 1 and Y is orthonormal. Let μ_θ be a Ritz value and x_θ the corresponding refined Ritz vector. If

    sep(λ, L) - |μ_θ - λ| > 0,

then

    sin angle(x, x_θ) <= [ |λ - μ_θ| + (||A||_2 + |μ_θ|) sin θ ] / [ cos θ . ( sep(λ, L) - |μ_θ - λ| ) ].     (4.15)

Proof. Let U be an orthonormal basis for U_θ and write x = y + z, where y = UU^H x. Then ||z||_2 = sin θ, and since y and z are orthogonal, ||y||_2^2 = 1 - sin^2 θ. Let

    y~ = y / ||y||_2

be the normalized value of y. We have

    A y~ - μ_θ y~ = [ (λ - μ_θ) x - (A - μ_θ I) z ] / ||y||_2.

Hence

    ||A y~ - μ_θ y~||_2 <= [ |λ - μ_θ| + (||A||_2 + |μ_θ|) sin θ ] / cos θ.

By the definition of a refined Ritz vector, ||A x_θ - μ_θ x_θ||_2 <= ||A y~ - μ_θ y~||_2. The result now follows from Theorem 4.8.

Here are two comments on this theorem.

* By Corollary 4.5 we know that μ_θ -> λ. It follows from (4.15) that sin angle(x, x_θ) -> 0. In other words, refined Ritz vectors, unlike ordinary Ritz vectors, are guaranteed to converge.

* Although the Ritz value μ_θ was used to define the refined Ritz vector x_θ, the Rayleigh quotient μ^_θ = x_θ^H A x_θ is likely to be a more accurate approximation to λ. Moreover, by Theorem 2.6, the residual norm ||A x_θ - μ^_θ x_θ||_2 will be optimal.

The computation of refined Ritz vectors

The computation of a refined Ritz vector amounts to solving the optimization problem (4.13). Let U be an orthonormal basis for U. Then any vector of 2-norm one in U can be written in the form Uz, where ||z||_2 = 1. Thus we need to solve the problem

    min_{||z||_2 = 1} ||(A - μI) U z||_2.     (4.17)

The solution of this problem is the right singular vector of (A - μI)U corresponding to its smallest singular value (see Chapter 3). Thus we can compute refined Ritz vectors by computing the singular value decomposition of (A - μI)U. This suggests the following algorithm:

    1. V = AU
    2. W = V - μU
    3. Compute the singular value decomposition of W
    4. Return Uz, where z is the right singular vector of W belonging to its smallest singular value

We will assume that U is n x p, where n is much greater than p. The computation is dominated by the formation of V and the computation of the singular value decomposition. The formation of V requires p multiplications of a vector by A. As with the formation of block Krylov sequences, it will often happen that the formation of AU takes less time than p individual matrix-vector multiplications. The next nontrivial step is the computation of the singular value decomposition of W, which requires O(np^2) operations.

It is natural to compare this process with the computation of an ordinary Ritz vector. The Rayleigh-Ritz method must calculate H = U^H V = U^H A U, which is also an O(np^2) process. Thus the computation of a single refined Ritz vector and a single ordinary Ritz vector require the same order of work, albeit with different order constants. The situation is different when several vectors, say m, are to be computed. The Rayleigh-Ritz method requires only that we compute the eigensystem of H, so that all m Ritz vectors can be calculated with O(np^2) operations. Each refined Ritz vector, on the other hand, must be calculated independently from its own distinct value of μ, so that we will have to compute m singular value decompositions, each requiring O(np^2) operations, to get the refined vectors. Hence the computation of m refined Ritz vectors will take longer than the computation of m ordinary Ritz vectors by a factor of at least m.

A cure for this problem is to use the cross-product algorithm of Chapter 3 to compute the singular value decompositions.
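A direct implementation of the SVD-based computation is short. A minimal numpy sketch (ours, not the book's), assuming U is orthonormal:

    import numpy as np

    def refined_ritz_vector(A, U, mu):
        # Refined Ritz vector: the right singular vector of (A - mu*I)U
        # belonging to the smallest singular value, mapped back to R(U).
        W = A @ U - mu * U
        _, _, Vh = np.linalg.svd(W, full_matrices=False)
        z = Vh[-1].conj()          # minimizing unit vector
        return U @ z               # refined Ritz vector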

If in the notation of (4.17) we precompute the matrices

    S = (AU)^H (AU)  and  H = U^H A U,

then for any μ the cross-product matrix corresponding to (A - μI)U is

    (AU - μU)^H (AU - μU) = S - μ- H - μ H^H + |μ|^2 I,

which we can form at negligible cost. Thus the cost of computing an additional refined Ritz vector is the same as computing a spectral decomposition of a matrix of order p. Our analysis of the cross-product algorithm applies here. Specifically, denote the singular values of (A - μI)U by σ_1 >= ... >= σ_p. If μ is an accurate approximation to an eigenvalue λ whose eigenvector is well represented in U, we expect σ_p to be small. From the analysis of §3.2, Chapter 3, with i = p and j = p-1, we see that the sine of the angle between the computed vector and the true vector is of order (σ_1/σ_{p-1})^2 ε_M. If σ_{p-1} is not too small, something we can check during the calculation, then the refined Ritz vector computed by the cross-product algorithm will be satisfactory.

Finally, we note that if U is a Krylov subspace, the computation of refined Ritz vectors can be considerably abbreviated, as we shall see in Chapter 5.

4.4. HARMONIC RITZ VECTORS

The eigenvalue 0 of A in Example 4.2 lies between the eigenvalues 1 and -1, and for this reason it is called an interior eigenvalue of A. Interior eigenvalues are particularly susceptible to being impersonated by Rayleigh quotients of the form u^H A u, where u is not near an eigenvector of A. In fact, as we have seen, such an impersonation is the cause of the failure of the Rayleigh-Ritz method in the example. The use of refined Ritz vectors solves the problem by insuring that the residual Ax - μx is small, something that is not true of the spurious Ritz vector. Unfortunately, each refined Ritz vector must be calculated from its own distinct value of μ, and the quality of the refined Ritz vector depends on the accuracy of μ. Harmonic Ritz vectors, on the other hand, can provide a complete set of approximating pairs while at the same time protecting pairs near a shift κ against impersonators. There is no really good way of introducing them other than to begin with a definition and then go on to their properties.

Definition 4.11. Let U be a subspace and let U be an orthonormal basis for U. Then (κ + δ, Uw) is a HARMONIC RITZ PAIR WITH SHIFT κ if

    [(A - κI)U]^H (A - κI) U w = δ [(A - κI)U]^H U w.     (4.18)



We will refer to the process of approximating eigenpairs by harmonic Ritz pairs as the harmonic Rayleigh-Ritz method. This method shares with the ordinary Rayleigh-Ritz method the property that it retrieves eigenvectors that are exactly present in U. Specifically, we have the following result.

Theorem 4.12. Let (λ, x) be an eigenpair of A with x = Uw. Then (λ, Uw) is a harmonic Ritz pair.

Proof. Calculating the left- and right-hand sides of (4.18), we find

U^H(A − κI)^H(A − κI)Uw = (λ − κ) U^H(A − κI)^H Uw.

Hence Uw is an eigenvector of (4.18) with eigenvalue δ = λ − κ; i.e., (λ, Uw) is a harmonic Ritz pair. ∎

This theorem leads us to expect that if x is approximately represented in U, then the harmonic Rayleigh-Ritz method, like its unharmonious counterpart, will produce an approximation to x. Unfortunately, as of this writing the method has not been analyzed. But it is easy to see informally that it protects eigenpairs near κ from imposters. Specifically, on multiplying (4.18) by w^H, we get

‖(A − κI)Uw‖₂² = δ w^H U^H(A − κI)^H Uw,

or, by the Cauchy-Schwarz inequality and the normalization ‖Uw‖₂ = 1,

‖(A − κI)Uw‖₂ ≤ |δ|.   (4.19)

Consequently, if any harmonic Ritz value is within δ of κ, the pair must have a residual norm (with respect to κ) bounded by |δ|. In particular, if κ → λ and δ → 0, then it eventually becomes impossible for the harmonic Rayleigh-Ritz method to produce impersonators. It should be stressed that λ does not have to be very near κ for this screening to occur. If we return to Example 4.2 and compute the harmonic Ritz vector associated with the eigenvalue zero setting κ = 0.3, we obtain an excellent approximation to e₁.

Computation of harmonic Ritz pairs

When U contains a good approximation to an eigenvector whose eigenvalue lies near κ, the matrix (A − κI)U in (4.18) will have a small singular value, which will be squared when we form U^H(A − κI)^H(A − κI)U; this can result in subsequent inaccuracies (see §3.2, Chapter 3). To circumvent this problem, we can compute the QR factorization

(A − κI)U = QR.   (4.20)




Then (4.18) assumes the form

R^H Rw = δ R^H Q^H Uw.

Hence

Rw = δ Q^H Uw.   (4.22)

This generalized eigenproblem can be solved by the QZ algorithm (§4.3, Chapter 2). An alternative is to work with the ordinary eigenproblem

R^{-1}Q^H Uw = δ^{-1}w.   (4.23)

In this case a small singular value in R will cause R^{-1} and hence R^{-1}Q^H U to be large. Fortunately, we are interested in harmonic Ritz values that are near κ, i.e., values for which δ is small. Thus we want eigenpairs corresponding to large eigenvalues in (4.23). Although the eigenvalue itself will not necessarily be accurately computed (the loss of accuracy occurs in forming R, not in forming R^{-1}Q^H U), if the eigenvalue is sufficiently well separated from its neighbors, the eigenvector w will be well determined.

The harmonic Ritz values are useful primarily in locating the harmonic Ritz vectors we need. When it comes to approximating the associated eigenvalue, we should use the ordinary Rayleigh quotient w^H U^H AUw, which minimizes the residual norm. In computing this Rayleigh quotient we can avoid a multiplication by A by using quantities already calculated. Specifically, from (4.20)

U^H AU = U^H QR + κI.

Hence

w^H U^H AUw = κ + w^H(U^H Q)(Rw),

where x = Uw is the harmonic Ritz vector. When A is Hermitian, both the eigenproblems (4.22) and (4.23), which are essentially nonsymmetric, can give complex harmonic Ritz values in the presence of rounding error. However, we can use the same technique that we used for the S/PD problem (§4, Chapter 3). Specifically, we can transform (4.23) into the Hermitian eigenproblem

(Q^H U R^{-1})g = δ^{-1}g   (4.24)

for the vector g = Rw. This approach has the slight disadvantage that having computed a solution g we must solve the system Rw = g to retrieve the primitive harmonic Ritz vector w. However, the usual Rayleigh quotient then has the simpler form

w^H U^H AUw = κ + w^H(U^H Q)g.
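As a concrete illustration of the reductions (4.20)-(4.23), here is a SciPy sketch. The function name is ours; for simplicity it works with the QZ-based generalized eigenproblem rather than the Hermitian variant (4.24), and it returns the ordinary Rayleigh quotients as the preferred eigenvalue estimates.

import numpy as np
from scipy.linalg import qr, eig

def harmonic_ritz_pairs(A, U, kappa):
    """Harmonic Ritz pairs with shift kappa from an orthonormal basis U.

    Uses the QR-based reduction: with (A - kappa*I)U = QR, harmonic
    pairs solve R w = delta * (Q^H U) w, and the harmonic Ritz values
    are kappa + delta.  Returns (harmonic values, Rayleigh quotients,
    harmonic Ritz vectors).
    """
    Q, R = qr(A @ U - kappa * U, mode='economic')   # (4.20)
    M = Q.conj().T @ U
    delta, W = eig(R, M)                  # QZ on the small pencil (R, M)
    W = W / np.linalg.norm(W, axis=0)     # normalize so ||w||_2 = 1
    # Rayleigh quotients kappa + w^H (U^H Q)(R w); note U^H Q = M^H.
    rho = kappa + np.einsum('ij,ij->j', W.conj(), M.conj().T @ (R @ W))
    return kappa + delta, rho, U @ W

The values δ of smallest magnitude, i.e., the harmonic Ritz values nearest κ, are the ones of interest.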



4.5. NOTES AND REFERENCES

Historical

In 1899, Lord Rayleigh [215] showed how to compute the fundamental frequency of a vibrating system. His approach was to approximate the fundamental mode by a linear combination of functions. He chose the linear combination x for which the quantity λ = x^T Ax/x^T x was minimized and took λ for the fundamental frequency. In 1909 Ritz [217] proposed a general method for solving continuous problems that could be phrased as the minimization of a functional. His idea was to choose a suitable subspace (e.g., of low-order polynomials) and determine coefficients to minimize the functional. His example of an eigenproblem, a second-order differential equation, does not, strictly speaking, fit into his general scheme. Although the mode corresponding to the smallest eigenvalue minimizes a functional, the other eigenvalues are found from the stationary points.

The Rayleigh-Ritz method is sometimes called a Galerkin method or a Galerkin-Petrov method. These methods solve a linear system Ax = b in the operator A by choosing two subspaces U and V and choosing x to satisfy the conditions

x ∈ U and Ax − b ⊥ V.   (4.25)

If U = V the method is called a Galerkin method; if not it is called a Galerkin-Petrov method. Now let V be the subspace spanned by V in (4.1). Then any Ritz pair (μ, x) satisfies the conditions

x ∈ U and Ax − μx ⊥ V.   (4.26)

The similarity of (4.25) and (4.26) explains the nomenclature. The subject is not well served by this proliferation of unnecessary terminology, and here we subsume it all under the terms Rayleigh-Ritz method, Ritz pair, etc. The Rayleigh-Ritz method is also called a projection method because a Ritz pair (μ, x) is an eigenpair of the matrix P_V A P_U, where P_U and P_V are the orthogonal projections onto U and V.

Convergence

For earlier work on the convergence of the Rayleigh-Ritz method see the books by Parlett [206] for Hermitian matrices and by Saad [233] for both Hermitian and non-Hermitian matrices. The convergence theory developed here is essentially that of Jia and Stewart for Ritz vectors [133] and for Ritz bases [134]. Theorem 4.6 is a generalization by Stewart of a theorem of Saad [231] for Hermitian matrices (also see [148]) and can in turn be extended to Ritz bases. For a proof see [268].

Refined Ritz vectors

Refined Ritz vectors whose residual norms are not necessarily minimal were introduced by Ruhe [220] in connection with Krylov sequence methods for the generalized



eigenvalue problem. Refined Ritz vectors in the sense used here are due to Jia [131], who showed that they converge when the Ritz vectors fail to converge. The proof given here may be found in [134]. The use of the cross-product algorithm to compute refined Ritz vectors was suggested in [132].

Harmonic Ritz vectors

For a symmetric matrix, the (unshifted) harmonic Ritz values are zeros of certain polynomials associated with the MINRES method for solving linear systems. They are called "harmonic" because they are harmonic averages of the Ritz values of A^{-1}. For a survey and further references see the paper [201] by Paige, Parlett, and van der Vorst, who introduced the appellation harmonic.

Morgan [179] introduced the harmonic Ritz vectors as good approximations to interior eigenvectors. His argument (still for symmetric matrices) was that since eigenvalues near κ become exterior eigenvalues of (A − κI)^{-1}, to compute their eigenvectors we should apply the Rayleigh-Ritz method with basis U to (A − κI)^{-1}. However, we can eliminate the need to work with this costly matrix by instead working with the basis (A − κI)U. This leads directly to equation (4.18). But the resulting Ritz vector (A − κI)Uw will be deficient in the components of eigenvectors whose eigenvalues lie near κ. Hence we should return the vectors Uw, which are not Ritz vectors, at least in the usual sense for symmetric matrices. Morgan also recommended replacing the harmonic Ritz value with the ordinary Rayleigh quotient. Since Morgan was working with Hermitian matrices, he was able to establish some suggestive results about the process.

Harmonic Ritz values for non-Hermitian matrices were described by Freund [78], again in connection with iterative methods for solving linear systems. Harmonic Ritz vectors were introduced by Sleijpen and van der Vorst [240], who characterized them (in our notation) by the Galerkin-Petrov condition

x ∈ U and Ax − (κ + δ)x ⊥ (A − κI)U.

The use of the inequality (4.19) to justify the use of harmonic Ritz vectors is new. The explicit reduction of the harmonic eigenproblem to the forms (4.22), (4.23), and the symmetric form (4.24) is new, although the symmetric form is implicit in the implementation of the Jacobi-Davidson algorithm in [243]. Another way of looking at these reductions is that when R(U) contains an eigenvector of A corresponding to the eigenvalue κ, the pencil in the original definition (4.18) is not regular, while the transformed pencil (4.22) is. For the generalized eigenvalue problem Ax = λBx, a natural generalization of a harmonic Ritz pair is a solution of

U^H(A − κB)^H(A − κB)Uw = δ U^H(A − κB)^H BUw,

which was first introduced in [74]. An argument similar to that leading to (4.19) shows that this method also discourages imposters.

CHAPTER 5. KRYLOV SEQUENCE METHODS

In this chapter and the next we will treat algorithms for large eigenproblems. There are two ways of presenting this subject. The first is by the kind of problem: the symmetric eigenproblem, the generalized eigenproblem, etc. This approach has obvious advantages for the person who is interested in only one kind of problem. It has the disadvantage that, at the very least, a thumbnail sketch of each algorithm must accompany each problem type. The alternative is to partition the subject by algorithms, which is what we will do here. In the present chapter we will be concerned with methods based on Krylov sequences, specifically Arnoldi's method and the Lanczos method. In the first section we will introduce certain decompositions associated with Krylov subspaces. These decompositions form the basis for the Arnoldi and Lanczos algorithms to be treated in §2 and §3.

1. KRYLOV DECOMPOSITIONS

For a fixed matrix A, not every subspace of dimension k is a Krylov subspace of A. This means that any basis of a Krylov subspace must satisfy certain constraining relations, which we will call Krylov decompositions. It turns out that a knowledge of these decompositions enables us to compute and manipulate Krylov subspaces. Consequently, this section is devoted to describing the relations that must hold for a basis to span a Krylov subspace.

The most widely used Krylov decompositions are those that result from orthogonalizing a Krylov sequence. They are called Arnoldi decompositions for non-Hermitian matrices and Lanczos decompositions for Hermitian matrices. We will treat these special decompositions in the first two subsections. Unfortunately, these decompositions are not closed under certain natural operations, and in §1.3 we introduce more general decompositions that are closed, decompositions that will be used in our computational algorithms.




1.1. ARNOLDI DECOMPOSITIONS

An explicit Krylov basis of the form

K_{k+1} = (u₁ Au₁ A²u₁ ⋯ A^k u₁)

is not suitable for numerical work. The reason is that as k increases, the columns of K_k become increasingly dependent (since they tend increasingly to lie in the space spanned by the dominant eigenvectors). For example, the following table gives the condition number (the ratio of the largest to the smallest singular values) for the Krylov basis K_k(A, e), where A is the matrix of Example 3.1, Chapter 4.

     k   Condition
     5   5.8e+02
    10   3.4e+06
    15   2.7e+10
    20   3.1e+14

A large condition number means that the space spanned by the basis is extremely sensitive to small perturbations in the vectors that form the basis. It is therefore natural to replace a Krylov basis with a better-conditioned set of vectors, say an orthonormal set. Unfortunately, a direct orthogonalization of the columns of K_k will not solve the problem; once the columns have become nearly dependent, information about the entire Krylov subspace has been lost, and orthogonalization will not restore it. We therefore proceed indirectly through what we will call an Arnoldi decomposition.
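The growth in the table is easy to reproduce. The following sketch is ours: the diagonal matrix is a stand-in, not the matrix of Example 3.1, but it shows the same qualitative blowup of the condition number of an explicit Krylov basis.

import numpy as np

n = 100
A = np.diag(np.linspace(0.5, 1.0, n))    # assumed test matrix
u = np.ones(n)

K = np.empty((n, 21))
K[:, 0] = u
for j in range(1, 21):
    K[:, j] = A @ K[:, j - 1]            # explicit Krylov basis

for k in (5, 10, 15, 20):
    s = np.linalg.svd(K[:, :k], compute_uv=False)
    print(k, s[0] / s[-1])               # ratio of extreme singular values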

The Arnoldi decomposition

Let us suppose that the columns of K_{k+1} are linearly independent and let

K_{k+1} = U_{k+1}R_{k+1}   (1.1)

be the QR factorization of K_{k+1} (see §6, Chapter 2.3). Then the columns of U_{k+1} are the results of successively orthogonalizing the columns of K_{k+1}. It turns out that we can eliminate K_{k+1} from equation (1.1). The resulting Arnoldi decomposition characterizes the orthonormal bases that are obtained from a Krylov sequence.

Theorem 1.1. Let u₁ have 2-norm one and let the columns of K_{k+1}(A, u₁) be linearly independent. Let U_{k+1} = (u₁ ⋯ u_{k+1}) be the Q-factor of K_{k+1}. Then there is a (k+1)×k unreduced upper Hessenberg matrix Ĥ_k such that

AU_k = U_{k+1}Ĥ_k.   (1.2)

Conversely, if U_{k+1} is orthonormal and satisfies (1.2), where Ĥ_k is a (k+1)×k unreduced Hessenberg matrix, then U_{k+1} is the Q-factor of K_{k+1}(A, u₁).


Proof. If we partition (1.1) in the form

(K_k A^k u₁) = (U_k u_{k+1}) ( R_k r ; 0 ρ ),


we see that K_k = U_kR_k is the QR factorization of K_k. Now let S_k = R_k^{-1}. Then

AU_k = U_{k+1}Ĥ_k,

where

Ĥ_k = R_{k+1}(e₂ ⋯ e_{k+1})S_k.

It is easily verified that Ĥ_k is a (k+1)×k Hessenberg matrix whose subdiagonal elements are h_{i+1,i} = r_{i+1,i+1}s_{ii} = r_{i+1,i+1}/r_{ii}. Thus by the nonsingularity of R_k, Ĥ_k is unreduced.

Conversely, suppose that the orthonormal matrix U_{k+1} satisfies the relation (1.2), where Ĥ_k is a (k+1)×k unreduced Hessenberg matrix. Now if k = 1, then Au₁ = h₁₁u₁ + h₂₁u₂. Since h₂₁ ≠ 0, the vector u₂ is a linear combination of u₁ and Au₁ that is orthogonal to u₁. Hence (u₁ u₂) is the Q-factor of K₂. Assume inductively that U_k is the Q-factor of K_k. If we partition Ĥ_k = (Ĥ_{k−1} ĥ),

then from (1.3)

Au_k = U_kh + h_{k+1,k}u_{k+1},

where h consists of the first k components of ĥ.

Thus u_{k+1} is a linear combination of Au_k and the columns of U_k and lies in K_{k+1}. Hence U_{k+1} is the Q-factor of K_{k+1}. ∎

This theorem justifies the following definition.

Definition 1.2. Let U_{k+1} ∈ C^{n×(k+1)} be orthonormal and let U_k consist of the first k columns of U_{k+1}. If there is a (k+1)×k unreduced upper Hessenberg matrix Ĥ_k such that

AU_k = U_{k+1}Ĥ_k,   (1.3)

then we call (1.3) an ARNOLDI DECOMPOSITION OF ORDER k. If Ĥ_k is reduced, we say the Arnoldi decomposition is REDUCED.



In this nomenclature, Theorem 1.1 says that every nondegenerate Krylov sequence is associated with an Arnoldi decomposition and vice versa.

An Arnoldi decomposition can be written in a different form that will be useful in the sequel. Specifically, partition

Ĥ_k = ( H_k ; h_{k+1,k}e_k^T ),

and set

β_k = h_{k+1,k}.

Then the Arnoldi decomposition (1.2) is equivalent to the relation

AU_k = U_kH_k + β_k u_{k+1}e_k^T.   (1.4)

In what follows it will reduce clutter if we dispense with iteration subscripts and write (1.4) in the form

AU = UH + βue_k^T.

When we do, the compact form (1.3) will be written in the form

AU = ÛĤ, where Û = (U u) and Ĥ = ( H ; βe_k^T ).

Thus the hats over the matrices U and H represent an augmentation by a column and a row respectively.

Uniqueness

The language we have used above, an Arnoldi decomposition, suggests the possibility of having more than one Arnoldi decomposition associated with a Krylov subspace. In fact, this can happen only when the sequence terminates, as the following theorem shows.

Theorem 1.3. Suppose the Krylov sequence K_{k+1}(A, u₁) does not terminate at k+1. Then up to scaling of the columns of U_{k+1}, the Arnoldi decomposition of K_{k+1} is unique.

Proof. Let

AU_k = U_kH_k + β_k u_{k+1}e_k^T   (1.5)

be the Arnoldi decomposition from Theorem 1.1. Then by this theorem H_k is unreduced and β_k ≠ 0. Let

AŨ_k = Ũ_kH̃_k + β̃_k ũ_{k+1}e_k^T,



where Ũ_{k+1} is an orthonormal basis for K_{k+1}(A, u₁). We claim that U_k^H ũ_{k+1} = 0. For otherwise there is a ũ_j such that ũ_j = αu_{k+1} + U_k a, where α ≠ 0. Hence Aũ_j contains a component along A^{k+1}u₁ and does not lie in K_{k+1}, a contradiction. It follows that R(Ũ_k) = R(U_k) and hence that Ũ_k = U_kQ for some unitary matrix Q. Hence

AU_kQ = U_kQH̃_k + β̃_k ũ_{k+1}e_k^T,

or

AU_k = U_k(QH̃_kQ^H) + β̃_k ũ_{k+1}e_k^TQ^H.   (1.7)

On premultiplying (1.5) and (1.7) by U_k^H, we obtain

H_k = U_k^H AU_k = QH̃_kQ^H.

Similarly, premultiplying by u_{k+1}^H, we obtain

β_k e_k^T = β̃_k (u_{k+1}^H ũ_{k+1}) e_k^T Q^H.   (1.8)

Now by the independence of the columns of K_{k+1}, both β_k and β̃_k are nonzero. It then follows from (1.8) that the last row of Q is ω_k e_k^T, where |ω_k| = 1. Since the norm of the last column of Q is one, the last column of Q is ω̄_k e_k. Since H_k is unreduced, it follows from the implicit Q theorem (Theorem 3.3, Chapter 2) that Q = diag(ω₁, …, ω_k), where |ω_j| = 1 (j = 1, …, k). Thus up to column scaling Ũ_k = U_kQ is the same as U_k. Subtracting (1.7) from (1.5), we find that

β_k u_{k+1}e_k^T = β̃_k ũ_{k+1}e_k^TQ^H = (ω̄_k β̃_k) ũ_{k+1}e_k^T,

so that up to scaling u_{k+1} and ũ_{k+1} are the same. ∎

This theorem allows us to speak of the Arnoldi decomposition associated with a nonterminating Krylov subspace. Perhaps more important, it implies that there is a one-one correspondence between Krylov subspaces and their starting vectors: a nonterminating Krylov subspace uniquely determines its starting vector.

The nontermination hypothesis is necessary. If, for example, K_{k+1} = K_{k+2}, then AU_{k+1} = U_{k+1}H_{k+1}, and for any Q such that Q^H H_{k+1}Q is upper Hessenberg, the decomposition A(U_{k+1}Q) = (U_{k+1}Q)(Q^H H_{k+1}Q) is another Arnoldi decomposition.

The Rayleigh quotient

In the proof of Theorem 1.3 we used the fact that H_k = U_k^H AU_k. But this matrix is just the Rayleigh quotient with respect to the basis U_k (see Definition 2.7, Chapter 4). Thus the Arnoldi decomposition provides all that is needed to compute Ritz pairs.
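In particular, given U_k, H_k, and β_k, the Ritz pairs and their residual norms can be had without further reference to A, since from (1.4) the residual of a Ritz pair (μ, U_kw) is β_k(e_k^Tw)u_{k+1}. A minimal NumPy sketch, with our own names:

import numpy as np

def ritz_pairs(U, H, beta):
    """Ritz pairs from the Arnoldi relation A U = U H + beta * u_new e_k^T.

    Since H = U^H A U, the eigenpairs (mu, w) of H give Ritz pairs
    (mu, U w), and A(Uw) - mu(Uw) = beta * w[-1] * u_new, so the
    residual norm is |beta| * |last component of w|: no product
    with A is needed.
    """
    mu, W = np.linalg.eig(H)
    W = W / np.linalg.norm(W, axis=0)
    res = np.abs(beta) * np.abs(W[-1, :])
    return mu, U @ W, res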

Reduced Arnoldi decompositions

In Definition 1.2 we have required that the Hessenberg part of an Arnoldi decomposition AU_k = U_{k+1}Ĥ_k be unreduced. It is natural to ask under what conditions an Arnoldi-like relation can be satisfied with a reduced Hessenberg matrix. The following theorem answers that question.

Theorem 1.4. Let the orthonormal matrix U_{k+1} satisfy

AU_k = U_{k+1}Ĥ_k,

where Ĥ_k is Hessenberg. Then Ĥ_k is reduced if and only if R(U_k) contains an eigenspace of A.

Proof. First suppose that Ĥ_k is reduced, say that h_{j+1,j} = 0. Let H be the leading principal submatrix of Ĥ_k of order j and let U be the matrix consisting of the first j columns of U_{k+1}. Then it is easily verified that AU = UH, so that U is an eigenbasis of A. Conversely, suppose that A has an eigenspace that is a subset of R(U_k) and suppose that Ĥ_k is unreduced. Then R(U_k) contains an eigenvector of A, which we can write in the form x = U_kw for some w. Let λ be the corresponding eigenvalue and let

Ĥ_λ = Ĥ_k − λÎ_k,

where Î_k is the identity of order k with a row of zeros appended. Then

0 = (A − λI)U_kw = U_{k+1}Ĥ_λw.

Because Ĥ_k is unreduced, the matrix U_{k+1}Ĥ_λ is of full column rank. It follows that w = 0, a contradiction. ∎

This theorem, along with the results on terminating Krylov sequences in Chapter 4, shows that the only way for an Arnoldi decomposition to be reduced is for the Krylov sequence to terminate prematurely. However, if h_{j+1,j} = 0, the sequence can be continued by setting u_{j+1} equal to any normalized vector in the orthogonal complement of U_j. If this procedure is repeated each time the sequence terminates, the result is a reduced Arnoldi decomposition.

Computing Arnoldi decompositions

The Arnoldi decomposition suggests a method for computing the vector u_{k+1} from u₁, …, u_k. Write the kth column of (1.4) in the form

Au_k = U_kh_k + β_k u_{k+1}.

Then from the orthonormality of U_{k+1} we have

h_k = U_k^H Au_k.

Since ‖u_{k+1}‖₂ = 1, we must have

v = Au_k − U_kh_k, β_k = ‖v‖₂, and u_{k+1} = v/β_k.   (1.9)

These considerations lead to the following algorithm for computing an expanding set of Arnoldi decompositions. We call this procedure the Arnoldi process. It is worthwhile to verify that this process is the same as the reduction to Hessenberg form described in §2.3, Chapter 2.

Reorthogonalization

Although we derived (1.9) from the Arnoldi decomposition, the computation of u_{k+1} is actually a form of the well-known Gram-Schmidt algorithm. This algorithm has been treated extensively in §1:4, where it is shown that in the presence of inexact arithmetic, cancellation in statement 3 can cause it to fail to produce orthogonal vectors. The cure is a process called reorthogonalization. Since the vector v in statement 3 need not be orthogonal to the columns of U_k, the extended algorithm orthogonalizes it again, accumulating any changes in h_k. It is illustrated in the following extension of (1.9).

Given an Arnoldi decomposition of order k, ExpandArn extends it to an Arnoldi decomposition of order k+1. In general, this algorithm is quite satisfactory; usually no more than the one reorthogonalization performed in statements 4-6 is needed. However, there are some tricky special cases, e.g., what to do when v in statement 3 is zero. Our final algorithm, Algorithm 1.1, uses a function gsreorthog (Algorithm 1:4.1.13) that handles exceptional cases and tests whether to reorthogonalize.

Algorithm 1.1: The Arnoldi step

By applying Algorithm 1.1 repeatedly, one can compute an Arnoldi decomposition of any order. Hence: to compute an Arnoldi decomposition of order k from scratch, one must apply the algorithm for orders i = 1, …, k−1.

Practical considerations

The work in computing an Arnoldi step is divided between the computation of the matrix-vector product Au and the orthogonalization. The user must provide a method for computing the former, and its cost will vary. The cost of the orthogonalization will be 2nkr flam, where r is the number of orthogonalization steps performed by gsreorthog. In general r will be one or two. Moreover, we can often dispense with the reorthogonalization entirely; for more see Example 1.5. Thus:

To compute an Arnoldi decomposition of order k requires k matrix-vector multiplications and between nk² and 2nk² flam.
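The following sketch of one Arnoldi expansion step is ours: a single classical Gram-Schmidt pass with one fixed reorthogonalization stands in for gsreorthog, and the exceptional cases mentioned above (for example, v = 0) are not handled.

import numpy as np

def expand_arnoldi(matvec, U, H):
    """One Arnoldi step: extend A U_k = U_{k+1} Hhat_k by one column.

    U is n x (k+1) with orthonormal columns and H is (k+1) x k.
    Returns the updated (U, H).  A zero new subdiagonal element
    signals termination of the Krylov sequence (not handled here).
    """
    k = H.shape[1]
    v = matvec(U[:, k])
    h = U.conj().T @ v                  # orthogonalize against U
    v = v - U @ h
    c = U.conj().T @ v                  # one reorthogonalization pass
    v = v - U @ c
    h = h + c                           # accumulate changes in h
    beta = np.linalg.norm(v)
    Hnew = np.zeros((k + 2, k + 1), dtype=np.result_type(H, h))
    Hnew[:k + 1, :k] = H
    Hnew[:k + 1, k] = h
    Hnew[k + 1, k] = beta
    return np.column_stack([U, v / beta]), Hnew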

It requires nk floating-point words to store the matrix U_k. If n is very large, then as k grows we will quickly run out of memory. Of course we can move the columns of U to a backing store, say a disk. But each Arnoldi step requires all the previous vectors, and transfers between backing store and main memory will overwhelm the calculation. This fact, more than operation counts, restricts our ability to compute Arnoldi decompositions.

Orthogonalization and shift-and-invert enhancement

We have seen in connection with the inverse power method (§1.2, Chapter 2) that replacing A by (A − κI)^{-1} will make eigenvalues near κ large. If (A − κI)^{-1} is used to form a Krylov sequence, the sequence will tend to produce good approximations to those large eigenvalues. Thus shift-and-invert enhancement can be used to steer a Krylov sequence to a region whose eigenvalues we are interested in. For the inverse power method, where the concern is with one eigenpair, the closer the shift is to the eigenvalue the better the performance of the method. With Krylov sequences, where the concern is often with several eigenpairs, an excessively accurate approximation to one eigenvalue can cause the convergence to the other eigenpairs to stagnate, as the following example shows.

Example 1.5. Let A_i be of order 30 with the last 29 eigenvalues decaying geometrically. In this example we have simulated a shift-and-invert transformation by giving A_i a large eigenvalue of 10^i. We have taken A_i to be diagonal, so that the effects of rounding error in the formation of the matrix-vector product A_iu_k are minimal. Figure 1.1 plots sin θ, where θ = ∠[e₂, K_k(A_i, e)], versus k for several values of i. The convergence of sin θ toward zero proceeds typically up to a point and then levels off. The larger the value of i, the sooner the convergence stagnates.

No formal analysis of this phenomenon has appeared in the literature, but what seems to be happening is the following. In any Krylov sequence the Arnoldi vector u_{k+1} is orthogonal to K_k. Hence if there is an accurate approximation to any vector in K_k, then u_{k+1} must be almost orthogonal to it. The Krylov subspaces K_k(A_i, e) can be expected to quickly develop very good approximations to e₁, and the corresponding bound of Chapter 4 suggests that they will also produce approximations to e₂, although more slowly. For our example, this means that the first component of u_k will rapidly approach zero as good approximations to e₁ appear in K_k. Similarly, the second component of u_k will approach zero, although more slowly, as good approximations to e₂ appear.

Now in the presence of rounding error, the first component of u_k will not become exactly zero but will be reduced to the order of the rounding unit, say 10^{-16}. But this nonzero first component is magnified by 10^i when Au_k is formed. When Au_k is reorthogonalized, this spurious first component will introduce errors of order 10^{i-16} in the other components. Thus we have effectively changed our rounding unit from 10^{-16} to 10^{i-16}, and we can expect no greater accuracy in the approximations to the other eigenvectors. Thus the effects observed in the figure are due to the combination of the large eigenvalue and the orthogonalization step.

[Figure 1.1: Stagnation in a Krylov sequence]

The moral of this example is that if you wish to use shift-and-invert enhancement to find eigenpairs in a region, do not let the shift fall too near any particular eigenvalue in the region.

1.2. LANCZOS DECOMPOSITIONS

Lanczos decompositions

When A is Hermitian the Arnoldi decomposition and the associated orthogonalization process simplify. Specifically, let A be Hermitian and let

AU_k = U_kT_k + β_k u_{k+1}e_k^T   (1.12)

be an Arnoldi decomposition. Then the Rayleigh quotient T_k is upper Hessenberg. But since T_k = U_k^H AU_k, T_k is Hermitian. It follows that T_k is tridiagonal and can be written in the form

T_k = ( α₁ β₁ ; β₁ α₂ β₂ ; ⋱ ⋱ ⋱ ; β_{k−1} α_k ),   (1.13)

where the α_j are real and the β_j may be taken to be real.

We will call the decomposition (1.12) a Lanczos decomposition.

The Lanczos recurrence

The fact that T_k is tridiagonal implies that the orthogonalization of u_{k+1} depends only on u_k and u_{k−1}, in contrast to the Arnoldi procedure, where u_{k+1} depends on u₁, …, u_k. To see this, note that the first column of the Lanczos decomposition (1.12) is

Au₁ = α₁u₁ + β₁u₂,

or

u₂ = (Au₁ − α₁u₁)/β₁.

Moreover, from the orthonormality of u₁ and u₂ it follows that α₁ = u₁^H Au₁, and we may take β₁ = ‖Au₁ − α₁u₁‖₂. More generally, from the jth column of (1.12) we get the relation

u_{j+1} = (Au_j − α_ju_j − β_{j−1}u_{j−1})/β_j,

where α_j = u_j^H Au_j and β_j = ‖Au_j − α_ju_j − β_{j−1}u_{j−1}‖₂. This is the Lanczos three-term recurrence, which is implemented in Algorithm 1.2. Here are three comments.

• The body of the loop requires one multiplication by A, two inner products, and two subtractions of a constant times a vector.

• Since β_{j−1} is taken to be real, we do not have to conjugate it in statement 5.

• A frequently occurring alternative is to calculate

v = Au_j − β_{j−1}u_{j−1}, α_j = u_j^H v, v = v − α_ju_j.

This is the equivalent of the modified Gram-Schmidt algorithm (see §1:4). Both work well. For more on variants of the basic algorithm, see the notes and references.

Like the Arnoldi, the Lanczos decomposition is essentially unique, provided that the matrix T_k is unreduced. This will always be true if the sequence does not terminate. Hence:

In exact arithmetic, a nonterminating Lanczos decomposition is uniquely determined by its starting vector u₁.
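A bare-bones realization of the recurrence, assuming A Hermitian; the names are ours, and, as in the text, no reorthogonalization is performed, so orthogonality decays in finite precision.

import numpy as np

def lanczos(matvec, u1, k):
    """The Lanczos three-term recurrence for Hermitian A.

    Returns U (n x (k+1)) and the tridiagonal entries alpha (length k)
    and beta (length k) of the Rayleigh quotient T_k.  Termination
    (beta_j = 0) is not handled.
    """
    n = u1.shape[0]
    U = np.zeros((n, k + 1), dtype=complex)
    alpha = np.zeros(k)
    beta = np.zeros(k)
    U[:, 0] = u1 / np.linalg.norm(u1)
    for j in range(k):
        v = matvec(U[:, j])
        if j > 0:
            v = v - beta[j - 1] * U[:, j - 1]
        alpha[j] = np.real(np.vdot(U[:, j], v))   # real for Hermitian A
        v = v - alpha[j] * U[:, j]
        beta[j] = np.linalg.norm(v)
        U[:, j + 1] = v / beta[j]
    return U, alpha, beta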

Let u₁ be given. This algorithm generates the Lanczos decomposition

AU_k = U_kT_k + β_k u_{k+1}e_k^T,

where T_k has the form (1.13).

Algorithm 1.2: The Lanczos recurrence

The operation count for Algorithm 1.2 is k matrix-vector multiplications and 4nk flam. The above is less by a factor of about k than the corresponding count (1.11) for the Arnoldi process.

Although the simplicity and economy of the Lanczos recurrence is enticing, there are formidable obstacles to turning it into a working method. If only eigenvalues are desired, we can discard the u_j as they are generated and compute Ritz values from the Rayleigh quotient T_k. But if one wants to compute eigenvectors, one must retain all the vectors u_j, and for large problems it will be necessary to transfer them to a backing store. A more serious problem is loss of orthogonality among the vectors u_j. In the Arnoldi process we were able to overcome this problem by reorthogonalization. However, if we reorthogonalize u_{j+1} in the Lanczos recurrence against its predecessors, we lose all the advantages of the Lanczos recurrence. We will return to this problem in §3, where we will treat the Lanczos algorithm in detail.

An Arnoldi (or Lanczos) decomposition can be regarded as a vehicle for combining Krylov sequences with Rayleigh-Ritz refinement. It exhibits an orthonormal basis for a Krylov subspace and its Rayleigh quotient, the wherewithal for computing Ritz pairs. As we shall see, it also permits the efficient computation of the residual norms of the Ritz pairs. However, because of its uniqueness, we can do little more with it.

Specifically, there is no unreduced Arnoldi form whose Rayleigh quotient is triangular.

1.3. KRYLOV DECOMPOSITIONS

In this subsection we will show how to relax the definition of an Arnoldi decomposition to obtain a decomposition that is closed under similarity transformations of its Rayleigh quotient.

Definition

A Krylov decomposition, like an Arnoldi decomposition, specifies a relation among a set of linearly independent vectors that guarantees that they span a Krylov subspace. Specifically, we make the following definition.

Definition 1.6. Let u₁, u₂, …, u_{k+1} be linearly independent and let U_{k+1} = (u₁ ⋯ u_{k+1}). A KRYLOV DECOMPOSITION OF ORDER k is a relation of the form

AU_k = U_kB_k + u_{k+1}b_{k+1}^H.   (1.15)

Equivalently, we can write

AU_k = U_{k+1}B̂_k, where B̂_k = ( B_k ; b_{k+1}^H ).   (1.16)

We call R(U_{k+1}) the SPACE SPANNED BY THE DECOMPOSITION and U_{k+1} the basis for the decomposition. If U_{k+1} is orthonormal, we say that the Krylov decomposition is ORTHONORMAL. Two Krylov decompositions spanning the same space are said to be EQUIVALENT.

Thus a Krylov decomposition has the same general form as an Arnoldi decomposition, but the matrix B_k is not assumed to be of any particular form. The matrix B_k is uniquely determined by the basis U_{k+1}: for if (V v)^H is any left inverse of U_{k+1}, then it follows from (1.16) that B_k = V^H AU_k. In particular, B_k is a Rayleigh quotient of A.

Here is a nontrivial example of a Krylov decomposition.

Example 1.7. If the columns of K_{k+1}(A, u₁) are independent, then

AK_k = K_kS_k + A^ku₁ e_k^T,

where S_k is the shift matrix with ones on its subdiagonal and zeros elsewhere, is a Krylov decomposition whose basis is the Krylov basis.

Every Arnoldi decomposition is a Krylov decomposition, but not every Krylov decomposition is an Arnoldi decomposition, as the above example shows. However, every Krylov decomposition is equivalent to a (possibly reduced) Arnoldi decomposition. If the Arnoldi decomposition is unreduced, then it is essentially unique, and the subspace of the original Krylov decomposition must be a nondegenerate Krylov subspace. To establish these facts we will introduce two classes of transformations that transform a Krylov decomposition into an equivalent Krylov decomposition. They are called similarity transformations and translations.

Similarity transformations

Let

AU = UB + ub^H

be a Krylov decomposition and let Q be nonsingular. Then on postmultiplying by Q we obtain the equivalent Krylov decomposition

A(UQ) = (UQ)(Q^{-1}BQ) + u(b^H Q).   (1.18)

We will say that the two Krylov decompositions are similar. We have already met with a similarity transformation in the proof of the uniqueness of the Arnoldi decomposition [see (1.6)]. In that case, however, the transforming matrix Q necessarily satisfied |Q| = I, so that the two similar decompositions were essentially the same. With a general Krylov decomposition, however, we are free to reduce the Rayleigh quotient B to any convenient form. For example, we could reduce B to Jordan form, in which case we will say that the decomposition is a Krylov-Jordan decomposition. In this terminology an Arnoldi decomposition is an orthonormal Krylov-Hessenberg decomposition and a Lanczos decomposition is an orthonormal Krylov-tridiagonal decomposition. There is also a Krylov-Schur decomposition, which will play an important role in our treatment of the Arnoldi method (see §2.2).

Translation

The similarity transformation (1.18) leaves the vector u_{k+1} unaltered. We can transform this vector as follows. In the Krylov decomposition let

u = γũ + Ua,

where by the independence of the vectors u_j we must have γ ≠ 0. Then it follows that

AU = U(B + ab^H) + ũ(γb^H).

Since R[(U ũ)] = R[(U u)], this Krylov decomposition is equivalent to the original. We say that it was obtained from the original decomposition by translation. Note that translation cannot produce an arbitrary vector ũ: ũ is constrained to have a component along the original vector u, so that the two decompositions span the same space.

Equivalence to an Arnoldi decomposition

We will now use the transformations introduced above to prove the following theorem.

Theorem 1.8. Every Krylov decomposition is equivalent to a (possibly reduced) Arnoldi decomposition.

Proof. We begin with the Krylov decomposition

AU = UB + ub^H,

where for convenience we have dropped the subscripts k and k+1. Let U = ŨR be the QR factorization of U. Then the equivalent decomposition

AŨ = Ũ(RBR^{-1}) + u(b^H R^{-1}),

in which the matrix Ũ is orthonormal, spans the same space. Next let ũ be a vector of norm one such that Ũ^H ũ = 0. Then the translated decomposition

AŨ = ŨB̃ + ũb̃^H

is an equivalent orthonormal Krylov decomposition. Finally, let Q be a unitary matrix such that b̃^H Q = ‖b̃‖₂e_k^T and Q^H B̃Q is upper Hessenberg. Then the equivalent decomposition

A(ŨQ) = (ŨQ)(Q^H B̃Q) + ũ(b̃^H Q)

is a possibly reduced Arnoldi decomposition. ∎

There are two points to be made about this theorem.

• If the resulting Arnoldi decomposition is unreduced, then by Theorem 1.3 it is essentially unique, and by Theorem 1.4 R(U) contains no invariant subspaces of A.

• The proof of Theorem 1.8 is constructive. Given the original decomposition, each step in the proof has an algorithmic realization. If the original basis U_{k+1} is reasonably well conditioned, these algorithms will be reasonably stable.

To summarize:

Each Krylov sequence that does not terminate at k corresponds to the class of Krylov decompositions that are equivalent to its Arnoldi decomposition.

Reduction to Arnoldi form

As we shall see, it is sometimes necessary to pass from an orthonormal Krylov decomposition to an Arnoldi decomposition. This can be done by Householder transformations as follows. Let the Krylov decomposition in question be

AU = UB + ub^H,

where B is of order k. Let H₁ be a Householder transformation such that b^H H₁ = κe_k^T. We now reduce H₁BH₁ to Hessenberg form by a variant of the reduction described in §2.3, Chapter 2, in which the matrix is reduced rowwise beginning with the last row. The course of the reduction is illustrated in Figure 1.2.

[Figure 1.2: Rowwise reduction to Hessenberg form]

Now let Q = H₁H₂ ⋯ H_{k−1}. Because the last rows of H₂, …, H_{k−1} are e_k^T, we have b^H Q = κe_k^T. Then the Krylov decomposition

A(UQ) = (UQ)(Q^H BQ) + u(b^H Q)   (1.19)

has a Hessenberg Rayleigh quotient, so that (1.19) is an Arnoldi decomposition. Algorithm 1.3 implements this scheme. It uses a variant of housegen that zeros the initial components of a vector. This algorithm illustrates an important fact about algorithms that transform Krylov decompositions: if n is large, transformations on the Rayleigh quotient are negligible

Let

AU = UB + ub^H

be an orthonormal Krylov decomposition of order k. This algorithm reduces it to an Arnoldi decomposition. The algorithm uses the routine housegen2(x, u, ν), which takes an m-vector x and returns a Householder transformation H = I − uu^H such that Hx = νe_m (see Chapter 2).

Algorithm 1.3: Krylov decomposition to Arnoldi decomposition

compared to the manipulation of U. Thus the bulk of the work in Algorithm 1.3 is found in the very last statement, where U is overwritten by U*Q. This process requires nk² flam. We could have accumulated the Householder transformations directly into U, which would give the same operation count. But the possibilities for optimizing cache utilization are better if we calculate U*Q as above (see §1:2.3).

1.4. COMPUTATION OF REFINED AND HARMONIC RITZ VECTORS

In §§4.3 and 4.4, Chapter 4, we introduced refined and harmonic Ritz vectors. In general, both are somewhat expensive to compute. However, if the defining subspace for these vectors is a Krylov subspace, their computation simplifies. The purpose of this subsection is to give the details. In what follows, we will assume that

AU = UB + ub^H   (1.20)

is an orthonormal Krylov decomposition.

Refined Ritz vectors

If μ is a Ritz value, then the refined Ritz vector associated with μ is the right singular vector of (A − μI)U whose singular value is smallest (see §4.3, Chapter 4). From (1.20) we have

(A − μI)U = U(B − μI) + ub^H.

If we set

B̂_μ = ( B − μI ; b^H ),

we have

(A − μI)U = (U u)B̂_μ.

Since (U u) is orthonormal, the right singular vectors of (A − μI)U are the same as the right singular vectors of B̂_μ. Thus the computation of a refined Ritz vector can be reduced to computing the singular value decomposition of the low-order matrix B̂_μ.
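In code the reduction is immediate: only the small (k+1)×k matrix B̂_μ is passed to the SVD, so no further products with A are needed. A sketch under the decomposition (1.20), with our own names:

import numpy as np

def refined_ritz_from_krylov(U, B, u, b, mu):
    """Refined Ritz vector from an orthonormal Krylov decomposition
    A U = U B + u b^H, exploiting (A - mu*I)U = (U u) [B - mu*I; b^H].
    """
    k = B.shape[0]
    Bhat = np.vstack([B - mu * np.eye(k), b.conj()[None, :]])
    _, s, Vt = np.linalg.svd(Bhat, full_matrices=False)
    w = Vt[-1].conj()                  # smallest right singular vector
    return U @ w, s[-1]                # refined vector and residual norm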

Harmonic Ritz vectors

By Definition 4.11, Chapter 4, the pair (κ + δ, Uw) is a harmonic Ritz pair if

U^H(A − κI)^H(A − κI)Uw = δ U^H(A − κI)^H Uw.

In the notation above, (A − κI)U = (U u)B̂_κ, so that

U^H(A − κI)^H(A − κI)U = B̂_κ^H B̂_κ and U^H(A − κI)^H U = (B − κI)^H.

It follows that

B̂_κ^H B̂_κ w = δ (B − κI)^H w.

Thus we have reduced the problem of computing w to that of solving a small generalized eigenvalue problem, which can be done by the QZ algorithm (see §4, Chapter 2). If we are willing to assume that (B − κI)^H is well conditioned, we can multiply by (B − κI)^{-H} to get the ordinary eigenvalue problem

(B − κI)^{-H} B̂_κ^H B̂_κ w = δw.

1.5. NOTES AND REFERENCES

Lanczos and Arnoldi

Contrary to our order of presentation, the Lanczos recurrence preceded the Arnoldi procedure. It is due to Cornelius Lanczos (pronounced Lantzosh), whose algorithm [157, 1950] has had a profound influence on the solution of large eigenproblems. He called his method the method of minimized iterations because orthogonalizing Au_k against u₁, …, u_k minimizes the norm of the result. Lanczos did not compute approximations to the eigenvalues from the matrix T_k (which he never explicitly formed), but instead used the recurrence to generate the characteristic polynomial of T_k, from which he computed eigenvalues. As indicated above (p. 279), he was aware that accurate approximations to certain eigenvalues could emerge early in the procedure. Lanczos also gave a biorthogonalization algorithm for non-Hermitian matrices involving right and left Krylov sequences.

Arnoldi's chief contribution [5, 1951] seems to have been to generalize the one-sided Lanczos process to non-Hermitian matrices, noting that the Rayleigh quotient is Hessenberg.

As might be expected, there are several variants of the basic three-term recursion. Paige [197], whose ground-breaking thesis helped turn the Lanczos algorithm from a computational toy with promise into a viable algorithm for large symmetric eigenproblems, has given a rounding-error analysis of these variants. The two described here emerge with a clean bill of health, but the others may not perform well.

The matrix form (1.12) of the Lanczos decomposition is due to Paige [195]. What we have been calling an Arnoldi decomposition is often called an Arnoldi factorization; indeed, the relation (1.2) is a factorization of AU_k. However, the factorization is concealed by the second form (1.4). Consequently, we have opted for the slightly vaguer word "decomposition."

Shift-and-invert enhancement and orthogonalization

The fact that a shift that closely approximates an eigenvalue can degrade convergence to other eigenpairs does not seem to be noted in the literature. One possible reason is that one does not generally have very accurate shifts. Moreover, as Example 1.5 shows, the shift has to be extremely accurate for the degradation to be significant.

Krylov decompositions

The idea of relaxing the definition of an Arnoldi decomposition is due to Stewart [267], as are the systematic transformations of the resulting Krylov decompositions. However, some of these ideas are implicit in some algorithms for large eigenproblems and large linear systems. Most notably, Sorensen's implicitly restarted Arnoldi algorithm [246], to be treated in the next section, proceeds via an orthonormal Krylov decomposition, which is then truncated to yield an Arnoldi decomposition.

2. THE RESTARTED ARNOLDI METHOD

Let

AU_k = U_kH_k + β_k u_{k+1}e_k^T

be an Arnoldi decomposition. In principle, we can use Algorithm 1.1 to keep expanding the Arnoldi decomposition until the Ritz pairs we want have converged. We have already noted that the Ritz pairs associated with U_k can be computed from the Rayleigh quotient H_k. As k increases, some of these Ritz pairs will (we hope) approach eigenpairs of A.

Unfortunately, for large problems our ability to expand the Arnoldi decomposition is limited by the amount of memory we can devote to the storage of U_k. For example, if A is of order 10,000, then U₁₀₀ requires almost 10 megabytes of memory to store in IEEE double precision. A possible cure is to move the vectors u_k to a backing store, say a disk, as they are generated. However, in order to orthogonalize Au_k against the previous vectors we will have to bring them back into main memory. Since disk transfers are slow, the algorithm will spend most of its time moving numbers back and forth between the disk and main memory.

An alternative is to restart the Arnoldi process once k becomes so large that we cannot store U_k. This, in fact, is the way the Arnoldi method is used in practice. In this chapter we will introduce two restarting methods: the implicit restarting method of Sorensen and a new method based on a Krylov-Schur decomposition. These methods are the subjects of §§2.1 and 2.2. We then turn to practical considerations: stability, convergence, and deflation.

2.1. THE IMPLICITLY RESTARTED ARNOLDI METHOD

The process of restarting the Arnoldi method can be regarded as choosing a new starting vector for the underlying Krylov sequence. A natural choice would be a linear combination of Ritz vectors that we are interested in. This, it turns out, is a special case of a general approach in which we apply a filter polynomial to the original starting vector. We will begin by motivating this process and then go on to show how the QR algorithm can be used to implement it.

Filter polynomials

For the sake of argument let us assume that A has a complete system of eigenpairs (λ_i, x_i), and suppose that we are interested in calculating the first k of these eigenpairs. Let u₁ be a fixed vector. Then u₁ can be expanded in the form

u₁ = γ₁x₁ + ⋯ + γ_nx_n.

If p is any polynomial, we have

p(A)u₁ = γ₁p(λ₁)x₁ + ⋯ + γ_np(λ_n)x_n.

If we can choose p so that the values p(λ_i) (i = k+1, …, n) are small compared to the values p(λ_i) (i = 1, …, k), then u = p(A)u₁ will be rich in the components of the x_i that we want and deficient in the ones that we do not want. Such a polynomial is called a filter polynomial.

There are many possible choices for a filter polynomial, but two suggest themselves immediately. First, if the eigenvalues λ_{k+1}, …, λ_n lie in an interval [a, b] and the eigenvalues λ₁, …, λ_k lie without, we can take p to be a Chebyshev polynomial on [a, b]. By the properties of Chebyshev polynomials (see Chapter 4), this polynomial has values on [a, b] that are small compared to values sufficiently outside [a, b]. Second, suppose we have at hand a set of Ritz values μ₁, …, μ_m, and suppose that μ_{k+1}, …, μ_m correspond to the part of the spectrum we are not interested in. Then take p(t) = (t − μ_{k+1}) ⋯ (t − μ_m).

It should be observed that both choices accentuate the negative. The emphasis is on filtering out what we do not want, while what we want is left to fend for itself. For example, desired eigenvalues near the interval [a, b] in the first of the above alternatives will not be much enhanced.

Whatever the (considerable) attractions of filter polynomials, they are expensive to apply directly. If, for example, p is of degree m−k, then p(A)u₁ requires m−k matrix-vector multiplications to compute. However, if u₁ is the starting vector of an Arnoldi decomposition of order m, we can use shifted QR steps to compute the Krylov decomposition of K_k[p(A)u₁] without any matrix-vector multiplications. This technique is called implicit restarting.

Implicitly restarted Arnoldi

Before describing how implicit restarting is implemented, let us look at how it will be used. We begin with an Arnoldi decomposition

AU_k = U_kH_k + β_k u_{k+1}e_k^T.

We then use Algorithm 1.1 to expand this decomposition to a decomposition of order m. The integer m is generally chosen so that the decomposition can fit into the available memory. At this point a filter polynomial p of degree m−k is chosen, and the implicit restarting process is used to reduce the decomposition to a decomposition of order k with starting vector p(A)u₁. This process of expanding and contracting can be repeated until the desired Ritz values have converged.

The contraction phase works with a factored form of the filter polynomial. Specifically, let

p(t) = (t − κ₁)(t − κ₂) ⋯ (t − κ_{m−k}).

We incorporate the factor t − κ₁ into the decomposition using an analogue of the basic QR step [see (2.1), Chapter 2]. Starting with the shifted decomposition

(A − κ₁I)U_m = U_m(H_m − κ₁I) + β_m u_{m+1}e_m^T,

we write

(A − κ₁I)U_m = U_mQ₁R₁ + β_m u_{m+1}e_m^T,   (2.1)

where Q₁R₁ is the QR factorization of H_m − κ₁I. Postmultiplying by Q₁, we get

(A − κ₁I)(U_mQ₁) = (U_mQ₁)(R₁Q₁) + β_m u_{m+1}e_m^TQ₁.

Finally, we have

A(U_mQ₁) = (U_mQ₁)H̃_m + β_m u_{m+1}e_m^TQ₁,

where

H̃_m = R₁Q₁ + κ₁I.

Note that H̃_m is the matrix that results from one QR step with shift κ₁ applied to H_m.

We may now repeat this process with κ₂, …, κ_{m−k}. The result will be a Krylov decomposition

A(U_mQ) = (U_mQ)(Q^H H_mQ) + β_m u_{m+1}e_m^TQ, with Q = Q₁ ⋯ Q_{m−k},   (2.3)

with the following properties.

1. The matrix Ũ_m = U_mQ is orthonormal.

2. The matrix H̃_m = Q^H H_mQ is Hessenberg, and at most its last subdiagonal element is zero. This follows from the proof of the preservation of Hessenberg form by the QR algorithm.

3. The matrix Q₁ is upper Hessenberg. This follows from the fact that it can be written in the form Q₁ = P₁₂P₂₃ ⋯ P_{m−1,m}, where P_{j−1,j} is a rotation in the (j−1, j)-plane (see Chapter 2).

4. The vector b_{m+1}^H = e_m^TQ₁ is the last row of Q₁; i.e., only the last two components of b_{m+1} are nonzero.

5. The first column of Ũ_m = U_mQ₁ is a multiple of (A − κ₁I)u₁. For on postmultiplying (2.1) by e₁ and remembering that R₁ is upper triangular, we get

(A − κ₁I)u₁ = r₁₁ũ₁.

Since H_m is unreduced, r₁₁ is nonzero, which establishes the result.

Thus the decomposition (2.3) has the Wilkinson diagram presented below for k = 3 and m = 6:

where the q's are from the last row of Q. From this diagram it is easy to see that the first k columns of the decomposition can be written in the form

AU_k^{(m)} = U_k^{(m)}H_k^{(m)} + (h_{k+1,k}^{(m)} u_{k+1}^{(m)} + β_m q_{mk} u_{m+1})e_k^T,

where U_k^{(m)} consists of the first k columns of Ũ_m, H_k^{(m)} is the leading principal submatrix of order k of H̃_m, and q_{mk} is from the matrix Q. Hence if we set

β_k u_{k+1} = h_{k+1,k}^{(m)} u_{k+1}^{(m)} + β_m q_{mk} u_{m+1},

then

AU_k^{(m)} = U_k^{(m)}H_k^{(m)} + β_k u_{k+1}e_k^T

is an Arnoldi decomposition whose starting vector is proportional to (A − κ₁I) ⋯ (A − κ_{m−k}I)u₁. We have thus restarted the Arnoldi process with a decomposition of order k whose starting vector has been filtered by p(A). The elegance and economy of this procedure is hard to overstate. Not only do we avoid any matrix-vector multiplications in forming the new starting vector, but we also get its Arnoldi decomposition of order k for free.

Implementation

We have described the contraction algorithm for the implicitly restarted Arnoldi method in terms of the explicitly shifted QR algorithm. For single shifts there is no reason not to use this procedure. However, when A is real and the shift is complex, it is natural to use the implicit double shift procedure. For simplicity, we will only describe the implicit single step method.

A bit of terminology will help in our description of the algorithm. After we have applied the shifts κ₁, …, κ_j, we can truncate the decomposition as above to yield a decomposition of order m−j. We will call the integer m−j the truncation index for the current problem.

Algorithm 2.1 is an implementation of one step of filtering. For convenience we separate the final truncation from the implicit filtering. Here are some comments.

• The algorithm is a straightforward implementation of the single implicit QR step. The only new feature is that a double shift adds an additional two subdiagonals to the matrix Q and hence moves the truncation index back by two.

• The algorithm requires O(m²) work. A precise operation count is not necessary, since when n is large the applications of FilterArn will account for an insignificant part

Let

AU_m = U_mH_m + β_m u_{m+1}e_m^T

be an Arnoldi decomposition of order m. Given a shift κ, this algorithm reduces the truncation index by one. The transformations generated by the algorithm are accumulated in the m×m matrix Q.

Algorithm 2.1: Filtering an Arnoldi decomposition
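For illustration, the effect of one filtering step can be had with an explicit shifted QR step on the Rayleigh quotient, exactly as in the derivation above. This sketch is ours (Algorithm 2.1 proper uses the implicit single step); it leaves U untouched and accumulates Q for the later truncation.

import numpy as np
from scipy.linalg import qr

def filter_step(H, kappa):
    """One step of implicit restarting: a shifted QR step on the
    m x m upper Hessenberg Rayleigh quotient H.

    Returns the updated H and the orthogonal factor Q, which the caller
    accumulates; U is not transformed until the end of the contraction
    phase, where it is cheaper to apply the accumulated Q once.
    """
    m = H.shape[0]
    Q, R = qr(H - kappa * np.eye(m))
    return R @ Q + kappa * np.eye(m), Q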

2. The expansion phase requires m—k matrix-vector multiplications. This is done in Algorithm 2. the elements of Q below the m-k subdiagonal are zero. However. we have ignored the opportunity to reduce the count by taking advantage of the special structure of Q. Algorithm 2. If we implement the product U*Q [:.322 Given an Arnoldi decomposition CHAPTER 5. • Transformations are not accumulated in U. KRYLOV SEQUENCE METHODS and an integer m > k. The restarted Arnold! cycle We are now in a position to combine the expansion phase with the contraction phase. When n is large. this algorithm expands the decomposition to one of order m and then contracts it back to one of order k. so that the expansion phase requires between n(m2-k2) flam and 2n(ra 2 -fc 2 ) flam. It is much cheaper to accumulate them in the smaller matrix Q and apply Q to U at the end of the contraction process. If this is taken into . From (1. In fact. 1: k] naively.2: The restarted Arnoldi cycle of the work in the overall Arnold! algorithm. the bulk of the work in this algorithm is concentrated in the expansion phase and in the computation of U*Q[l:k].10) we know that the Arnoldi step requires between Inj flam and 4nj flam. the operation count is nm k flam.

If this is taken into consideration, the count can be reduced to n(km − ½k²) flam. These counts are not easy to interpret unless we know the relative sizes of m and k. There is no general analysis that specifies an optimal relation, but the conventional wisdom is to take m to be 2k or greater. Thus if we take m = 2k, we end up with the following table (in which RO stands for reorthogonalization). The main conclusion to be drawn from this table is that it does not much matter how U*Q[:, 1:k] is formed. Moreover, the naive scheme permits more blocking, so that in spite of the operation counts it may be faster (see §1:2.3).

The implicitly restarted Arnoldi method can be quite effective, as the following example shows.

Example 2.1. Let A = diag(1, .95, .95², .95³, …) (the matrix of Example 3.1, Chapter 4). The implicitly restarted Arnoldi method was run with k = 5, m = 10, and starting vector e. The results are summarized in the following table.

    Axu                    10       15       20       25
    tan∠(e₁, Krylov)    1.2e-01  2.0e-03  8.7e-05  8.8e-08
    tan∠(e₁, IRA)       5.8e-02  1.9e-03  8.9e-05  8.0e-07
    tan∠(e₁, RV)        4.7e-02  1.4e-03  4.9e-05  1.0e-07

The rows contain the following information: the number Axu of matrix-vector multiplications; tan∠(e₁, the full Krylov subspace); tan∠(e₁, the restarted Arnoldi subspace); and tan∠(e₁, the Ritz vector from the restarted subspace).

The implicitly restarted Arnoldi method is almost as good as the full Krylov sequence. Yet the latter requires 2,500 words of floating-point storage, whereas the former requires only 1,000. It would be rash to take one offhand example as characteristic of the behavior of the implicitly restarted Arnoldi method. But experience has shown it to be the best general method for non-Hermitian matrices when storage is limited.

For purposes of exposition we have assumed that m and k are fixed. However, they can be varied from cycle to cycle, as long as m is not so large that storage limits are exceeded.

.3) has the form where T^m *) is an upper triangular matrix with Ritz values pk+i.8) is the Rayleigh quotient for the implicitly restarted Arnoldi algorithm. at the end of the expansion phase the Ritz values—i..e. Let {im. Because Hm~ is block triangular. The matrix Hk in (2..3. Corollary 2. then the resulting matrix H has the form where HU is unreduced.324 Exact shifts CHAPTER 5. Let H bean unreduced Hessenberg matrix.. It follows that the last row of RQ is zero.2 we get the following result. This means that we can choose to keep certain Ritz values and discard others. fik+i be eigenvalues of Hk. • By repeated application of Theorem 2. This fact is a consequence of the following theorem. When this is done the Rayleigh quotient assumes a special form in which the discarded Ritz values appear on the diagonal of the discarded half of the Rayleigh quotient. it follows from the representation of the formation of this product in Figure 2. Specifically. the eigenvalues of Hm — are calculated and divided into two classes: . Chapter 2. Moreover.. .RQ -f p. Chapter 2. .k+i > then the matrix Hm~ ' in (2.f i l ) to triangular form. If one QR step with shifty is performed on H. that all but the last of the subdiagonal elements of RQ are nonzero. Hence H . KRYLOV SEQUENCE METHODS We have already mentioned the possibility of choosing Ritz values—the eigenvalues of H — as shifts for the implicitly restarted Arnoldi method. fJ. Hk must contain the Ritz values that were not used in the restarting process. This reduction is illustrated in Figure 2.6). and let pbean eigenvalue ofH. Theorem 2. from which it is evident that all but the last of the diagonal elements of R are nonzero. Pm on its diagonal. Proof. Hence the last diagonal element of R must be zero.3.. where HU is unreduced. The first part of the QR step consists of the reduction of (H . If the implicitly restarted QR step is performed with shifts /u m ..2.2. But H — fil is not of full rank.1 has the form (2. • • • .

those we wish to retain and those we wish to discard. The latter are used as shifts in the restarting algorithm. For example, in stability computations, where an eigenvalue with a positive real part indicates instability, we might retain the Ritz values having the largest real part. Or, if we are using shift-and-invert enhancement, we will be interested in the largest eigenvalues, and it is natural to retain the largest Ritz values. In this way the restarting process can be used to steer the Arnoldi algorithm to regions of interest.

A consequence of Corollary 2.3 is that in exact arithmetic the quantity H[k+1, k] in statement 11 of Algorithm 2.2 is zero, so that the computation of β and u can be simplified. In the presence of rounding error, however, H[k+1, k] can be far from zero. We now turn to a discussion of this phenomenon, which is called forward instability of the QR step.

Forward instability

Theorem 2.2 is a classic example of the limitations of mathematics in the face of rounding error. Given an exact shift, the matrix that one computes with rounding error can fail to deflate; i.e., its (m, m−1)-element can be far from zero.

Although forward instability is a real phenomenon, it should be kept in perspective. It does not mean that we cannot use Algorithm 2.2. If a particular step is forward unstable, the Arnoldi relation AU = UH + βue^T will continue to hold to working accuracy: forward instability cannot generate instability in the Arnoldi process itself. At worst it can only retard its progress. That is why ARPACK, the definitive implementation of the implicitly restarted Arnoldi method, has been and is used with great success to solve large eigenproblems.

An important algorithmic implication of forward instability is that we cannot deflate the decomposition after the application of each exact shift. We must continue to work with the entire matrix H as in Algorithm 2.1. In fact, when n is large, the work saved by deflation is negligible.

2.2. KRYLOV-SCHUR RESTARTING

The implicitly restarted Arnoldi algorithm with exact shifts is a technique for moving Ritz values out of an Arnoldi decomposition. As we have seen, forward instability in the QR step can compromise the progress of the algorithm (though not its stability). In the presence of rounding error, we cannot guarantee that the undesirable Ritz values have been purged from the decomposition, and, as we shall see later (Theorem 2.5), there is a danger of an unwanted Ritz value becoming permanently locked into the decomposition. In this subsection we describe a more reliable method for purging Ritz values. It is based on two observations. First, it is easier to move eigenvalues around in a triangular matrix than in a Hessenberg matrix. Second, if a Krylov decomposition can be partitioned in the form

A(U₁ U₂) = (U₁ U₂) ( B₁₁ B₁₂ ; 0 B₂₂ ) + u(b₁^H b₂^H),

then AU₁ = U₁B₁₁ + ub₁^H is also a Krylov decomposition. In other words, a Krylov decomposition splits at any point where its Rayleigh quotient is block triangular. Our restarting algorithm then is to compute the Schur decomposition of the Rayleigh quotient, move the desired eigenvalues to the beginning, and throw away the rest of the decomposition. We will call the process Krylov-Schur restarting.

Exchanging eigenvalues and eigenblocks

Since the Krylov-Schur method requires us to be able to move eigenvalues around in a triangular matrix, we will begin with a sketch of how this is done. The first thing to note is that to move an eigenvalue from one place to another it is sufficient to be able to exchange it with either one of its neighboring eigenvalues. More precisely, let a triangular matrix be partitioned in the form

S = ( S₁₁ S₁₂ ; 0 S₂₂ ).

If we can find a unitary transformation Q such that

Q^H S Q = ( Ŝ₁₁ Ŝ₁₂ ; 0 Ŝ₂₂ ),

where Ŝ₁₁ has the eigenvalues of S₂₂ and Ŝ₂₂ has the eigenvalues of S₁₁, then the eigenvalues s₁₁ and s₂₂ in the matrix will have traded places. If we have an algorithm to perform such exchanges, then we can move any eigenvalue on the diagonal of a triangular matrix to another position on the diagonal by a sequence of exchanges.

If the eigenproblem in question is real, we will generally work with real Schur forms. In that case, it is not enough to be able to exchange just eigenvalues: we must also be able to exchange the 2×2 blocks containing complex eigenvalues. To derive our algorithms we will consider the real case. Let

S = ( S₁₁ S₁₂ ; 0 S₂₂ ),

where S_ii is of order n_i (i = 1, 2), with n_i equal to one or two. There are four cases to consider.

If the eigenproblem in question is real, we will generally work with real Schur forms. In that case it is not enough to be able to exchange just eigenvalues: we must also be able to exchange the 2x2 blocks containing complex eigenvalues. To derive our algorithms we will consider the real case. Let the matrix in question be

    S = [S11 S12; 0 S22],

where Sii is of order one or two. There are four cases to consider, depending on whether n1 and n2, the orders of S11 and S22, are one or two. We will now sketch algorithms for each case.

For the first two cases, in which n2 = 1, let x be a normalized eigenvector corresponding to s22 and let Q = (x Y) be orthogonal. Then it is easily verified that

    Q^T S Q = [s22 *; 0 Ŝ11].

Note that although Ŝ11 need not be equal to S11, the two matrices must have the same eigenvalues (in fact, they are similar). These cases can be treated by what amounts to the first step in a reduction to Schur form (see the proof of Theorem 1.2, Chapter 1).

For the third case, in which n1 = 1 and n2 = 2, let y be a normalized left eigenvector corresponding to s11 and let Q = (X y) be orthogonal. Then

    Q^T S Q = [Ŝ22 *; 0 s11].

This case can be treated by the variant of Schur reduction introduced in connection with the QR algorithm (Chapter 2).

The last case, in which n1 = n2 = 2, is more complicated. Let (Ŝ22, X) be an orthonormal eigenpair of S whose eigenvalues are those of S22, and let Q = (X Y) be orthogonal. Then by Theorem 1.12, Chapter 1,

    Q^T S Q = [Ŝ22 *; 0 Ŝ11].

Since Ŝ22 is the Rayleigh quotient X^T S X, it must have the same eigenvalues as S22, and similarly for Ŝ11.

then.1. KRYLOV SEQUENCE METHODS The problem. The algorithm for the fourth case is not provably stable. after the transformation the matrix Q^SQ will have the form To complete the algorithm. Specifically let the eigenbasis be (PT /)T. Specifically. Once the eigenvector has been computed. the matrix E must be set to zero. . The Krylov-Schur cycle We will now describe the basic cycle of the Krylov-Schur method. we can compute the QR decomposition In particular. In this case we do not use Algorithm 1.9) is a set of four linear equations in four unknowns which can be solved by any stable method. Q — (X Y) can be represented as the product of two Householder transformations.328 CHAPTER 5. For the other two. where P is to be determined. although it works well in practice and its failure can be detected. is to compute the eigenbasis X. we will assume that the matrix in question is complex. To keep the exposition simple. to compute P but instead note that (2. Chapter 1. Consequently. but replacing zero pivot elements with a suitably small number will allow the computation to continue (see Algorithm 2. so that we do not have to be concerned with real Schur forms. the eigenvector can be computed by inspection. The procedure is to compute a nonorthonormal eigenbasis and then orthogonalize it. It is possible for this system to be singular. The algorithms for the first three cases have stable implementations. the matrix Q can be taken to be a Householder transformation (or a plane rotation when n\ — n<2 = 1). These methods are stable in the usual sense. if E is too large—say ||£||oo > lOH^Hc^M— tnen me algorithm is deemed to have failed. For the first case. we get the Sylvester equation which can be solved for P. they can be computed stably by solving a 2 x 2 system of equations. Once we have computed P. Chapter 2). and it can be applied to the matrix in which S is embedded.9. Then From the first row of this equation. For more details on these algorithms see the LAPACK subroutine xLAEXC.

The Krylov-Schur cycle
We will now describe the basic cycle of the Krylov-Schur method. To keep the exposition simple, we will assume that the matrix in question is complex, so that we do not have to be concerned with real Schur forms.

The Krylov-Schur cycle begins with a Krylov-Schur decomposition

    A U_k = U_k S_k + u_{k+1} b_{k+1}^H,

where U_{k+1} is orthonormal and S_k is upper triangular. To get such a decomposition initially, one chooses a starting vector, expands it to an Arnoldi decomposition, and transforms it by a unitary similarity to triangular form.

The expansion phase is exactly like the expansion phase for the Arnoldi method, and one can use ExpandArn (Algorithm 1.1) to implement it. The expanded Rayleigh quotient now has the form illustrated below for k = 4 and m = 8: triangular in its leading part and Hessenberg thereafter. Writing the corresponding Krylov decomposition in the form

    A U_m = U_m T_m + u_{m+1} b_{m+1}^H,

we now determine a unitary matrix Q such that S_m = Q^H T_m Q is upper triangular and transform the decomposition to the form

    A (U_m Q) = (U_m Q) S_m + u_{m+1} (b_{m+1}^H Q).

Finally, we use the exchange algorithms to move the eigenvalues we want to keep to the beginning of S_m and truncate the decomposition. Algorithm 2.3 implements this scheme. Here are two comments.

• In the computation of the Schur decomposition in statement 7, minor savings can be had in the initial reduction of S to Hessenberg form, since by (2.10) S is already partially reduced. However, when n is large, the savings are not worth the trouble.

• The work involved in the expansion phase is the same as for the Arnoldi process. For large n the work in the contraction phase is concentrated in the formation of the product U_m Q[:, 1:k]. As (2.5) shows, this is about the same as for implicit restarting. Thus the implicitly restarted Arnoldi cycle and the Krylov-Schur cycle are essentially equivalent in the amount of work they perform.
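The contraction phase can be sketched in a few lines of dense Python. This is an illustration, not production code: scipy.linalg.schur's sorting stands in for the explicit exchange algorithms, and the predicate 'wanted' selects the Ritz values to keep.

    import numpy as np
    from scipy.linalg import schur

    def krylov_schur_contract(U, T, b, k, wanted):
        # A U[:, :m] = U[:, :m] T + u b^H, with u = U[:, m], U orthonormal.
        m = T.shape[0]
        S, Q, _ = schur(T, output='complex', sort=wanted)  # wanted first
        Uk = np.hstack([U[:, :m] @ Q[:, :k], U[:, m:]])    # keep k vectors + u
        bk = (Q.conj().T @ b)[:k]                          # transformed spike
        # Truncated decomposition: A Uk[:, :k] = Uk[:, :k] S[:k,:k] + u bk^H
        return Uk, S[:k, :k], bk

    # e.g., keep Ritz values of modulus at least tau:
    # wanted = lambda z: abs(z) >= tau

Transforming by Q gives A (U_m Q) = (U_m Q) S + u (b^H Q), so truncating after the first k columns is legitimate exactly as described above.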

Given a Krylov-Schur decomposition of order k and an integer m > k, this algorithm expands the decomposition to one of order m and then contracts it back to one of order k.

Algorithm 2.3: The Krylov-Schur cycle

The equivalence of Arnoldi and Krylov-Schur
Given that the implicitly restarted Arnoldi method with exact shifts and the Krylov-Schur method both proceed by moving Ritz values out of their respective decompositions, it is natural to conclude that if we start from equivalent decompositions and remove the same Ritz values then we will end up with equivalent decompositions. That is indeed the case, although it requires some proving.

Let

    A U = U H + u b^H

be an unreduced Arnoldi decomposition and let

    A V = V S + u c^H

be an equivalent Krylov-Schur form. Since V = UW for some unitary W, we have

    S = W^H H W.

Thus H and S are similar and have the same Ritz values, and it makes sense to say that both methods reject the same Ritz values.

Theorem 2.4. If the same Ritz values are discarded in both and those Ritz values are distinct from the other Ritz values, then the resulting decompositions are equivalent.

Proof. Suppose that an implicitly restarted Arnoldi cycle is performed on the first decomposition and a Krylov-Schur cycle is performed on the second. First note that the expansion phase results in equivalent decompositions: since R(U) = R(V) and in both cases we are orthogonalizing the vectors Au, A^2 u, ..., the vectors u_{k+2}, ..., u_{m+1} and v_{k+2}, ..., v_{m+1} are the same up to multiples of modulus one.

Now assume that both algorithms have gone through the expansion phase and have moved the unwanted Ritz values to the end of the decomposition. Let P be the unitary transformation applied to the Rayleigh quotient of the first decomposition and let Q be the one applied to the Rayleigh quotient of the second. At this point denote the first decomposition by

    A (U P) = (U P) H' + u b'^H

and the second by

    A (V Q) = (V Q) S' + u c'^H.

Note that for both methods the final truncation leaves the vector u unaltered. We must show that the subspaces associated with the final results are the same. For brevity, set P_k = P[:, 1:k] and Q_k = Q[:, 1:k]. Then we must show that the subspaces spanned by U P[:, 1:k] and V Q[:, 1:k] are the same (see statement 14 in Algorithm 2.2 and statement 11 in Algorithm 2.3).

By construction R(P_k) spans the eigenspace P of Schur vectors of H corresponding to the retained Ritz values. Likewise, R(Q_k) spans the eigenspace Q of Schur vectors of S corresponding to the retained Ritz values. Since W^H H W = S, the matrix W^H P_k spans Q. By hypothesis each of these eigenspaces is simple and hence unique (page 246). Hence there is a unitary matrix R such that Q_k = W^H P_k R. We then have

    V Q_k = U W W^H P_k R = U P_k R.

It follows that V Q_k and U P_k span the same subspace. •

This theorem says that the implicitly restarted Arnoldi method with exact shifts and the Krylov-Schur method with the same shifts are doing the same thing. In fact, the Krylov-Schur method is restarting with a vector filtered by a polynomial whose roots are the rejected Ritz values. This fact is not at all obvious from the algorithmic description of the Krylov-Schur method.

Comparison of the methods
The implicitly restarted Arnoldi method and the Krylov-Schur method each have their place. The former can restart with an arbitrary filter polynomial, something the latter cannot do. On the other hand, when it comes to exact shifts the Krylov-Schur method is to be preferred, because exchanging eigenvalues in a Schur form is a more reliable process than using implicit QR steps to deflate.

The Krylov-Schur method can be used as an adjunct to the implicitly restarted Arnoldi method. For example, if the latter produces a persistent unwanted Ritz value, the decomposition can be reduced to Krylov-Schur form, the offending eigenvalue swapped out, and Algorithm 1.3 used to return to an Arnoldi decomposition. Although this seems like a lot of work, if n is large and we delay multiplying transformations into the decomposition until the end of the process, it effectively costs no more than a contraction step. We will see in the next subsection that a similar procedure can be used to deflate converged Ritz pairs.

STABILITY, CONVERGENCE, AND DEFLATION

In this subsection we will consider some of the practicalities that arise in the implementation of the Arnoldi method. We will first treat the behavior of the methods in the presence of rounding error. We will then discuss tests for the convergence of approximate eigenpairs. Finally, we will show how to deflate converged eigenpairs from a Krylov decomposition.

Stability
Provided we take due care to reorthogonalize, the algorithms of this section are stable in the usual sense. Since the Arnoldi and Krylov-Schur decompositions are both orthonormal Krylov decompositions, we will present our results in the more general notation of the latter.

Theorem 2.5. Let

    A U'_k = U'_k B'_k + u'_{k+1} b'^H_{k+1}

be an orthogonal Krylov decomposition computed by the implicitly restarted Arnoldi method or the Krylov-Schur method. Then there is an orthonormal matrix U_{k+1} satisfying

    ||U'_{k+1} - U_{k+1}||_2 ≤ γ ε_M                             (2.11)

and a matrix E satisfying

    ||E||_2 ≤ γ ||A||_2 ε_M                                      (2.12)

such that

    (A + E) U_k = U_k B'_k + u_{k+1} b'^H_{k+1}.

Here γ is a generic constant, in general different in (2.11) and (2.13), that depends on the size of the problem and the number of cycles performed by the algorithm. Alternatively, there is a matrix F satisfying

    ||F||_2 ≤ γ ||A||_2 ε_M                                      (2.13)

such that

    A U_k = U_k (B'_k + F) + u_{k+1} b'^H_{k+1}.

For a sketch of the proof, see the notes and references.

This theorem has three implications.

• Since B = U^H A U is a Rayleigh quotient of A, the eigenpairs (μ_i, w_i) of B are primitive Ritz pairs of A + E. It follows that unless the Ritz pairs are very sensitive to perturbations in A, they will be accurately computed. Thus we can take the output of our algorithms at face value.

• The generic constant γ grows only slowly as the algorithm in question progresses. Thus the algorithm can be continued for many iterations without fear of instability.

• The vectors U'_k w_i are near the Ritz vectors U_k w_i.

Stability and shift-and-invert enhancement
When shift-and-invert enhancement with shift κ is combined with the Arnoldi process, the vector v = (A - κI)^{-1} u will be computed by solving the system

    A_κ v = u,

where A_κ = A - κI, and we must take into account the errors generated in solving this system. In most applications A_κ will be large and sparse, and the system must be solved by a direct sparse method or by an iterative method. The treatment of each of these methods is worth a volume in itself, and, as we have already pointed out, there is little in the literature to guide us. However, we can say that the computed solution will satisfy

    (A_κ + F) v = u,   where ||F||_2 ≤ γ ||A_κ||_2 ε_M.          (2.14)

Depending on the stability properties of the direct solver or the convergence criterion used by the iterative method, it is possible for the constant γ to be large.

To relate the backward error in (2.14) to A_κ^{-1}, we must show how F in (2.14) propagates to A_κ^{-1}. From the first-order expansion

    (A_κ + F)^{-1} ≅ A_κ^{-1} - A_κ^{-1} F A_κ^{-1}

we see that the backward error in A_κ^{-1} is approximately bounded by

    (γ ||A_κ||_2 ||A_κ^{-1}||_2) ||A_κ^{-1}||_2 ε_M.             (2.15)

This bound compares unfavorably with (2.12) in two ways. First, the constant γ can be larger than its counterpart in (2.12): instability in a direct solver or a lax convergence criterion in an iterative method can result in a comparatively large backward error in solving the system A_κ v = u. Second, ill conditioning of A_κ, due to the shift being near an eigenvalue of A, can further magnify the error: if κ is very near an eigenvalue of A, then the condition number ||A_κ||_2 ||A_κ^{-1}||_2 will be large and will magnify the multiplier in (2.15).

All this suggests that one must be careful in using shifts very near an eigenvalue of A. However, the above analysis is quite crude, and in practice accurate shifts may be less damaging than it suggests, something we have already noted in Example 1.5.

Convergence criteria
We have already stressed in connection with the power method (p. 61) that an appropriate criterion for accepting a purported eigenpair (μ, z) is that the residual r = Az - μz

be sufficiently small. More generally, if (M, Z) is an approximate orthonormal eigenpair with a small residual

    R = AZ - ZM,                                                 (2.16)

then by Theorem 2.8, Chapter 4, there is a matrix E satisfying

    ||E||_F ≤ ||R||_F                                            (2.17)

such that (M, Z) is an exact eigenpair of A + E. Thus if for some tolerance tol

    ||R||_F ≤ tol · ||A||_F,                                     (2.18)

then (M, Z) is an exact eigenpair of A + E, where the (normwise) relative error in A is tol. Thus (2.18) provides a convergence criterion for determining if eigenpairs produced by an algorithm have converged.

At first glance, the residual (2.16) would appear to be expensive to compute. For if Z has p columns, the computation requires p matrix-vector multiplications, not to mention the arithmetic operations involved in computing ZM. However, if Z = UW, where U is from an orthonormal Krylov decomposition, the residual is cheap to compute, as the following theorem shows.

Theorem 2.6. Let

    A U = U B + u b^H

be an orthonormal Krylov decomposition and let (M, UW) be an orthonormal pair. Then

    ||A(UW) - (UW)M||_F^2 = ||BW - WM||_F^2 + ||b^H W||_F^2.     (2.19)

Proof. It is easily verified that

    A(UW) - (UW)M = U(BW - WM) + u b^H W.                        (2.20)

Since (U u) is orthonormal, the theorem follows on taking norms. •

There are three comments to be made about this theorem.

• We have chosen to work with eigenspaces because that is what is produced by complex eigenvalues in a real Schur form. But for the most part, we will be concerned with ordinary eigenvalues and their eigenvectors.

• If (M, W) is a primitive Ritz pair, then BW - WM = 0, and the expression (2.19) reduces to ||b^H W||_F. If the decomposition is an Arnoldi decomposition, so that b^H = β_k e_k^H, the residual norm is just β_k times the norm of the last row of W. Moreover, for a single Ritz pair (μ, Uw) the residual norm is β_k |ω_k|, where ω_k is the last component of w.

• If (M, W) is not a Ritz pair, then the formula (2.19) cannot be simplified. However, it will be minimized when M is the Rayleigh quotient W^H B W. In particular, when checking the convergence of refined or harmonic Ritz vectors, one should use the Rayleigh quotient, not the refined or harmonic Ritz value.

In practice, we may not be able to compute the quantity ||A||_F in the criterion (2.18), especially if we are using shift-and-invert enhancement. However, since we are generally concerned with exterior eigenvalues, the norm of the Ritz block M is often a ballpark approximation to ||A||_F. Thus we might say that a Ritz pair (M, UW) has converged if

    ||b^H W||_F ≤ tol · ||M||_F.

When the concern is with eigenvectors rather than eigenspaces, so that we are concerned with the approximate eigenpair (μ, z), the criterion becomes

    ||r||_2 ≤ tol · |μ|.                                         (2.21)

Convergence with shift-and-invert enhancement
As in the case of stability, when shift-and-invert enhancement is used, say with shift κ, the residual R in (2.17) relates to the shifted matrix. Specifically, if

    r = (A - κI)^{-1} z - μ z,                                   (2.22)

then in terms of the original matrix A the approximate eigenvalue is κ + μ^{-1}, and from (2.22) we have the residual

    A z - (κ + μ^{-1}) z = -μ^{-1} (A - κI) r.

It follows that for the relative backward error in A not to be greater than tol, we must have

    ||A - κI||_F ||r||_2 / |μ| ≤ tol · ||A||_F.

Now it is reasonable to assume that ||A - κI||_F and ||A||_F are of the same order of magnitude. Hence our convergence test becomes

    ||r||_2 ≤ tol · |μ|.

This is exactly the same as the criterion (2.21) for Ritz pairs.
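For an Arnoldi decomposition these tests cost no products with A. Here is a minimal NumPy sketch using the eigenvector criterion (2.21); it assumes the k x k Hessenberg Rayleigh quotient H and the scalar beta from A U_k = U_k H + beta u e_k^T.

    import numpy as np

    def converged_ritz_pairs(H, beta, tol):
        # Primitive Ritz pairs of the Rayleigh quotient.
        mu, W = np.linalg.eig(H)
        W = W / np.linalg.norm(W, axis=0)      # normalize each eigenvector
        res = beta * np.abs(W[-1, :])          # cheap residual norms (2.19)
        return mu, res, res <= tol * np.abs(mu)   # per-pair test (2.21)

The residual norm of the pair (mu_i, U w_i) is beta times the last component of w_i, so the whole test is performed on k x k quantities.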

Deflation
We say a Krylov decomposition has been deflated if it can be partitioned in the form

    A (U1 U2) = (U1 U2) [B11 B12; 0 B22] + u (0 b2^H).

When this happens, we have A U1 = U1 B11, so that U1 spans an eigenspace of A.

There are two advantages to deflating a converged eigenspace. First, by freezing it at the beginning of the Krylov decomposition we insure that the remaining space of the decomposition remains orthogonal to it. In particular, this gives algorithms the opportunity to compute more than one independent eigenvector corresponding to a multiple eigenvalue.

The second advantage of the deflated decomposition is that we can save operations in the contraction phase of an Arnoldi or Krylov-Schur cycle. The expansion phase does not change, and we end up with a decomposition of the form

    A (U1 U2 U3) = (U1 U2 U3) [B11 B12 B13; 0 B22 B23; 0 B32 B33] + u b^H.

Now since B11 is uncoupled from the rest of the Rayleigh quotient, we can apply all subsequent transformations exclusively to the eastern part of the Rayleigh quotient and to (U2 U3). If the order of B11 is small, the savings will be marginal, but as its size increases during the course of the algorithm, the savings become significant.

Of course, in the presence of rounding error we will never have an exact eigenspace in our decompositions. The following theorem relates the residual norm of an approximate subspace to the quantities we must set to zero to deflate.

Theorem 2.7. Let

    A U = U B + u b^H

be an orthonormal Krylov decomposition, and let (M, UW) be an orthonormal pair. Let (W W⊥) be unitary, and set

    B21 = W⊥^H B W   and   b1^H = b^H W.

Then

    ||A(UW) - (UW)M||_F^2 = ||W^H B W - M||_F^2 + ||B21||_F^2 + ||b1||_F^2
                          ≥ ||B21||_F^2 + ||b1||_F^2,            (2.23)

with equality if and only if M is the Rayleigh quotient W^H B W.

Proof. Let R = A(UW) - (UW)M. From (2.20), we have

    R = U(BW - WM) + u b^H W.

The theorem now follows on taking norms. •

To see the consequences of this theorem, suppose that AZ - ZM is small, where Z = UW. Using (W W⊥), we transform the Krylov decomposition AU - UB = ub^H to the form

    A (UW UW⊥) = (UW UW⊥) [M B12; B21 B22] + u (b1^H b2^H).

Then by (2.23), if the residual norm ||AZ - ZM||_F is sufficiently small, we may set B21 and b1 to zero to get the approximate decomposition

    A (UW UW⊥) ≅ (UW UW⊥) [M B12; 0 B22] + u (0 b2^H).           (2.26)

Thus we have deflated our problem by making small changes in the Arnoldi decomposition.

Stability of deflation
The deflation procedure that leads to (2.26) is backward stable. If we restore the quantities that were zeroed in forming (2.26), we get the following relation:

If we write this decomposition in the form

    A U = U B' + u b'^H + R,

then

    ||R||_F^2 = ||B21||_F^2 + ||b1||_F^2.

If we set E = -R U^H, then (A + E) U = U B' + u b'^H. We may summarize these results in the following theorem.

Theorem 2.8. Under the hypotheses of Theorem 2.7, write the deflated decomposition (2.26) in the form

    A U = U B' + u b'^H + R.

Then there is an E satisfying

    ||E||_F = ||R||_F ≤ ||AZ - ZM||_F

such that

    (A + E) U = U B' + u b'^H.

Equality holds if and only if M is the Rayleigh quotient Z^H A Z = W^H B W.

This theorem says that the deflation technique is backward stable. Deflating an approximate eigenspace with residual norm ρ corresponds to introducing an error of no more than ρ in A.

Nonorthogonal bases
In practice we will seldom encounter a converging subspace unless it is a two-dimensional subspace corresponding to a complex eigenvalue in a real Schur decomposition. Instead we will be confronted with converged, normalized Ritz pairs (μ_i, z_i) (i = 1, ..., p) of one kind or another, and the vectors in these pairs cannot be guaranteed to be orthogonal. If we arrange the vectors in a matrix Z and set M = diag(μ_1, ..., μ_p), the residual R = AZ - ZM must be small, because the individual residuals are small. Since the deflation procedure only requires an orthonormal basis for the approximate eigenspace in question, we can compute a QR factorization Z = QS of Z and use the orthonormal basis Q in the deflation procedure. Unfortunately, the residual for Q becomes

    A Q - Q M' = R S^{-1},

where M' = S M S^{-1}. If the columns of Z are nearly dependent, ||S^{-1}||_2 will be large, and the residual may be magnified, perhaps to the point where the deflation cannot be performed safely.

It may seem paradoxical that we could have, say, two vectors each of which we can deflate, but which taken together cannot be deflated. The resolution of this paradox is to remember that we are not deflating two vectors but the subspace spanned by them. If the vectors are nearly dependent, they must be very accurate to determine their common subspace accurately.

Deflation in the Krylov-Schur method
The deflation procedure is not confined to eigenpairs calculated by a Rayleigh-Ritz procedure. For example, it can be used to deflate harmonic Ritz vectors or refined Ritz vectors. However, if Ritz vectors are the concern, there is an easy way to deflate them in the Krylov-Schur method. Specifically, let the current decomposition have the form

    A (U1 U2) = (U1 U2) [S11 S12; 0 S22] + u (b1^H b2^H).

Here U1 represents a subspace that has already been deflated, and S22 is the Schur form that remains after the contraction phase. To deflate the Ritz pair corresponding to the (1,1)-element of S22 we must set the first component of b2 to zero. Thus all we have to do is to verify that that component satisfies our deflation criterion. If some other diagonal element of S22 is a candidate for deflation, we can exchange it into the (1,1)-position of S22 and test as above. This procedure can be repeated as often as desired. Algorithm 2.4 implements this scheme.

Deflation in an Arnoldi decomposition
Deflating an Arnoldi decomposition proceeds by reducing it to a Krylov-Schur decomposition. After a cycle of the algorithm, at the end of the contraction phase the Arnoldi decomposition will have the form

    A (U1 U2) = (U1 U2) [B11 B12; 0 H22] + u b^H,

where H22 is upper Hessenberg. The procedure is to reduce H22 to Schur form, deflate as in the Krylov-Schur form, and use Algorithm 1.3 to bring the decomposition back to an Arnoldi form. Although the procedure is roundabout, it is not expensive. We accumulate the transformations in an auxiliary matrix Q and apply them to U only at the end of the deflation process. Even if n is large, this is an inexpensive algorithm.
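A minimal sketch of the test itself, in the spirit of Algorithm 2.4, follows; the scale estimate a and the tolerance tol are as in the discussion of tolerances below, and the candidate is assumed to have already been exchanged to position j.

    import numpy as np

    def try_deflate(b, j, tol, a):
        # b: spike of the Krylov-Schur decomposition A U = U S + u b^H;
        # positions 0..j-1 are already deflated. The pair at position j
        # can be frozen when its spike component is negligible.
        if abs(b[j]) <= tol * a:
            b[j] = 0.0          # deflate: freeze the pair at position j
            return True
        return False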

Let the Krylov-Schur decomposition AU = US + ub^H have the form

    A (U1 U2) = (U1 U2) [S11 S12; 0 S22] + u (b1^H b2^H),

where S11 is of order j. For some i > j, let the Ritz value s_ii be a candidate for deflation. This algorithm determines if the value can be deflated and if so produces a deflated decomposition.

Algorithm 2.4: Deflating in a Krylov-Schur decomposition

Choice of tolerance
Although algorithms for dense eigenproblems are necessarily iterative, their implementations seldom use an explicit parameter to determine convergence. Instead they iterate until the limits set by finite precision arithmetic are reached. The reason is that it is comparatively inexpensive to do so. Algorithms for large eigenproblems, on the other hand, usually allow the user to furnish a tolerance to decide convergence. The reason is that the computations are potentially so voluminous that it is unreasonable to continue them beyond the accuracy desired by the user. It is therefore necessary to say how such a criterion should be chosen.

The criterion (2.18) suggests stopping when the residual norm of the eigenpair in question is less than tol·a, where a is some convenient estimate of the size of A. There are two possibilities for choosing tol.

1. Choose tol to be a modest multiple of the rounding unit. There is no point in choosing tol smaller, since the operations in the algorithms already impose a backward error of the same order (see Theorem 2.5).

2. Choose tol to be something that gives a backward error that one can live with. In that case, the eigenpair comes from a perturbed matrix with relative error tol. The advantage of this choice is that one may save calculations by not demanding too much. The difficulty is that the choice is highly problem dependent and must be left to the user.

Good packages, such as ARPACK, allow both a user-specified option and a default

option based on the rounding unit.

Since deflation is backward stable (see Theorem 2.8), the tolerance for deflation should be of the same order of magnitude as the tolerance for convergence. However, it should be made somewhat larger to allow multiple vectors to be deflated. See the discussion on page 339.

RATIONAL KRYLOV TRANSFORMATIONS

The principal use of shift-and-invert transformations in Arnoldi's method is to focus the algorithm on the eigenvalues near the shift κ. However, when the desired part of the spectrum is spread out, it may be necessary to use more than one shift. One way of doing this is to restart with a new shift and a new vector. However, when the new shift is not far removed from the old, the Arnoldi decomposition may still contain useful information about the eigenvectors near the new shift, information that should be kept around. It is a surprising fact that we can change a Krylov decomposition from one in (A - κ_1 I)^{-1} to one in (A - κ_2 I)^{-1} without having to perform an inversion. This subsection is devoted to describing how this is done.

Krylov sequences, inverses, and shifts
Although the computational algorithm is easy to derive, it is interesting to see what lies behind it. Suppose we have a Krylov sequence

    u, (A - κ_1 I)^{-1} u, ..., (A - κ_1 I)^{-(k-1)} u.

If we set v = (A - κ_1 I)^{-(k-1)} u, then the sequence with its terms in reverse order is

    v, (A - κ_1 I) v, ..., (A - κ_1 I)^{k-1} v,

so that

    K_k[(A - κ_1 I)^{-1}, u] = K_k[A - κ_1 I, v].

Moreover, by the shift invariance of a Krylov sequence (Chapter 4),

    K_k[A - κ_1 I, v] = K_k[A - κ_2 I, v].

It follows that

    K_k[(A - κ_1 I)^{-1}, u] = K_k[(A - κ_2 I)^{-1}, w],   where w = (A - κ_2 I)^{k-1} v.   (2.29)

Equation (2.29) says that the Krylov subspace in (A - κ_1 I)^{-1} with starting vector u is exactly the same as the Krylov subspace in (A - κ_2 I)^{-1} with a different starting vector w. Thus the problem posed above amounts to restarting the Krylov sequence with the new vector w. We will now describe an algorithm for doing this.

The rational Krylov method
Let us begin with an orthonormal Arnoldi decomposition

    (A - κ_1 I)^{-1} U_k = U_{k+1} H_k,

where H_k is the (k+1) x k Hessenberg matrix of the decomposition. Then

    (A - κ_2 I) U_{k+1} H_k = U_k + (κ_1 - κ_2) U_{k+1} H_k.

Hence

    (A - κ_2 I)^{-1} U_{k+1} [I_k + (κ_1 - κ_2) H_k] = U_{k+1} H_k,

where I_k denotes the identity matrix of order k with a row of zeros appended. To reduce this relation to a Krylov decomposition, let

    I_k + (κ_1 - κ_2) H_k = Q [R; 0]

be a QR decomposition of I_k + (κ_1 - κ_2) H_k. Then

    (A - κ_2 I)^{-1} (U_{k+1} Q)[:, 1:k] R = U_{k+1} H_k = (U_{k+1} Q)(Q^H H_k),

so that, with V_{k+1} = U_{k+1} Q,

    (A - κ_2 I)^{-1} V_k = V_{k+1} (Q^H H_k R^{-1}).

This is a Krylov decomposition, which can be reduced to an Arnoldi decomposition by Algorithm 1.3.

Algorithm 2.5 implements this procedure. Here are some comments.

• The algorithm is quite economical, involving no manipulations of A. As usual, the transformations to U can be saved and applied at the end of the algorithm.

• Since I_k + (κ_1 - κ_2) H_k is upper Hessenberg, the Q in the factorization in statement 1 can be computed as the product of k plane rotations. In particular, this structure makes it easy to preserve deflated vectors in the initial part of the decomposition.

• The algorithm is not unconditionally stable. If R is ill conditioned, the application of R^{-1} can magnify the residual error in the decompositions.
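Here is a dense NumPy sketch of the transformation just derived. It is an illustration of the relation above, not the book's Algorithm 2.5; for clarity it forms R^{-1} explicitly, which is exactly where the ill conditioning mentioned in the last comment would surface.

    import numpy as np

    def rational_krylov(U, H, kappa1, kappa2):
        # U: n x (k+1) orthonormal, H: (k+1) x k Hessenberg, with
        # (A - kappa1*I)^{-1} U[:, :k] = U @ H.
        k = H.shape[1]
        Ihat = np.vstack([np.eye(k), np.zeros((1, k))])
        Q, R = np.linalg.qr(Ihat + (kappa1 - kappa2) * H, mode='complete')
        V = U @ Q                                   # new orthonormal basis
        B = Q.conj().T @ H @ np.linalg.inv(R[:k, :k])
        # Now (A - kappa2*I)^{-1} V[:, :k] = V @ B, a Krylov decomposition
        # that can be brought back to Arnoldi form (Algorithm 1.3).
        return V, B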

pp. provided the number of eigenvalues is less than the block size.2]. Nor does the method save the trouble of computing a factorization.5: The rational Krvlov transformation of A. this algorithm computes an Arnoldi decomposition Algorithm 2. one must be able to apply (A . KRYLOV SEQUENCE METHODS of order k and a shift ^2. See [45] and [93].5. It is important to distinguish between restarting the Arnoldi process to compute a single eigenpair and to compute several eigenpairs simultaneously. Once the eigenvector has been found it can be removed from the problem by various deflating techniques. 2. Consequently. 584-602] and [233. The former problem is relatively easy: just restart with the best available approximation to the desired eigenvector. the accuracy of eigenvector approximations is not affected. the closer the new shift is to an eigenvalue. 1980] who first presented it as an effective algorithm for computing eigenpairs of large nonHermitian matrices. and filter polynomials Although the Arnoldi method is indeed due to Arnoldi [5]. is was Saad [230.K2/)-1 to a vector. NOTES AND REFERENCES Arnoldi's method. the poorer the new starting vector. To put it another way. Equivalent block methods for several eigenvalues exist. The flaw in this argument is that the rational Krylov transformation does not change the Krylov subspace. which have been surveyed in [300. If one wants to continue the sequence. restarting. §FV. one can use them as shifts and get a sequence of Krylov decompositions with increasingly effective shift-and-invert enhancers.344 Given an Arnold! decomposition CHAPTER 5. the argument goes. .

The problem is far different when it is desired to restart the Arnoldi process while several eigenvectors are converging. To restart with a single approximate eigenvector would destroy the information accumulated about the others. In 1982, Saad [230] proposed restarting with a linear combination of approximate eigenvectors from the Krylov subspace, but in 1990 [233, p. 180] he pointed out that a good combination is hard to find. In an elegant paper Morgan [180, 1996] showed why. The idea of following a sequence of Arnoldi steps with the application of a filter polynomial to purify the restart vector is due to Saad [232, 1984]. The problem with this procedure is that once again one must choose a vector to purify.

The breakthrough came with Sorensen's 1992 paper [246] entitled "Implicit Application of Polynomial Filters in a k-Step Arnoldi Method." In it he described the implicitly restarted Arnoldi algorithm as presented here, including exact and inexact shifts. (By the way, the "implicit" in implicit restarting does not refer to its use of implicit QR steps but to the fact that the starting vector is chosen implicitly by the shifts in the contraction procedure.) He also gives convergence results for the case of a fixed filter polynomial and the case of exact shifts for a symmetric matrix. (For other convergence results see [160].)

ARPACK
New algorithms for large eigenproblems are usually implemented in MATLAB for testing but seldom end up as quality mathematical software. A happy exception is the implicitly restarted Arnoldi algorithm. Lehoucq, Sorensen, and Yang, building on a design of Phuong Vu, have produced ARPACK [162], a well-coded and well-documented package that bids fair to become the standard for large non-Hermitian eigenproblems.

Reverse communication
All routines for solving large eigenvalue problems require that some operation be performed involving the matrix in question and a vector. The simplest example is the computation of Ax; for shift-and-invert enhancement, a more complicated operation may be required. The problem for a general-purpose routine is how to get the caller to perform the operations in question. One solution is for the caller to specify a subroutine to perform the operation in the calling sequence. The difficulty is that once this subroutine is called, it must get data from the original program, which can only be done with messy global variable structures. An alternative, called reverse communication, is for the general-purpose routine to return to the calling program with a flag specifying the desired operation. The calling program, which knows its own data, performs the operation and returns the results via another call to the general-purpose routine.
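To make the contrast concrete, here is a small Python sketch of a reverse-communication driver loop. The solver object, its step and put methods, and the flag values are hypothetical illustrations of the pattern, not the interface of ARPACK or any real package.

    def drive(solver, apply_A, apply_shift_invert):
        # 'solver' is a hypothetical reverse-communication routine: step()
        # returns a flag naming the operation it needs and a work vector.
        while True:
            flag, x = solver.step()
            if flag == 'matvec':
                solver.put(apply_A(x))             # caller performs y = A x
            elif flag == 'solve':
                solver.put(apply_shift_invert(x))  # e.g., (A - kappa*I)^{-1} x
            elif flag == 'done':
                return solver.result()

The caller owns the matrix (or factorization) and never has to expose it to the solver, which is the whole point of the scheme.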

Exchanging eigenvalues
Stewart [256] produced the first program for exchanging eigenvalues in a real Schur form. The method is to iterate QR steps with an exact shift until the process converges. The method is stable but can fail to converge when the separation of the eigenvalues in the blocks is poor. This led to attempts to formulate noniterative algorithms. The algorithm described here is due to Bai and Demmel [11]. It too can fail, but the failures do not seem to occur in practice. It has been implemented in the LAPACK routine xTREXC.

Forward instability
The forward instability of the QR step has been investigated by Parlett and Le [207] for symmetric tridiagonal matrices and by Watkins [289] for Hessenberg matrices. To deal with the problems it poses for the iteratively restarted Arnoldi method, Lehoucq and Sorensen [158] propose a procedure for modifying the Krylov subspace so that it remains orthogonal to the left eigenvector of a converged, unwanted Ritz pair.

The Krylov-Schur method
The iteratively restarted Arnoldi procedure has two defects, both resulting from its commitment to maintaining a strict Arnoldi decomposition. First, an unwanted Ritz pair can fail to be purged from the decomposition. Second, it is awkward to deflate converged Ritz pairs. The introduction of general orthonormal Krylov decompositions, due to Stewart [267], solves both these problems by expanding the class of transformations that can be performed on the underlying decomposition. Since the method is centered around a Schur decomposition of the Rayleigh quotient, it is called the Krylov-Schur method.

Although the method is new, precursors of various aspects can be found in the literature. In fact, the contraction step of the iteratively restarted Arnoldi method produces an orthonormal Krylov decomposition, which is then truncated to become an Arnoldi decomposition. Stathopoulos, Saad, and Wu [248] point out that since the unpreconditioned Jacobi-Davidson algorithm is equivalent to the Arnoldi algorithm, Schur vectors can also be used to restart the latter. Fokkema, Sleijpen, and van der Vorst [74] explicitly use Schur vectors to restart the Jacobi-Davidson algorithm. Also see [181]. Closer to home, for symmetric matrices Wu and Simon [306] exhibit what might be called a Lanczos-spectral decomposition, a special case of our Arnoldi-Schur decomposition. Regarding the use of Schur vectors, Lehoucq [personal communication] has written code that uses Schur vectors in deflation.

What distinguishes the Krylov-Schur approach is the explicit introduction and systematic manipulation of Krylov decompositions to derive and analyze new algorithms. The manipulations are justified by Theorem 1.8, which shows that they can never yield a decomposition that is not based on a Krylov sequence.

Stability
Theorem 2.5 has not appeared in the literature, but it is easily proved. From the properties of the Gram-Schmidt algorithm with reorthogonalization [51, 117] and the general rounding-error analysis of unitary transformations [300, Ch. 3], it is easy to show

that the residual AU - UB - ub^H of the computed decomposition is of the order of the rounding error, and that the departure of U from orthonormality is of the same order. From this one constructs the orthonormal matrix of the theorem, absorbs its error in the residual, and throws the error back on A as in Theorem 2.8, Chapter 4.

Convergence
Convergence criteria for the Arnoldi processes are discussed in [161]. The analysis of convergence criteria for shift-and-invert enhancement is new.

Deflation
Deflating converged Ritz values from an Arnoldi decomposition is a complicated affair [158], and it is not clear that the procedure can be extended to arbitrary vectors, such as refined and harmonic Ritz vectors. The deflation technique for Arnoldi-Schur decompositions is closely related to Saad's method of implicit deflation [233, §VI.3]. In fact, this is the problem that led me to orthonormal Krylov decompositions. The fact that it may be impossible to deflate a group of several individually deflatable eigenvectors appears to be known but has not been analyzed before.

Shift-and-invert enhancement and the Rayleigh quotient
Nour-Omid, Parlett, Ericsson, and Jensen [184] discuss the effect of accurate shifts on the Rayleigh quotient and conclude that the graded structure of the Rayleigh quotient makes it a nonproblem.

The rational Krylov transformation
The rational Krylov method is due to Ruhe [221], who described it for the generalized eigenvalue problem.

3. THE LANCZOS ALGORITHM

We have seen that if A is symmetric and

    A U_k = U_k T_k + β_k u_{k+1} e_k^T

is an Arnoldi decomposition, then the Rayleigh quotient T_k is a tridiagonal matrix of the form

    T_k = [ α_1  β_1                    ]
          [ β_1  α_2  β_2               ]
          [      ...  ...  ...          ]
          [           β_{k-1}  α_k      ]

This implies that the vectors u_k can be generated by a three-term recurrence of the form

    β_k u_{k+1} = A u_k - α_k u_k - β_{k-1} u_{k-1},

where

    α_k = u_k^T A u_k   and   β_k = ||A u_k - α_k u_k - β_{k-1} u_{k-1}||_2.

When we combine this recursion with the Rayleigh-Ritz method, we get the Lanczos method. The purpose of this section is to describe the practical implementations of this method.

Mathematically, the Lanczos vectors u_j computed by the Lanczos algorithm must be orthogonal. In practice, they can lose orthogonality, and the chief problem in implementing a Lanczos method is to maintain a satisfactory degree of orthogonality among them. This fact is one of the main complications in dealing with the Lanczos algorithm in inexact arithmetic.

As in the Arnoldi algorithm, we can reorthogonalize a straying Lanczos vector u_k against its predecessors. There are two solutions. The first is to reorthogonalize the vectors at each step and restart when it becomes impossible to store the Lanczos vectors in main memory. The alternative is to save the Lanczos vectors in a backing store and move them into main memory to reorthogonalize the new Lanczos vectors, presumably only when absolutely necessary. In both cases the result of the orthogonalization is something that is not, strictly speaking, a Lanczos decomposition, since the reorthogonalization causes the Rayleigh quotient to become upper Hessenberg instead of tridiagonal.

We will begin by describing a general form of the algorithm that encompasses most of the variants that have appeared in the literature. We will then treat the restarted Lanczos algorithm, in which orthogonalization is performed at each step. In §3.3 we will consider semiorthogonal algorithms, in which the orthogonality of the Lanczos basis is allowed to deteriorate up to a point, after which orthogonalization is performed.

A key to deriving Lanczos-based algorithms is knowing when quantities can be neglected, i.e., when they are of the order of the rounding unit compared with A. To avoid cluttering our expressions with factors of ||A||, we will assume throughout this section that

    ||A||_2 = 1.

However, we reserve the right to insert a factor of ||A|| in our bounds whenever it seems appropriate.

3.1. REORTHOGONALIZATION AND THE LANCZOS ALGORITHM

Lanczos with reorthogonalization
In finite precision arithmetic the vectors u_k produced by the Lanczos process will deviate from orthogonality.
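As a point of reference before the general procedure below, here is a minimal dense NumPy sketch of the three-term recurrence with optional full reorthogonalization. It is an illustration only: no restarting, and it assumes no breakdown (β_k > 0).

    import numpy as np

    def lanczos(A, u1, m, full_reorth=True):
        n = len(u1)
        U = np.zeros((n, m + 1))
        alpha, beta = np.zeros(m), np.zeros(m)
        U[:, 0] = u1 / np.linalg.norm(u1)
        for k in range(m):
            v = A @ U[:, k]
            if k > 0:
                v -= beta[k - 1] * U[:, k - 1]
            alpha[k] = U[:, k] @ v
            v -= alpha[k] * U[:, k]
            if full_reorth:             # orthogonalize against all predecessors
                v -= U[:, :k + 1] @ (U[:, :k + 1].T @ v)
            beta[k] = np.linalg.norm(v)  # assumed nonzero here
            U[:, k + 1] = v / beta[k]
        return U, alpha, beta   # T_k = tridiag(beta[:-1], alpha, beta[:-1])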

Given a starting vector u1, this algorithm computes a sequence of suitably orthogonalized Lanczos vectors.

Algorithm 3.1: Lanczos procedure with orthogonalization

There are two problems associated with this procedure. First, the preceding vectors must be stored and then brought back into the fast memory of the machine. The actual storing of these vectors is not a problem, since we have to save them anyway to compute Ritz vectors. But if a reorthogonalization is performed at each Lanczos step, the time taken to transfer Lanczos vectors from the backing store will dominate the calculation. Second, the reorthogonalizations themselves add significantly to the arithmetic work. To avoid these problems, we can make two compromises. First, we can reorthogonalize u_k only against those vectors to which it is actually nonorthogonal. Second, we can agree to tolerate a prescribed degree of nonorthogonality, so that we do not have to reorthogonalize at every step.

Algorithm 3.1 sketches a general procedure that encompasses most of the variants of the Lanczos algorithm. The following comments should be read carefully; they are a useful guide to what follows.

• Although we have written a nonterminating loop, it obviously will have to stop at some point, e.g., to restart or to deflate converged Ritz vectors.

• Depending on how the decision is made to reorthogonalize and on the details of the reorthogonalization process, this code encompasses five common variants of the Lanczos method.

1. Full reorthogonalization: v is always reorthogonalized against all its predecessors.

2. No reorthogonalization: v is accepted as is at each iteration.

3. Periodic reorthogonalization: v is reorthogonalized against all its predecessors, but only when it fails a test for orthogonality.

4. Partial reorthogonalization: v is reorthogonalized against a subset of its predecessors when it fails a test for orthogonality.

5. Selective reorthogonalization: v is reorthogonalized against a subspace of the span of its predecessors determined by the current Ritz vectors.

In the next two subsections we will treat full and periodic reorthogonalization. For the other procedures see the notes and references.

• The loss of orthogonality in the Lanczos procedure is of two types: local and global. Statements 7-8 attempt to enforce local orthogonality, that is, the orthogonality of v against u_k and u_{k-1}. It turns out that the vector v in statement 6 can fail to be orthogonal to u_k and u_{k-1} to working accuracy. The cure is to reorthogonalize, which is done in statements 7-8. Note that the coefficients α_k and β_{k-1} are not updated during this process. This is because, in spite of the fact that reorthogonalization restores orthogonality, it does not affect the coefficients significantly: the loss of orthogonality corresponds to errors in α_k and β_{k-1} that are below the rounding threshold.

• Statements 9-11 attempt to enforce global orthogonality, that is, the orthogonality of v against U_{k-1}. There are three problems here. The first problem is to determine when global reorthogonalization is needed. We do this by keeping a running estimate of the inner products U_{k-1}^T u_k (see Algorithm 3.2). The second problem is how to reorthogonalize. If we are performing full reorthogonalization, we simply have to reorthogonalize u_k against U_{k-1}; other variants require that we also reorthogonalize u_{k-1} against U_{k-2}. Thus we use the term "reorthogonalize" in statement 10 in a general sense, to be specified in the context of the variant in question. The third problem is that reorthogonalization contaminates the Lanczos decomposition: the Rayleigh quotient becomes upper Hessenberg, so that instead of a Lanczos decomposition we have the relation

    A U_k = U_k H_k + β_k u_{k+1} e_k^T,

where H_k is upper Hessenberg. This is not even an Arnoldi decomposition, since with periodic, partial, and selective reorthogonalization the matrix U_{k+1} is not required to be orthogonal to working accuracy. Nonetheless, there are effective work-arounds for this problem.

• It may happen that β_k becomes so small that it can be neglected. In that case U_k effectively spans an eigenspace of A. If eigenpairs outside this eigenspace are desired, the Krylov sequence must be restarted. In what follows we will tacitly assume that this is done whenever necessary.

It will be useful to fix our nomenclature and notation at this point. To distinguish

it from the ordinary Rayleigh quotient T_k, we will call H_k the adjusted Rayleigh quotient. The matrix H_k is a Rayleigh quotient in the general sense of Definition 2.7, Chapter 4, since if (V v)^T is a left inverse of (U_k u_{k+1}), then H_k = V^T A U_k. In what follows the deviation of H_k from the tridiagonal matrix (3.1) generated by Algorithm 3.1 will be of paramount importance. We will denote this deviation by S_k, so that

    H_k = T_k + S_k.                                             (3.2)

3.2. FULL REORTHOGONALIZATION AND RESTARTING

When the Lanczos algorithm is run in the presence of rounding error with full reorthogonalization, the results are stable in the usual sense. To see the consequences, postmultiply the relation (2.13) in Theorem 2.5 by U_k to get

    U_k^T (A + E) U_k = B_k,

where B_k is now the matrix H_k in (3.2). Since A + E is a small perturbation of a symmetric matrix, so is H_k; and since the elements of H_k below the first subdiagonal are zero, the elements above the first superdiagonal must be small. This implies that the elements above the superdiagonal in the Rayleigh quotient are negligible and can be ignored. Thus the main difference between the Lanczos algorithm with full orthogonalization and the Arnoldi algorithm is that the Rayleigh quotient for the former is symmetric (to working accuracy). In this subsection we will briefly sketch the resulting simplifications. The reader is assumed to be familiar with the contents of §2.

Implicit restarting
In its larger details implicitly restarting a Lanczos decomposition is the same as restarting an Arnoldi decomposition (see §2.2). Given the Lanczos decomposition

    A U_m = U_m T_m + β_m u_{m+1} e_m^T

and m-k shifts κ_1, ..., κ_{m-k}, one applies m-k implicit QR steps with shifts κ_i to give the transformed decomposition

    A (U_m Q) = (U_m Q)(Q^T T_m Q) + β_m u_{m+1} e_m^T Q,

where Q is the matrix of the accumulated QR transformations. If we set

    U_k = (U_m Q)[:, 1:k]   and   T_k = (Q^T T_m Q)[1:k, 1:k],

then the restarted decomposition is

    A U_k = U_k T_k + β'_k u'_{k+1} e_k^T.

The major difference between implicit restarts for the Lanczos and the Arnoldi decompositions is that implicit symmetric tridiagonal QR steps are used in the former (see Chapter 3). Both general and exact shifts can be used in restarting the Lanczos algorithm. But because the spectrum of A is real, one can be more sophisticated about choosing filter polynomials than with the Arnoldi method. For more see the notes and references.

Krylov-spectral restarting
The comments on the forward instability of the QR algorithm for Hessenberg matrices (p. 325) apply to the tridiagonal Rayleigh quotient of the Lanczos algorithm. Consequently, when exact shifts are to be used in restarting, it will pay to use the analogue of the Krylov-Schur method. Since the Schur decomposition of a symmetric matrix is its spectral decomposition, we will call this analogue the Krylov-spectral method.

The chief difference between the Krylov-Schur and the Krylov-spectral methods is in the treatment of the Rayleigh quotient. The reduction to diagonal form is done by the symmetric tridiagonal QR algorithm. Since the Rayleigh quotient is diagonal, eigenvalues can be moved around by simple permutations; i.e., there is no need for special routines to exchange eigenvalues or eigenblocks.

Convergence and deflation
If

    A U_m = U_m D_m + u b^T

is the current Krylov-spectral decomposition, with D_m diagonal, then the primitive Ritz vectors are the unit vectors e_i (i = 1, ..., m). Consequently the residual norm for the ith Ritz vector is the magnitude of the ith component β_i of b. This number can be used as in §2.3 to determine the convergence of the ith Ritz pair. If this component satisfies the criterion for deflation, it can be set to zero and the decomposition deflated. If this pair is moved to the beginning of the Rayleigh quotient, its component in b follows it.

A particularly satisfying aspect of the Krylov-spectral decomposition is that the deflation of Ritz pairs is uncoupled. The fact that one Ritz pair has been deflated does not affect the later deflation of other Ritz pairs. Contrast this with the discussion on page 339.
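As a concrete illustration, here is a small NumPy sketch of this uncoupled deflation test. The relative criterion tol·|θ_i| is an assumption modeled on the convergence test of §2.3, not the book's exact formula.

    import numpy as np

    def deflate_spectral(theta, b, tol):
        # A U = U diag(theta) + u b^T: the residual norm of the i-th
        # Ritz pair is |b[i]|, so each pair is tested independently.
        converged = np.abs(b) <= tol * np.abs(theta)
        b[converged] = 0.0            # uncoupled deflation
        return theta, b, converged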

3.3. SEMIORTHOGONAL METHODS

We turn now to methods that allow the orthogonality of the Lanczos method to deteriorate. We cannot, of course, allow orthogonality to be completely lost. However, if we suitably restrict the loss we can compute accurate Ritz values and partially accurate Ritz vectors. A method that performs reorthogonalization only when there is danger of the loss becoming too great is called a semiorthogonal method. The first half of this subsection is devoted to the definition and consequences of semiorthogonality. The second shows how to maintain semiorthogonality by timely reorthogonalizations.

Semiorthogonal bases
We begin with a definition.

Definition 3.1. Let U ∈ R^{n x k} have columns of 2-norm one, and let C = U^T U. Then U is SEMIORTHOGONAL if

    ||I - C||_2 ≤ γ_max,

where γ_max is roughly the square root of the rounding unit.

Three comments on this definition.

• Our definition of semiorthogonality is meant to be applied to the bases generated by Algorithm 3.1. Owing to rounding error, the members of these bases will not have norm exactly one. However, because they are normalized explicitly (statements 12 and 13), they will be normal to working accuracy. Since this small deviation from normality makes no difference in what follows, we will tacitly assume that all our bases are normalized.

• Since the diagonal elements of C are one, the definition says that U is semiorthogonal if the elements of I - C are within roughly the square root of the rounding unit. If ||I - C||_2 were of the order of the rounding unit, we would say that U is orthogonal to working accuracy. Thus semiorthogonality requires orthogonality to only half working accuracy.

• Choosing γ_max to be about the square root of the rounding unit has the convenience that certain quantities in the Lanczos process will have negligible errors on the order of the rounding unit. However, there is nothing to keep us from using a larger value of γ_max, say, to reduce the number of reorthogonalizations. The price we pay is that the errors that were formerly at the level of the rounding unit are now of the order γ_max, and this number, not the rounding unit, will limit the attainable accuracy of our results.
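The condition of Definition 3.1 can be tested directly for a dense basis. Here is a minimal NumPy sketch, with γ_max defaulting to the square root of the rounding unit, per the comments above.

    import numpy as np

    def is_semiorthogonal(U, gamma_max=None):
        # U: n x k with (approximately) unit columns.
        k = U.shape[1]
        if gamma_max is None:
            gamma_max = np.sqrt(np.finfo(float).eps)
        C = U.T @ U
        return np.linalg.norm(np.eye(k) - C, 2) <= gamma_max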

The QR factorization of a semiorthogonal basis
Let U be semiorthogonal and let

    U = QR

be the QR factorization of U. Then

    C = U^T U = R^T R.

If we write R = I + W, where W is strictly upper triangular, we have

    C = I + W + W^T + W^T W.

It follows that up to terms of O(ε_M) the matrix W is the strictly upper triangular part of C and hence that

    ||W||_2 ≤ γ_max,

up to terms of O(ε_M). In particular, both R and R^{-1} are within about γ_max of the identity. We will use these facts in the sequel.

The size of the reorthogonalization coefficients
We have observed [see (3.2)] that reorthogonalization in the Lanczos algorithm introduces elements in the upper triangular part of the Rayleigh quotient, destroying its tridiagonal structure. For full reorthogonalization, these elements are insignificant and can be ignored. We will now show that if our basis is semiorthogonal, these elements may be too large to ignore.

Let us suppose that we must reorthogonalize at step k. Let U_k be semiorthogonal and let U_k = QR be its QR factorization. Let β_k u'_{k+1} = A u_k - α_k u_k - β_{k-1} u_{k-1} be the relation obtained after the Lanczos step but before the reorthogonalization, and write

    u'_{k+1} = γ q + τ q⊥,

where q ∈ R(Q) and q⊥ ∈ R(Q)⊥ have norm one. Then the vector d of reorthogonalization coefficients is

    d = U_k^T u'_{k+1} = γ R^T Q^T q.

Since ||Q^T q||_2 = 1 and R^T ≅ I, we have

    ||d||_2 ≅ γ.

To get some notion of what this approximation means, note that γ is the norm of the projection of u'_{k+1} onto the column space of U_k and hence measures how u'_{k+1} deviates from orthogonality to R(U_k). Now γ must be larger than γ_max, since a loss of semiorthogonality is just what triggers a reorthogonalization. In the Lanczos process it is rare to encounter an extremely small β_k. Consequently, we can expect the elements of d to be on the order of √ε_M. Since the elements of the difference S_k of the adjusted Rayleigh quotient H_k and T_k depend on the components of d [see (3.19)], we can expect them to be of the same order of magnitude.

The effects on the Rayleigh quotient
We have observed [see (3.2)] that the Rayleigh quotient H_k = T_k + S_k is not the same as the tridiagonal matrix T_k computed by Algorithm 3.1. Nonetheless, T_k is a Rayleigh quotient associated with the space spanned by U_k, as the following remarkable theorem shows. Its proof is beyond the scope of this volume.

Theorem 3.2. Let

    A U_k = U_k T_k + β_k u_{k+1} e_k^T

be the decomposition computed by Algorithm 3.1, and assume that U_k is semiorthogonal. Let U_k = Q_k R_k be the QR factorization of U_k. Then there is a matrix F_k satisfying ||F_k||_2 = O(ε_M) such that

    T_k = Q_k^T A Q_k + F_k.

The theorem says that up to rounding error, T_k is a Rayleigh quotient associated with the space spanned by U_k; hence it can be used to compute accurate Ritz values. Thus we can expect the Ritz values from T_k to be as accurate as those we would obtain by full reorthogonalization.

The effects on the Ritz vectors
Although the effects of semiorthogonality on the Rayleigh quotient and its Ritz values are benign, they have the potential to alter the corresponding Ritz vectors. The problem is that although a primitive Ritz vector w will be accurately determined by T_k (provided it is well conditioned), we cannot compute the corresponding Ritz vector Q_k w. Instead we must compute the vector U_k w. Since by (3.5)

    U_k w = Q_k w + Q_k (R_k - I) w,

we have

    ||U_k w - Q_k w||_2 ≤ ||R_k - I||_2 ≤ γ_max,

up to terms of O(ε_M). At first glance this result appears to be discouraging, since it says that the errors in the Ritz vector can be as large as √ε_M. However, there are mitigating factors. Although w has 2-norm one, only some of the elements of R - I are of order √ε_M. Moreover, its elements decrease as the vector converges, so that most of its elements are much less than one. Thus the product (R - I)w may be considerably less than √ε_M.

Perhaps more important is the effect of semiorthogonality on the residual norms, since it is these numbers that we use to decide on convergence. Now if (μ, w) is a primitive Ritz pair, we see from (3.7) that

    A U_k w - μ U_k w = U_k S_k w + β_k ω_k u_{k+1},

where ω_k is the last component of w. Since U_{k+1} is semiorthogonal, it has a QR factorization QR with R ≅ I. Hence the residual norm is

    ||A U_k w - μ U_k w||_2 ≅ (||S_k w||_2^2 + β_k^2 ω_k^2)^{1/2}.   (3.8)

Thus the semilarge values in S_k can inflate the residual norm. However, the comments about eigenvectors apply here also: the residual does not have to stagnate once it reaches √ε_M, since it is the product S_k w that determines the inflation. Be that as it may, the above analysis shows clearly that we cannot rely on the usual estimate |β_k ω_k| to decide on convergence, since it may be smaller than the right-hand side of (3.8). Instead one must carry along the matrix S_k and use it in computing residual norms.

Although it is not expensive to compute and store S_k (or equivalently H_k), the formulas for the columns of S_k are not easy to derive. For the sake of continuity, we will leave this derivation to the end of the section and assume that S_k is available whenever we need it. Another advantage of saving this matrix is that if a particular Ritz pair stops converging, we can use the inverse power method on H_k = T_k + S_k to correct the vector. Such a process is cheap, since H_k is tridiagonal with a few spikes above the diagonal in the columns corresponding to reorthogonalization steps.

Estimating the loss of orthogonality
Algorithms based on semiorthogonality allow the orthogonality to deteriorate until a vector u_{k+1} violates one of the semiorthogonality conditions, at which point u_{k+1} is reorthogonalized. Let U_k be the Lanczos basis computed by Algorithm 3.1 and let

    C_k = U_k^T U_k.

Then the size of the off-diagonal elements of C_k, which are the inner products γ_{ij} = u_i^T u_j, measure the departure from orthogonality of u_i and u_j. Unfortunately, it is almost as expensive to compute the inner products u_j^T u_{k+1} as to perform a full reorthogonalization. We will now show how to estimate the loss of orthogonality as the Lanczos algorithm progresses. It turns out that we can use the relation (3.7) to derive a recurrence for the γ_{ij}.

Suppose that we have computed C_k and we want to compute C_{k+1}. Write the jth column of (3.7) in the form

    β_j u_{j+1} = A u_j - α_j u_j - β_{j-1} u_{j-1} - U_j s_j + h_j,     (3.9)

where s_j is the jth column of S_j and h_j is a quantity of order ||A|| ε_M that accounts for the effects of rounding error in the relation. Also write

    β_k u_{k+1} = A u_k - α_k u_k - β_{k-1} u_{k-1} - U_k s_k + h_k.    (3.10)

Multiplying (3.9) by u_k^T and (3.10) by u_j^T, subtracting, and simplifying, we obtain

    β_k γ_{k+1,j} = β_j γ_{k,j+1} + (α_j - α_k) γ_{k,j} + β_{j-1} γ_{k,j-1}
                    - β_{k-1} γ_{k-1,j} + u_j^T h_k - u_k^T h_j
                    + u_k^T U_j s_j - u_j^T U_k s_k.                    (3.11)

Letting j = 2, ..., k-2, we can compute all but the first and last elements of row k+1 of C_{k+1}. The first element can be computed by leaving out the term β_{j-1} γ_{k,j-1} in (3.11). Because we maintain local orthogonality, the quantity γ_{k+1,k} = u_{k+1}^T u_k is on the order of the rounding unit.

In exact arithmetic with accurate values of u_j^T h_k and u_k^T h_j, the γ_{k+1,j} would be exact. Unfortunately, these quantities are unknown. However, experience indicates that ||h_i||_2 ≤ ||A||_2 ε_M. Consequently, we can replace u_j^T h_k and u_k^T h_j with a ε_M, where a is a suitable estimate of ||A||_2. We adjust the sign of this replacement so that it always increases γ_{k+1,j}. The quantities u_k^T U_j s_j and u_j^T U_k s_k are in principle computable. But because we are maintaining semiorthogonality, they are also on the order of a ε_M. Consequently, we can absorb them into the corrections for u_j^T h_k and u_k^T h_j. The substitution of estimates for these quantities turns the γ_{ij} into estimates.

Algorithm 3.2 implements this scheme. It requires an estimate a of ||A||_2.

Given the output of Algorithm 3.1, this algorithm computes estimates of the inner products γ_{k+1,j} = u_{k+1}^T u_j.

Algorithm 3.2: Estimation of the loss of orthogonality

Two comments.

• By taking absolute values in (3.11) we could obtain upper bounds on the loss of orthogonality. But these bounds grow too quickly to be useful: they will trigger reorthogonalizations before they are needed. What we need are estimates of the γ_{ij} themselves.
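Here is a NumPy sketch of one step of this estimation, a loose transcription of the recurrence (3.11). The factor 2 in the correction term is an illustrative choice, not the book's constant, and the sign convention follows the description above.

    import numpy as np

    def update_gamma(alpha, beta, k, g_k, g_km1, a):
        # g_k[j]   estimates u_k^T u_j     (j = 0..k, g_k[k] = 1)
        # g_km1[j] estimates u_{k-1}^T u_j (j = 0..k-1)
        eps = np.finfo(float).eps
        g_new = np.zeros(k + 1)
        for j in range(k):                      # all but the last element
            t = (beta[j] * g_k[j + 1] + (alpha[j] - alpha[k]) * g_k[j]
                 - beta[k - 1] * g_km1[j])
            if j > 0:                           # first element drops this term
                t += beta[j - 1] * g_k[j - 1]
            t += np.copysign(2 * a * eps, t)    # unknown rounding terms
            g_new[j] = t / beta[k]
        g_new[k] = eps                          # local orthogonality
        return g_new                            # row k+1 of the estimate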

Nonetheless, experiments show that the γ_{ij} calculated by this algorithm track the true values rather well. The reason seems to be that as the loss of orthogonality grows, driven upward by the choice of sign of the correction, the terms of order a ε_M have little effect on the recursion.

• In the basic recurrence, the row k+1 of C depends only on rows k and k-1. Since we only need the row k+1 to decide when to reorthogonalize u_{k+1}, at any time we need only store two rows: the current row and the preceding row.

Periodic reorthogonalization
The ability to estimate the deviation of the Lanczos vectors from orthogonality enables us to be precise about the test in statement 9 in Algorithm 3.1. Specifically, if for some j the estimate γ_{k+1,j} has a value greater than γ_max, we must perform a reorthogonalization step. (In Algorithm 3.1 we do not test the vector u_{k+1} but its unnormalized counterpart v. This means we must replace β_k in the test by ||v||_2; the same replacement appears in the recursion (3.11) for the orthogonality estimates.)

Because we have allowed our basis to deteriorate, the process is more complicated than a full reorthogonalization step. There are three differences.

1. We must reorthogonalize both u_k and v against U_{k-1}.
2. We must reorthogonalize v against u_k.
3. We may have to repeat the orthogonalization of v against U_k.

We will treat these items in turn.

The fact that we have to reorthogonalize both u_k and v follows from the three-term recurrence

    β_{k+1} u_{k+2} = A u_{k+1} - α_{k+1} u_{k+1} - β_k u_k.

If either u_k or u_{k+1} fails to be orthogonal to U_{k-1}, the lack of orthogonality will propagate to u_{k+2} and hence to the subsequent Lanczos vectors. Hence the lost orthogonality must be restored. The process of reorthogonalizing v and u_k against U_{k-1} may affect the orthogonality of these two vectors against each other, which is why we must also reorthogonalize v against u_k.

The fact that we may have to repeat the reorthogonalization of v is a consequence of the fact that we are using a semiorthogonal basis to perform the reorthogonalization. To see what is going on, let U_k = QR be the QR factorization of U_k. Recall that we can write R = I + W, where to working accuracy W is the strictly upper triangular part of C_k = U_k^T U_k [see (3.4)]. Suppose we are trying to orthogonalize a vector of the form u = γ q + τ q⊥, where q ∈ R(U_k) and q⊥ ∈ R(U_k)⊥ have ||q||_2 = ||q⊥||_2 = 1. The orthogonalized vector is û = (I - U_k U_k^T) u. Hence the norm of the component of

The norm of the component of û lying in R(U_k) is then ||Q^T û||_2. Now ||Q^T u||_2 is the size of the component of u in R(U_k), and γ is the size of the corresponding component of u. Since R is near the identity matrix, it follows that the reorthogonalization step will tend to reduce the component by a factor of only √ε_M. Thus it is possible for one reorthogonalization to fail to purge the components along R(U_k), and the cure is to repeat the orthogonalization.

When should we reorthogonalize again? The size of the component of v in R(U_k) is ||Q^T v||_2. Now the vector x of reorthogonalization coefficients is x = U_k^T v = R^T Q^T v, and since R is near the identity matrix, we have ||x||_2 ≈ ||Q^T v||_2. Thus we can use ||x||_2 to decide whether to reorthogonalize again. Specifically, if ||Q^T v||_2 is near √ε_M or larger, the first reorthogonalization step may have failed to eliminate the component, and v may require another orthogonalization. This is not a problem for u_k, since in that case by semiorthogonality γ is well below √ε_M.

It is worth noting that although we perform reorthogonalizations in the Arnoldi method, our reasons for doing so are different. In the Arnoldi method, where the vectors are orthogonal to working accuracy, the danger is that cancellation will magnify otherwise negligible rounding errors. In the semiorthogonal Lanczos method the basis vectors are not orthogonal, and, as the above analysis shows, they have a limited capacity to reduce unwanted components, even when there is no rounding error involved.

Reorthogonalization also solves the problem of how to restart the recursion for the γ_{ij}. Since reorthogonalization leaves us with a u_k and u_{k+1} that are orthogonal to the columns of U_{k-1} to working accuracy, after the process the elements γ_{k,j} and γ_{k+1,j} should be some multiple of the rounding unit, say √n ε_M.

Algorithms 3.3 and 3.4 pull things together. Owing to limitations of the page size, we have been forced to separate the outer Lanczos loop from the reorthogonalization; but they are essentially one algorithm. Here are some comments.

• The reorthogonalization, which is done by the modified Gram-Schmidt algorithm, assumes that the previous Lanczos vectors are available. The details will depend on the size of the problem and the choice of backing store.

• Note that the algorithm does not renormalize u_k. The reason is that the reorthogonalization does not significantly change its norm. To see this, write u_k in the notation of (3.12).
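The repeat test just described can be sketched as follows. Here tol stands for the √ε_M-based threshold, and the modified Gram-Schmidt pass follows the discussion above; this is a sketch, not the book's code.

    import numpy as np

    def reorthogonalize(U, v, tol):
        """Orthogonalize v against the columns of U by modified Gram-Schmidt,
        repeating once if the size of the removed component suggests the
        first pass may have failed."""
        for _ in range(2):                      # at most one repeat
            x = np.empty(U.shape[1])
            for j in range(U.shape[1]):         # one modified Gram-Schmidt pass
                x[j] = U[:, j] @ v
                v = v - x[j] * U[:, j]
            # ||x||_2 approximates the size of the component of v in range(U);
            # if it was large, the pass may have left a non-negligible remnant
            if np.linalg.norm(x) <= tol:
                break
        return v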

This algorithm performs Lanczos steps with periodic reorthogonalization. It is assumed that the last two rows of the matrix C_k estimating U_{k+1}^T U_{k+1} are available. The reorthogonalization itself is done in Algorithm 3.4. The algorithm also updates the Rayleigh quotient H_k = T_k + S_k [see (3.7)].

Algorithm 3.3: Periodic reorthogonalization I: Outer loop

This algorithm replaces statements 12-14 in Algorithm 3.3.

Algorithm 3.4: Periodic reorthogonalization II: Orthogonalization step

In the notation of (3.12) we have u_k = γq + τq_⊥. Since ||u_k||_2 = 1, we have γ² + τ² = 1, and by semiorthogonality γ² ≤ ε_M. Hence τ = 1 to working accuracy. Since the reorthogonalized vector is τq_⊥, it has norm one to working accuracy.

• We have chosen to compute the Rayleigh quotient H_k instead of S_k. Since by definition S_k = H_k - T_k, we can obtain S_k by subtracting the α's and β's from the diagonal and the super- and subdiagonals of H_k. Since reorthogonalization is an exceptional event, most columns of H_k will be zero above the first superdiagonal. Instead of allocating storage to retain the zero elements, one can store only the nonzero elements in a packed data structure. It is not at all clear that this is worthwhile, since the storage is no more than that required for a set of primitive Ritz vectors, which must be computed to test for convergence.

Updating H_k

We must now derive the formulas for updating H_k after a reorthogonalization. Before the reorthogonalization we have the relation (3.16), which holds to working accuracy. The results of the reorthogonalization replace u_k with the vector given in (3.14) and v with the vector given in (3.15). Solving (3.14) and (3.15) for u_k and v and substituting the results into (3.16), we get (3.17) and, by a similar computation, (3.18). If we substitute (3.17) and (3.18) into (3.16) and define H_k by (3.19),

then it is straightforward to verify that the result is a Krylov decomposition involving the new vectors û_k and v̂. The corrections in (3.19) consist of coefficients from the reorthogonalization step, which by semiorthogonality are bounded by O(√ε_M). Hence the difference between the adjusted Rayleigh quotient H_k and the Lanczos matrix T_k is also O(√ε_M). These adjustments have been implemented in statements 21-23 of Algorithm 3.4.

Computing residual norms

A knowledge of H_k makes the computation of residual norms easy. Specifically, suppose that (μ, U_k w) is an approximate eigenpair in the space spanned by U_k. We make no assumptions about how this pair was obtained; in most cases (μ, w) will be an eigenpair of T_k, but the following analysis does not require it. From the adjusted decomposition we have

    A U_k w - μ U_k w = U_k r + β_k (e_k^T w) v + f,

where

    r = (H_k - μI) w

and f accounts for terms of order ε_M. It follows that the residual norm ρ = ||A U_k w - μ U_k w||_2 satisfies

    ρ ≈ ( ||r||_2² + β_k² (e_k^T w)² )^{1/2},    (3.20)

where by semiorthogonality the error in this approximation is O(√ε_M). Thus (3.20) estimates ρ to at least half the number of significant figures carried in the calculation. Hence a knowledge of the adjusted Rayleigh quotient enables us to compute residual norms without having to multiply by A.

If we do not require very accurate eigenvectors, it does not matter whether we use T_k or H_k to estimate the residual norm until the residual falls below √ε_M; consequently, we can avoid accumulating H_k. But for smaller residuals, we should use H_k.
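As a minimal illustration, the estimates |β_k e_k^T w_i|, which is (3.20) with r = 0, can be computed for all Ritz pairs of T_k directly from the eigensystem of the tridiagonal matrix; the corrections from H_k, which matter once the residual falls below √ε_M, are omitted in this sketch.

    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    def ritz_residual_estimates(alpha, beta):
        """Classical residual estimates for the Ritz pairs of T_k.

        alpha : diagonal of T_k (length k)
        beta  : subdiagonal of T_k plus the current beta_k at the end (length k)
        """
        mu, W = eigh_tridiagonal(alpha, beta[:-1])   # eigenpairs of T_k
        return mu, np.abs(beta[-1] * W[-1, :])       # |beta_k e_k^T w_i|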

An example

We conclude this subsection with an example that illustrates the behavior of periodic reorthogonalization in the Lanczos algorithm.

Example 3.3. In this example A is a diagonal matrix of order n = 500 whose first eigenvalue is λ_1 = 1 and whose other eigenvalues decrease according to a fixed recurrence. The periodically reorthogonalized Lanczos algorithm was run on this matrix with a random starting vector and γ_max = √(10^{-16}/n). Orthogonalizations took place for k = 11, 17, 22, 28, 33, 38. Various statistics are summarized in the following table.

    k     c1        c2        c3        c4        c5
    10    3.9e-15   2.8e-16   1.9e-02   1.9e-02   1.9e-02
    20    3.9e-15   3.3e-15   2.5e-07   1.5e-07   1.5e-07
    30    3.9e-15   5.9e-15   3.4e-12   1.4e-12   1.4e-12
    40    3.9e-15   7.9e-15   2.9e-19   1.4e-12   1.4e-12

The columns of the table contain the following quantities, in which (μ_5^{(k)}, w_5^{(k)}) is the fifth eigenpair of the Lanczos matrix T_k and we are concerned with the approximate eigenpairs (μ_5^{(k)}, U_k w_5^{(k)}).

    c1: ||T_k - Q^T A Q||_2, where Q is the Q-factor of U_k.
    c2: The global residual norm.
    c3: The classical estimate |β_k e_k^T w_5^{(k)}| for the residual norm.
    c4: The true residual norm.
    c5: The residual norm estimated by (3.20).

The numbers in the first column illustrate the results of Theorem 3.2: the tridiagonal Lanczos matrix T_k is to working accuracy the Rayleigh quotient formed from the Q-factor of U_k. This means that in spite of the contamination of T_k by the reorthogonalization process, its eigenvalues converge to eigenvalues of A. The second column shows that the corrections in Algorithm 3.4 produce a Krylov decomposition whose residual is of the order of the rounding unit. The third and fourth columns show that the classical estimate for the residual norm deviates from the true residual norm: the latter stagnates at about 10^{-12}, whereas the former continues to decrease. The fifth column shows that the approximate residual norm (3.20) is a reliable estimate of the residual norm, as noted above. But the two values diverge only after the true residual norm falls below √ε_M.

Perhaps the most important point of this example is that Ritz vectors computed from the matrix T_k can fail to converge. When this happens, there is no cure except to compute Ritz vectors from the adjusted Rayleigh quotient H_k. Fortunately, the simple form of H_k makes it easy to apply a step of the inverse power method to the eigenvectors of T_k to obtain the primitive Ritz vectors of H_k.

3.4. NOTES AND REFERENCES

The Lanczos algorithm

As was noted in §3.1, Lanczos knew that his method was in some sense iterative, since his vectors could be combined to produce good approximations to eigenvectors before the algorithm ran to completion. But for some time the method was chiefly regarded as an exact method for tridiagonalizing a symmetric matrix; for example, it is so viewed in an early paper of Wilkinson [296, 1959]. The acceptance of the Lanczos method as an iterative method is due to Kaniel [145, 1966] and especially Paige, who in his thesis [196, 1971] and other papers [195, 197, 198, 199] laid the foundation for much of the work that was to follow. Paige was the first to come to grips with the need for reorthogonalization, and a good deal of research has been devoted to finding a way around this problem.

Full reorthogonalization

The need for full reorthogonalization has been generally cited as a defect of the Lanczos method. Only when Sorensen showed how to restart the Arnoldi process (see §2) did the fully reorthogonalized method become a reasonable option. Krylov-spectral restarting follows from our general treatment of Krylov decompositions, which unifies these methods. Wu and Simon [306] introduce it impromptu under the name of thick restarting. For an error analysis of full reorthogonalization, see [6] and [7].

Filter polynomials

For Arnoldi's method exact shifts are generally used in restarting because the alternatives are cumbersome. In the Lanczos method, where the eigenvalues in question lie on the real line, inexact shifts are a real possibility. For example, one could use the zeros of a Chebyshev polynomial to suppress the spectrum on a given interval. Another possibility is to use Leja points or their approximations. For more on this topic, see [195].

Semiorthogonal methods

The semiorthogonal variants of Lanczos's algorithm all appeared between 1979 and 1984. Selective reorthogonalization is due to Parlett and Scott [209], periodic reorthogonalization to Grcar [97], and partial reorthogonalization to Simon [238]. Special mention must be made of an elegant paper by Simon [237].

Theorem 3.2 is due to Simon [237], who shows that it applies to any basis generated by a semiorthogonal method.

Periodic reorthogonalization is the most straightforward of the methods. The recurrence (3.11) is implicit in Paige's thesis [196]; its explicit formulation and computational use are due to Simon [237, 238], who also gives empirical grounds for believing that the quantities calculated in Algorithm 3.2 actually track the loss of orthogonality. For another recurrence see [143]. The formulas (3.19) are new. The fact that semiorthogonality can cause the convergence of Ritz vectors to stagnate does not appear to have been noticed before (although in the context of linear systems the possible untoward effects are hinted at in [237] and [205]). A package LANSO implementing periodic reorthogonalization is available from Netlib.

Partial reorthogonalization is based on the observation that one only needs to reorthogonalize v against those vectors u_i for which the bounds computed by Algorithm 3.2 are above γ_max. Although the idea is simple, the implementation is complicated, and the reader is referred to the treatment of Simon [238]. A package LANZ implementing partial reorthogonalization is available from Netlib.

Selective reorthogonalization is based on a celebrated result of Paige [196]. Let z_j (j = 1, ..., k) be the Ritz vectors computed at step k of the Lanczos process, and let w_j be the corresponding primitive Ritz vectors. Then there are constants γ_{kj} of order of the rounding unit such that

    u_{k+1}^T z_j = γ_{kj} / (β_k e_k^T w_j).

Now by (3.20), |β_k e_k^T w_j| is the residual norm for the Ritz vector [remember that for Ritz vectors the term r in (3.20) is zero]. It follows that u_{k+1} can have large components in R(U_k) along only Ritz vectors that have almost converged. Broadly speaking, selective reorthogonalization consists of saving converged and nearly converged Ritz vectors and orthogonalizing the Lanczos vectors against them. But as is usual with Lanczos methods, the devil is in the details, and the reader should look for his presence in the above references.

No reorthogonalization

It is possible to use the Lanczos algorithm to good effect with no reorthogonalization. What happens is that after a loss of orthogonality previously converged Ritz values may reappear and spurious Ritz values may be generated. Cullum and Willoughby [46, 48] have shown how to detect the spurious Ritz values. A package LANCZOS is available from Netlib.

The singular value decomposition

The singular values and vectors (σ_i, u_i, v_i) of a matrix X can be calculated by applying the Lanczos algorithm to the cross-product matrix X^T X. This works well for the dominant singular values, but as the analysis in §3.2, Chapter 3, shows, it may fail to

give sufficiently accurate results for the smaller singular values and their vectors. An alternative is to apply the method to the matrix

    [ 0    X  ]
    [ X^T  0  ],

whose eigenpairs are simply related to the singular pairs of X. However, since the small singular values correspond to interior eigenvalues of this matrix, convergence to the smaller eigenvalues can be expected to be slow. For more see [21, 47, 89]. Software is available in SVDPACK at Netlib.

The bi-Lanczos algorithm

The original Lanczos algorithm [157] was actually a method for general non-Hermitian matrices. The idea was to generate two Krylov sequences, one in A and the other in A^T. If we denote bases for these sequences by U_k and V_k and require that they be biorthogonal in the sense that V_k^T U_k = I, then the Rayleigh quotient T_k = V_k^T A U_k is tridiagonal, and it follows that the columns of U_k and V_k satisfy three-term recurrence relations. When A is symmetric and u_1 = v_1, this reduces to the usual Lanczos method. The method shares with the Lanczos method the problem of loss of orthogonality (or, strictly speaking, biorthogonality). But it can fail in other, more serious ways. The cure is a look-ahead Lanczos recurrence, proposed by Parlett, Taylor, and Liu [210], that gets around the failure by allowing the Rayleigh quotient to deviate locally from tridiagonality. Gutknecht [103, 104] gives a full treatment of the theory of the bi-Lanczos method. For more see [79, 105, 275, 280].

Block Lanczos algorithms

In §3, Chapter 4, we pointed out that a Krylov sequence cannot see more than one copy of a multiple eigenvalue and indicated that one way around the problem is to work with block Krylov sequences. It turns out that if one orthogonalizes a block Krylov sequence one gets a block tridiagonal Rayleigh quotient. The resulting generalization of the Lanczos algorithm is called a block Lanczos algorithm. Such algorithms have been proposed by Cullum and Donath [45], Golub and Underwood [92], and Ruhe [219]. For the use of shift-and-invert enhancement in a block Lanczos algorithm see the paper of Grimes, Lewis, and Simon [98], which is particularly strong on implementation issues.

4. THE GENERALIZED EIGENVALUE PROBLEM

In this section we will consider the use of Krylov methods to solve the generalized eigenvalue problem Ax = λBx. The basic idea is to transform the generalized ei-

genvalue problem into an ordinary eigenvalue problem and then apply some variant of the Arnoldi or Lanczos method. An important issue here is the nature of the transformation. As it turns out, if we define the transformed matrix C properly, we can get the benefits of shift-and-invert enhancement for the cost of a single factorization. A second issue is how to determine convergence, since the usual residual criterion concerns the transformed problem, not the original. In §4.1 we will discuss these issues. In many applications the matrix B is positive definite, and in §4.2 we will show how we can replace orthogonality in the Arnoldi and Lanczos algorithms with B-orthogonality to obtain new algorithms that are specially tailored to this form of the generalized eigenproblem.

4.1. TRANSFORMATION AND CONVERGENCE

Transformation to an ordinary eigenproblem

The first step in applying Krylov sequence methods to a generalized eigenproblem Ax = λBx is to find an equivalent eigenproblem Cx = μx with the same right eigenvectors. For example, if B is nonsingular we can take C = B^{-1}A. Unfortunately, all transformations require that a linear system be solved to compute the products Cu required by Krylov methods. To solve the linear system, we need to factor a matrix, B in the above example. (The alternative is to use an iterative method, but that is another story.) When the matrix in question is easy to factor, this presents no problems. But when factorizations are expensive, we need to make each one count.

To see how this is done, let κ be a shift, and write the problem in the form

    (A - κB)x = (λ - κ)Bx.

Then

    (A - κB)^{-1}Bx = (λ - κ)^{-1}x.

Hence if we set

    C = (A - κB)^{-1}B    and    μ = (λ - κ)^{-1},

then Cx = μx. Because the transformation from λ to μ is the same as for the ordinary shift-and-invert method, we will call C the generalized shift-and-invert transform. The products v = Cu can then be computed as follows.
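A minimal sketch of this factor-once, solve-many scheme, assuming sparse A and B and SciPy's sparse LU; the function names are illustrative.

    import scipy.sparse.linalg as spla

    def make_shift_invert_op(A, B, kappa):
        """Factor A - kappa*B once; apply C = (A - kappa*B)^{-1} B many times."""
        lu = spla.splu((A - kappa * B).tocsc())   # the single factorization
        def apply_C(u):
            return lu.solve(B @ u)                # w = B*u, then solve (A - kappa*B)v = w
        return apply_C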

The eigenvalues of the pencil (A, B) that are near κ become large, so that they will tend to be found first by Krylov methods. If the matrix A - κB can be factored, the product v = Cu can be efficiently computed by first forming w = Bu and then solving the system (A - κB)v = w. Thus the generalized shift-and-invert transformation requires only a single factorization.

Residuals and backward error

We turn now to the problem of convergence criteria. We will proceed in two stages. First we will establish a backward perturbation theorem for the generalized eigenproblem Ax = λBx in terms of the residual r = Ax - λBx. We will then show how to use the residual s = Cx - μx to bound the norm of r. The backward error theorem is an analogue of Theorem 1.3, Chapter 2.

Theorem 4.1. Let (λ, x) be an approximate eigenpair of the pencil A - λB and let

    r = Ax - λBx.

Let ||·|| denote the Frobenius or 2-norm, and let

    E = -(α r x^H) / ((α + |λ|β) x^H x)    and    F = (λ̄/|λ|)(β r x^H) / ((α + |λ|β) x^H x),

where α = ||A|| and β = ||B||. Then

    (A + E)x = λ(B + F)x

and

    ||E|| = α ||r|| / ((α + |λ|β)||x||₂)    and    ||F|| = β ||r|| / ((α + |λ|β)||x||₂).

Proof. Direct verification.

• The bounds are optimal in the sense that there are no matrices E and F satisfying (A + E)x = λ(B + F)x for which

    ||E|| ≤ α ||r|| / ((α + |λ|β)||x||₂)    and    ||F|| ≤ β ||r|| / ((α + |λ|β)||x||₂),

where at least one of the inequalities is strict.

• Since the residual is in general computable, we can use this theorem to determine convergence. Namely, stop when the backward error is sufficiently small, say some multiple of the rounding unit. For more on the use of backward error to determine convergence, see the comments following Theorem 1.3, Chapter 2.
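The backward error of Theorem 4.1 is directly computable when the norms of A and B are available, as in this sketch (Frobenius norms are used for convenience):

    import numpy as np

    def pencil_backward_error(A, B, lam, x):
        """Normwise backward error of an approximate eigenpair (lam, x) of
        the pencil A - lam*B:
            eta = ||Ax - lam*Bx|| / ((||A|| + |lam|*||B||) * ||x||)."""
        r = A @ x - lam * (B @ x)
        scale = np.linalg.norm(A, 'fro') + abs(lam) * np.linalg.norm(B, 'fro')
        return np.linalg.norm(r) / (scale * np.linalg.norm(x))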

fix.K. if p is less than some tolerance. compute a Cholesky factorization R^R of B and apply the Lanczos algorithm to the symmetric matrix R(A .370 Bounding the residual CHAPTER 5. We could. We will then introduce the Arnoldi and Lanczos algorithms with respect to the B inner product. and simplifying. KRYLOV SEQUENCE METHODS When we use. of course. the residuals we calculate are of the form s = Cx .KB)~1R^. we will have \\A\\ + \K\\\B\\ = \\A\\ + |A|||5||. POSITIVE DEFINITE B A difficulty with applying the generalized shift-and-invert transformation to the S/PD pencil (A.XBx. where the normwise relative error in A and B is bounded bv Consequently. However. we will indicate some of the things that can happen when B is singular or ill conditioned.1 that (A. What we really want is a bound on the norm of the residual r = Ax . This criterion has the drawback that we need to know the norms of A and B. 4. we can declare the pair (A. . which is similar to C. = (X — «)-1.B. We will begin with a brief treatment of the B innerproduct.B)~1B is no longer symmetric and it appears we are forced to use the Arnoldi method rather than the more efficient Lanczos method to solve it. x) is anexacteigenpairof pair(A. Unfortunately. we get Hence It follows from Theorem 4. we can restore symmetry of a sort by working with a different inner product generated by B. which may not be available. the Arnold! method with a generalized shift-and-in vert transformation C = (A . this procedure requires two factorizations—one of B and the other of A — K. Finally.B)~1B.2. B) is that the matrix C = (A — K. Fortunately. To get such a bound multiply s by A — KB to get Dividing by /z. ar) converged. where fj. Hence. say. Consequently. in practice we will be looking for eigenvalues near the shift*. we may take as an approximate bound on the backward error. B).

however.2. if A is symmetric. VolII: Eigensystems . then the generalized shift-and-invert transform (A K. which is how functional analysts expect their inner products to behave.B)~1B is B-symmetric.#)B. if we define then(/iz. 371 Definition 4. This makes no difference when we are working over the field of real numbers. Here are three comments on this definition. In the complex case. y)s — fi(x. The latter proof is essentially the same as for the 2-norm and proceeds via the R-f'aiir'l'iv inrniiialitv • There is a curious inversion of the order of x and y as one passes from the left.B = n(xjy)Bynd(xjfj.y)B — ^(#. 4. THE GENERALIZED EIGENVALUE PROBLEM The B inner product We begin with some definitions.2/)..to the right-hand side of (4.2). y]B-> which is not standard.e. • The reader should prove that the B inner product is actually an innerproduct—i. Then the B INNER PRODUCT is defined hv The 5-NORM is A matrix C is ^-SYMMETRIC if for all x and y. The alternate definition—(x. y)s = x^By — would give (//or. the order matters. that it is a symmetric. • It is easily verified that: The matrix C is B symmetric if and only ifBC is symmetric. bilinear function—and that the -B-norm is actually a norm. definite. Let B be a positive definite matrix. In particular.SEC. Specifically.

Orthogonality

It is natural to say that two vectors u and v are B-ORTHOGONAL if (u, v)_B = 0. All of the usual properties and algorithms associated with orthogonality carry over to B-orthogonality. We will say that U is B-ORTHONORMAL if U^H B U = I; if U is square, it is B-ORTHOGONAL. The following easily proved result will be important later:

    If U is B-orthonormal and Q is orthogonal, then UQ is B-orthonormal.

Thus postmultiplication by an orthogonal matrix (but not a B-orthogonal matrix) preserves B-orthonormality.

B-Orthogonalization

In what follows we will need to B-orthogonalize a vector v against a set u_1, ..., u_k of B-orthonormal vectors. This can be done by a variant of the Gram-Schmidt algorithm in which the B inner product replaces the ordinary inner product. However, notice that the notation (x, y)_B conceals a multiplication of a vector by B. Unless we are careful, we can perform unnecessary multiplications. To see this, consider the following B variant of the modified Gram-Schmidt algorithm:

    1. for j = 1 to k
    2.    h_j = (u_j, v)_B
    3.    v = v - h_j*u_j
    4. end for j

Because v changes at each step, each inner product requires a fresh multiplication by B:

    1. for j = 1 to k
    2.    w_j = B*u_j
    3.    h_j = w_j^H*v
    4.    v = v - h_j*u_j
    5. end for j

This program requires k multiplications by B. On the other hand, consider the B variant of the classical Gram-Schmidt algorithm, in which all the coefficients are computed from the original v:

    1. for j = 1 to k
    2.    h_j = (u_j, v)_B
    3. end for j
    4. for j = 1 to k
    5.    v = v - h_j*u_j
    6. end for j
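Here is a sketch of the two variants for real symmetric B. The classical variant needs only one multiplication by B (all coefficients come from the single product Bv), while the modified variant pays one multiplication per column.

    import numpy as np

    def b_cgs(B, U, v):
        """Classical Gram-Schmidt in the B inner product: one product Bv."""
        h = U.T @ (B @ v)      # h_j = (u_j, v)_B, all from one multiplication
        return v - U @ h, h

    def b_mgs(B, U, v):
        """Modified Gram-Schmidt in the B inner product: k products by B."""
        h = np.empty(U.shape[1])
        for j in range(U.shape[1]):
            h[j] = (B @ U[:, j]) @ v   # (u_j, v)_B with the current v
            v = v - h[j] * U[:, j]
        return v, h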

When we recast the classical algorithm in matrix terms, it becomes

    h = U^H(Bv),    v = v - Uh.    (4.5)

This version requires only one multiplication by B. Now it is well known that the modified Gram-Schmidt algorithm is numerically superior to the classical Gram-Schmidt algorithm. But in large problems multiplications by B may dominate the computation. Hence it is desirable to use the classical Gram-Schmidt algorithm whenever possible.

The B-Arnoldi method

The B-Arnoldi method generates an Arnoldi decomposition of the form

    C U_k = U_{k+1} H̄_k,    (4.6)

where C is the generalized shift-and-invert transform and U_{k+1} is B-orthonormal. The procedure amounts to substituting B-orthogonalization for ordinary orthogonalization. Here are some details.

• Orthogonalization. Because reorthogonalization is always used in the Arnoldi process, we can use the classical Gram-Schmidt algorithm, as suggested in (4.5), to save multiplications by B.

• Ritz pairs and convergence. Ritz pairs may be computed from the Rayleigh quotient H_k = U_k^H B C U_k. Specifically, if Cz = νz, where z = U_k w, it follows on premultiplication of (4.6) by U_k^H B that H_k w = νw, so that we can find the primitive Ritz vector w as an eigenvector of H_k. On the other hand, if H_k w = νw, then with z = U_k w we have from (4.6)

    Cz - νz = h_{k+1,k} ω_k u_{k+1},

where ω_k is the last component of w.

Hence the residual 2-norm of the Ritz pair (ν, U_k w) is

    ||Cz - νz||₂ = |h_{k+1,k}| |ω_k| ||u_{k+1}||₂.    (4.7)

Thus we can determine convergence of Ritz pairs directly from the decomposition and the primitive Ritz vector without having to compute the residual vector. Note that (4.7) contains ||u_{k+1}||₂. In the ordinary Arnoldi method this quantity is one. In the B-Arnoldi method, however, it is ||u_{k+1}||_B that is one, and ||u_{k+1}||₂ will generally be different from one.

• Restarting. Let Q be orthogonal. By the result noted above, we can transform the B-Arnoldi decomposition CU = UH + βue^T to

    C(UQ) = (UQ)(Q^T H Q) + βu(e^T Q)

while preserving the B-orthonormality of UQ. (In our terminology the transformed decomposition is a B-orthonormal Krylov decomposition.) Thus we are free to perform orthogonal similarities on the Rayleigh quotient H_k. This means that both implicit and Krylov-Schur restarting can be applied to the B-Arnoldi method without change.

• Deflation. Deflation is somewhat more complicated than restarting. To see what is involved, suppose we have transformed our decomposition so that the candidate pair comes first, i.e., so that

    Cu_1 = νu_1 + U_2 g_21 + h_{k+1,1} u_{k+1}.

Then we can deflate the pair (ν, u_1) by setting g_21 and h_{k+1,1} to zero. The error introduced into the decomposition by this deflation is

    ||U_2 g_21 + h_{k+1,1} u_{k+1}||₂ ≤ ||(U_2  u_{k+1})||₂ ||(g_21^T  h_{k+1,1})^T||₂.

The problem is that we must estimate ||(U_2 u_{k+1})||₂, the norm of a rather large matrix. However, in the restarted Arnoldi method this matrix will be in main memory, and we can use the bound

    ||(U_2  u_{k+1})||₂ ≤ ||(U_2  u_{k+1})||_F

to compute a cheap bound. It should be noted that when we are deflating Ritz values the vector g_21 is zero, so that all we need to calculate is ||u_{k+1}||₂.

To summarize, the B-Arnoldi method for the generalized eigenproblem with shift-and-invert enhancement is essentially a minor modification of the standard Arnoldi method, the main difference being the need to compute matrix-vector products of the form Bv.
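A sketch of one B-Arnoldi step along these lines, with classical Gram-Schmidt in the B inner product plus one fixed reorthogonalization pass (the names and the always-reorthogonalize policy are illustrative assumptions):

    import numpy as np

    def b_arnoldi_step(apply_C, B, U, H, k):
        """Extend a B-Arnoldi decomposition by one step.

        U[:, :k] is B-orthonormal; apply_C applies C = (A - kappa*B)^{-1} B.
        H is the (at least) (k+1) x k Rayleigh quotient being built."""
        v = apply_C(U[:, k - 1])
        h = U[:, :k].T @ (B @ v)        # classical Gram-Schmidt in the B inner product
        v -= U[:, :k] @ h
        g = U[:, :k].T @ (B @ v)        # one reorthogonalization pass
        v -= U[:, :k] @ g
        h += g
        beta = np.sqrt(v @ (B @ v))     # B-norm of the remainder
        H[:k, k - 1] = h
        H[k, k - 1] = beta
        U[:, k] = v / beta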

The restarted B-Lanczos method

We have already noted that if A is symmetric then C = (A - κB)^{-1}B is B-symmetric. It follows that the Rayleigh quotient H = U^H B C U is symmetric, and since H is Hessenberg, it must be tridiagonal. Thus the B-Arnoldi method becomes a B-Lanczos method. Just as the restarted B-Arnoldi method is a slightly modified version of the usual restarted Arnoldi method, so the restarted B-Lanczos method is a modification of the ordinary restarted Lanczos method (see §3.1).

The B-Lanczos method with periodic reorthogonalization

In §3.3 we described the Lanczos method with periodic reorthogonalization. Once again the B version of the method can be obtained by obvious modifications of the original method. The estimates of the deterioration of orthogonality in Algorithm 3.2 are unchanged, although they now estimate deviation from B-orthogonality. To avoid forming matrix-vector products involving B, it makes sense to store the columns of BU_k. When it is necessary to reorthogonalize, the columns of U_k must be brought in from backing store in blocks that fit in main memory. Thus we cannot implement the classical Gram-Schmidt algorithm in the B-inner product as suggested in (4.5); if the block size is small, the best we can hope for is a block modified Gram-Schmidt algorithm. On the other hand, if the block size is large or multiplications by B are cheap, it may be more efficient to process each block by the classical Gram-Schmidt algorithm, as in (4.5).

Semidefinite B

Since the generalized shift-and-invert transform C = (A - κB)^{-1}B does not involve the inverse of B, the algorithms described above can be formally applied when B is positive semidefinite. In this case the B-inner product is said to be a SEMI-INNER PRODUCT, and the B-norm a seminorm. The B semi-inner product is blind to vectors lying in the null space of B, that is, (x + z, y)_B = (x, y)_B whenever z ∈ N(B), and this fact presents complications for our algorithms.

To see what is going on, we need a little geometry. We first observe that the null spaces of B and C are the same. For note that if z is a null vector of B then z is a null vector of C. On the other hand, if x is a null vector of C, then (A - κB)^{-1}Bx = 0; multiplying by A - κB, we get Bx = 0, so that x ∈ N(B).

We will now make the simplifying assumption that

    R(C) ⊕ N(B) = C^n,    (4.8)

so that any vector z can be uniquely written in the form z = x + y, where x ∈ R(C) and y ∈ N(B). (This decomposition of z into x + y is not an orthogonal decomposition.) The assumption is always true when C is nondefective.

. . 2. Even if a primitive Ritz vector w is calculated accurately. Then it follows that and The contamination of the Rayleigh quotient is not a problem. lie in 7£((7). something that can happen only if C has a nontrivial Jordan form. the null-space error may contaminate the Ritz vector UkW. other than the null vectors of B. To analyze these possibilities. Mathematically. Since Uk+i is a linear combination of ^i.Thent^ £ 7£(C). be. from (4. Now the eigenvectors of C.11) that Hk is essentially the Rayleigh quotient with respect to a set of vectors uncontaminated by contributions from M(B). say. a random vector and let HI = Cv/\\Cv\\B. . The null-space error in Uk may somehow effect the Rayleigh quotient Hk. this can be done as follows. Now suppose that wi. KRYLOV SEQUENCE METHODS Cx is nonzero and lies in J\f(B). . . There are two potential dangers here. Since the B -inner product cannot see vectors in N(B\ it is possible for the Lanczos or Arnoldi process to magnify these small errors. at each stage rounding error will introduce small errors lying in N(B). .. Specifically. 1. then C2x — 0. our Krylov subspaces lie in 7£(C). Uk and Cuk. In practice. and such magnifications are empirically observed to happen.10) we have It then follows from (4. let us suppose that we have the J?-Arnoldi decomposition where Fk is an error matrix of order of the rounding unit that accounts for rounding errors made during the Arnoldi process. Lett.. Write where the columns of Uk are in T^(C) and the columns of Vk are in M(B). if we are going to use a Krylov method to calculate eigenvectors of B we must insure that the Krylov vectors lie in 7£(C). Thus if we start with a vector in 7£(C).376 CHAPTER 5. Consequently. so that the primitive Ritz vectors are calculated inaccurately. it must also lie in 7£(<7). « * lieinT^(C)..

It turns out that the errors V_k do not contaminate the Ritz vectors either, at least asymptotically. To see this, let (ν, w) be a primitive Ritz pair. From (4.9) it follows that

    ν U_k w = C U_k w - h_{k+1,k} ω_k u_{k+1} - F_k w,    (4.12)

where, as usual, ω_k is the last component of w. Now the first term on the right-hand side of this relation has no components in N(B). Hence the null-space error in U_k w is small in proportion as the quantity |h_{k+1,k} ω_k| is small. But |h_{k+1,k} ω_k| ||u_{k+1}||₂ is the residual norm of the Ritz pair (ν, U_k w). Hence, as the Ritz pair converges, the null-space error will go to zero.

The following example illustrates this phenomenon. In it, P denotes the projection onto N(B) corresponding to the decomposition R^n = R(C) ⊕ N(B) [see (4.8) and the following discussion].

Example 4.3. Let D_k = diag(1, .95, ..., .95^{k-1}), and let the pencil (A, B) be built from such diagonal blocks, with B positive semidefinite of rank m. In this example we apply the B-Arnoldi method to C with n = 500 and m = 400. The starting vector u_1 is of the form Cr/||Cr||_B, where r is a random vector. Each multiplication by C was followed by the addition of a random perturbation on the order of the rounding unit. The following table summarizes the results.    (4.13)

. but as the residual norm decreases so does the component.14) by g[l:Ar. Then By definition Uk-i — UkQ[l'k.1 :k—IJ is an unreduced Hessenberg matrix. Hence if we multiply equation (4. We will return to the remaining columns later. Initially the component in the null space increases. LetAUk-i = UkHk be the decomposition resulting from one step of zero-shift implicit restarting. The third column contains the norm of the component in N(B) of the Ritz vector corresponding to the fifth eigenvalue ofC.4. where Hk is unreduced. This puts a limit on how far we can extend an Arnoldi or Lanczos sequence. we can add to it any matrix V in N(B) such thatVfh. Instead. which is seen to grow steadily as k increases. Proof.15) is an unreduced Arnoldi decomposition. If Hk — QR is the QR factorization of Hk. Thus (4.15) is the decomposition that results from one step of zero-shifted implicit restarting. The fourth column contains the product \ftk^k\> which along with \\Uk+i\\2 estimates the residual norm. Define Uk — Uk+iQ. the first column of Uk is a multiple ofCui.378 CHAPTER 5. KRYLOV SEQUENCE METHODS The second column contains the norm of the component ofUk+i in N(B). So far all is according to theory. Other examples of rapid growth of null-space error may be found in the literature.l:A?-l]. it also holds for the #-Arnoldi process. = 0. (4. • Although we have proved this theorem for the ordinary Arnoldi process. even when B is semidefinite. Because Hk is unreduced Hessenberg. from (4. note that the null-space error (column 2) ultimately increases far above one. l:k—l]. Moreover. The chief difference is that Uk+i is no longer unique. It is based on the following characterization of the effects of a step of implicit restarting with zero shift. Hence by the uniqueness of the unreduced Arnoldi decomposition. Theorem 4.weget Since R is nonsingular and Q [1:fc. However. Qk is unreduced Hessenberg and Rk is nonsingular. since in finite precision arithmetic the null space error will ultimately overwhelm any useful information in the Krylov vectors. The following suggests a way of purging the unwanted components from the entire sequence.14) the first column of Uk satisfies Since m ^ 0. Let AUk = Uk+\ Hk be an Amoldi decomposition. then (up to column scaling) Uk = UkQ. l:k—l] is also unreduced Hessenberg. H k-i = RQ[l:k.

The application of this theorem is simple. From (4.9) we have

    C U_k = U_{k+1} H̄_k + F_k.

Since the left-hand side of this equation is in R(C), so is the right-hand side, up to the error F_k; i.e., one step of zero-shifted implicit restarting purges the B-Arnoldi vectors of null-space error. For this to happen it is essential that R^{-1} not be large, since in the presence of rounding error it will magnify the residual error F_k in the decomposition.

The last two columns in (4.13) illustrate the effectiveness of this method of purging null-space error. The fifth column is the null-space error in U_{k+1}Q. For the first 50 iterations it remains satisfactorily small; thereafter it grows. The reason is that the null-space error in U_k (column 2) has grown so large that it is destroying information about the component of U_k in R(C).

The cure is to perform periodic purging in the same style as periodic reorthogonalization. Unfortunately, at present we lack a reliable recursion for the null-space error, which is needed to know when to purge. It helps that we can tell when the null-space errors are out of hand by the sudden growth of the Arnoldi vectors. Since in practice the Arnoldi method must be used with restarts, including a zero shift among the restart shifts may suffice to keep the errors under control, although some experimentation may be required to determine the longest safe restarting interval. With the Lanczos algorithms the problem of purification is analogous to that of reorthogonalization: it requires access to the entire set of Lanczos vectors. The B-Arnoldi and Lanczos methods with semidefinite B are imperfectly understood at this time. The chief problem is that the null-space error in the Arnoldi vectors can increase until it overwhelms the column-space information.

4.3. NOTES AND REFERENCES

The generalized shift-and-invert transformation and the Lanczos algorithm

The idea of using a shift-and-invert strategy for large generalized eigenvalue problems arose in connection with the Lanczos process. The transform (A - κB)^{-1}B appears implicitly in a book by Parlett [206, p. 319], who rejects it. He also gives a version of the Lanczos algorithm for B^{-1}A. Ericsson and Ruhe [72] proposed applying the Lanczos algorithm to the matrix R(A - κB)^{-1}R^T, where B = R^T R is the Cholesky factorization of B. They called their transformation "the spectral transformation"; here we have preferred the more informative "shift-and-invert transformation." This approach has the difficulty that it requires a factorization of B as well as one of A - κB (however, the authors assume that one will use several shifts, so that the initial factorization of B can be prorated against the several factorizations of A - κB). Scott [236] applied Parlett's algorithm to the pencil [B(A - κB)^{-1}B, B], showing that it required no factorization of B. (He also says that the algorithm is not new, citing two technical reports [183, 282].)

When reduced to a standard eigenvalue problem by multiplying by B^{-1}, this pencil becomes (C, I), in which C is what we have been calling the generalized shift-and-invert transform. Ericsson [71] considers these problems in detail for the Lanczos algorithm. He credits Parlett with recognizing that the algorithm can be regarded as the Lanczos algorithm in the B-inner product. He also considers the case of semidefinite B. His solution is a double purging of the Ritz vectors based on the formula (4.12). (The same formula was used earlier by Ericsson and Ruhe [72], but for a different purpose.) The fact that the purging becomes unnecessary as a Ritz vector converges appears to have been overlooked.

Residuals and backward error

Theorem 4.1 follows from generalizations of a theorem of Rigal and Gaches [216] for the backward error of linear systems and is due to Fraysse and Toumazou [77] and Higham and Higham [112].

Semidefinite B

The basic paper for the semidefinite B-Lanczos algorithm is by Nour-Omid, Parlett, Ericsson, and Jensen [184]. The hypothesis (4.8), that R(C) and N(B) have only a trivial intersection, is violated in practical problems. They point out that the null-space error in the Lanczos vectors can grow and give an example in which it increases 19 orders of magnitude in 40 iterations. Meerbergen and Spence [176] treat the nonsymmetric case, i.e., the semidefinite B-Arnoldi algorithm, which they combine with the purification of Nour-Omid et al. to handle the problem of overlapping null and column spaces. They introduce the elegant method of using zero-shift implicit restarting to purge the null-space error. They also point out that the formula (4.12) can be used to purge a Ritz vector of null-space error. Their example of the need for purification increases the null-space error by 13 orders of magnitude in 11 iterations. Finally, the authors give a rounding-error analysis of their procedures.

6 ALTERNATIVES

Although the most widely used algorithms for large eigenproblems are based on Krylov sequences, there are other approaches. In this chapter we will consider two alternatives: subspace iteration and methods based on Newton's method, particularly the Jacobi-Davidson method.

Subspace iteration is a block generalization of the power method. It is an older method and is potentially slow. But it is very reliable, its implementation is comparatively simple, and it is easy to use. Moreover, combined with shift-and-invert enhancement or Chebyshev acceleration, it sometimes wins the race.

We next turn to methods based on Newton's method for solving systems of nonlinear equations. Since Newton's method requires the solution of a linear system, these methods appear to be mostly suitable for dense problems. But with some care they can be adapted to large sparse eigenproblems. When we combine Newton's method with Rayleigh-Ritz refinement, we get the Jacobi-Davidson algorithm, with which this chapter and volume conclude.

1. SUBSPACE ITERATION

In motivating the power method with starting vector u_0 (§1, Chapter 2) we assumed that A was diagonalizable and expanded u_0 in terms of the eigenvectors x_j of A, so that

    A^k u_0 = γ_1 λ_1^k x_1 + γ_2 λ_2^k x_2 + ⋯ + γ_n λ_n^k x_n.    (1.1)

If

    |λ_1| > |λ_2| ≥ ⋯ ≥ |λ_n|

and γ_1 ≠ 0, then the first term in the expansion dominates, and A^k u_0, suitably scaled, converges to x_1. When |λ_2| is near |λ_1|, the convergence of the power method will be slow. However, if we perform the power method on a second vector u_0' to get the expansion

we can eliminate the effect of λ_2 by forming a linear combination of the two iterates whose convergence depends on the ratio |λ_3/λ_1|. If this ratio is less than |λ_2/λ_1|, the effect is to speed up the convergence.

Subspace iteration is a generalization of this procedure. Beginning with an n×m matrix U_0, one considers the powers A^k U_0. It turns out that the power subspace R(A^k U_0) generally contains a more accurate approximation to x_1 than could be obtained by the power method. Moreover, the power subspace contains approximations to other dominant eigenspaces.

However, there are two problems that must be addressed before we can derive an algorithm based on the above observations. First, the columns of A^k U_0 will tend toward linear dependence, since they all tend toward the dominant eigenspaces of A. But in fact we are only interested in the space spanned by A^k U_0. Hence we can cure the problem of dependency by maintaining an orthonormal basis for R(A^k U_0). This cure leads to an interesting and useful connection between subspace iteration and the QR algorithm. The second problem is how to extract approximations to eigenvectors from the space. We could, of course, use a Rayleigh-Ritz procedure. However, since we are working with orthonormal vectors, it is more natural to use a variant of the Rayleigh-Ritz method that computes the dominant Schur vectors of A.

Thus the theory of subspace iteration divides into three parts: convergence of the subspaces, the relation to the QR algorithm, and the extraction of Schur vectors. In §1.1 we will develop the basic theory of the algorithm. In §1.2 we will show how to implement subspace iteration. Finally, in §1.3 we will treat the symmetric case and in particular show how to use Chebyshev polynomials to accelerate convergence.

1.1. THEORY

As suggested in the introduction to this section, the theory divides into the convergence of the subspaces, the QR connection, and the extraction of Schur vectors. We will treat each of these in turn.

Convergence

Let U_0 be an n×m matrix. We wish to show that as k increases the space R(A^k U_0) contains increasingly accurate approximations to the dominant eigenspaces of A and also to determine the rates of convergence. We will suppose that A has the spectral representation [see (1.7), Chapter 4]

    A = X_1 L_1 Y_1^H + X_2 L_2 Y_2^H + X_3 L_3 Y_3^H,

where L_1 is of order l, L_2 is of order m-l, and L_3 is of order n-m. The eigenvalues of these matrices, whose union is Λ(A), are assumed to be ordered so that

    |λ_1| ≥ ⋯ ≥ |λ_l| > |λ_{l+1}| ≥ ⋯ ≥ |λ_m| > |λ_{m+1}| ≥ ⋯ ≥ |λ_n|,

and we write C_i = Y_i^H U_0. Then

    A^k U_0 = X_1 L_1^k C_1 + X_2 L_2^k C_2 + X_3 L_3^k C_3.    (1.2)

The expansion (1.2) suggests that the first terms will dominate, so that the column space of A^k U_0 will approach the eigenspace R(X_1). To make this more precise, assume that C_{12} = (C_1^H C_2^H)^H is nonsingular and let

    (D_1  D_2) = C_{12}^{-1},    (1.3)

where D_1 has l columns. By construction R(A^k U_0 D_1) ⊆ R(A^k U_0). Since C_1 D_1 = I and C_2 D_1 = 0, multiplying U_0 D_1 by A^k and recalling that A X_i = X_i L_i, we get

    A^k U_0 D_1 = X_1 L_1^k + X_3 L_3^k C_3 D_1.

The bottommost term in this expression plays the role of the trailing terms in (1.1): the first term dominates, and by Theorem 2.9, Chapter 1, we have that for any ε > 0 the largest canonical angle between R(A^k U_0 D_1) and R(X_1) is bounded by a multiple of (|λ_{m+1}|/|λ_l| + ε)^k. Thus we see that the column space of A^k U_0 contains an approximation to the eigenspace R(X_1) that converges essentially as |λ_{m+1}/λ_l|^k. We summarize these results in the following theorem, in which we recapitulate a bit.

Theorem 1.1. Let the matrix A of order n have the spectral representation

    A = X_1 L_1 Y_1^H + X_2 L_2 Y_2^H + X_3 L_3 Y_3^H,

where L_1 is of order l, L_2 is of order m-l, and L_3 is of order n-m, and let their eigenvalues be ordered so that

    |λ_1| ≥ ⋯ ≥ |λ_l| > |λ_{l+1}| ≥ ⋯ ≥ |λ_m| > |λ_{m+1}| ≥ ⋯ ≥ |λ_n|.

Let U_0 ∈ C^{n×m} be given and write C_i = Y_i^H U_0. Let θ_k be the largest canonical angle between R(A^k U_0) and R(X_1). If the matrix (C_1^H C_2^H)^H is nonsingular, then for any ε > 0

    sin θ_k ≤ η_ε (|λ_{m+1}|/|λ_l| + ε)^k    (1.4)

for some constant η_ε depending on ε.

There are four comments to be made about this theorem.

• The theorem shows that subspace iteration is potentially much faster than the power method. For if l = 1 and |λ_1| > |λ_2|, then the convergence depends essentially on the ratio |λ_{m+1}/λ_1|, which is in general smaller than |λ_2/λ_1|.

• The theorem would appear to depend on choosing m so that |λ_m| is strictly greater than |λ_{m+1}|. This observation is important in applications, since we will not know the eigenvalues of A when we choose m. Fortunately, if that condition fails, we can reduce the size of U_0 to the point where the hypothesis is satisfied (at the very worst we might end up with m = l). Moreover, adding back the missing columns of U_0 can only improve the accuracy of the approximation to R(X_1), so that the theorem is valid for the original U_0.

• We are free to position l at any point in the spectrum of A where there is a gap in the magnitudes of the eigenvalues. Consequently, the column space of A^k U_0 will contain approximations to each of the eigenspaces corresponding to the first l eigenvalues of A. However, the convergence of these approximations deteriorates as l increases, since the convergence ratio is essentially |λ_{m+1}/λ_l|.

• The theorem contains no hypothesis that A is nondefective. This suggests that subspace iteration can deal with defectiveness, which Krylov sequence methods do not handle well [see the discussion following (3.14), Chapter 4]. If, however, the eigenvalues λ_l and λ_{m+1} are nondefective, then Theorem 2.9, Chapter 1, implies that we can take ε = 0 in (1.4).

The QR connection

One difficulty with the naive version of subspace iteration is that the columns of A^k U_0 tend to become increasingly dependent. For example, if |λ_1| > |λ_2|, the columns will all tend to lie along the dominant eigenvector of A. But we are not interested in the matrix A^k U_0, only the subspace spanned by it. Hence, when the columns of A^k U_0 begin to deteriorate, we can replace it with A^k U_0 S, where S is chosen to restore independence. A natural choice is to make A^k U_0 S orthonormal. If

we orthogonalize by computing a QR factorization after each multiplication by A, we get the following algorithm:

    1. U_0 = Q_0 R_0 (QR factorization)
    2. for k = 1, 2, ...
    3.    V = A Q_{k-1}
    4.    V = Q_k R_k (QR factorization)
    5. end for k    (1.5)

An easy induction shows that if R̂_k = R_k ⋯ R_1 R_0, then A^k U_0 = Q_k R̂_k; that is, Q_k R̂_k is the QR factorization of A^k U_0.

Now recall (Theorem 2.2, Chapter 2) that if A^k = Q_k R_k is the QR decomposition of A^k, then Q_k and R_k are the accumulated Q- and R-factors from the unshifted QR algorithm. This suggests that subspace iteration is in some way connected with the QR algorithm. In fact, if, for example, the magnitudes of the first m eigenvalues of A are distinct, the proof of Theorem 2.2, Chapter 2, can be adapted to establish the following result.

Theorem 1.2. Let U_0 ∈ C^{n×m} be orthonormal, and let A^k U_0 have the QR factorization U_k R̂_k, where

    |λ_1| > |λ_2| > ⋯ > |λ_m| > |λ_{m+1}| ≥ ⋯ ≥ |λ_n|.    (1.6)

Partition X = (X_1 X_2), where X_1 has m columns, and let X_1 = QR be the QR factorization of X_1. Suppose that X^{-1}U_0 has an LU decomposition X^{-1}U_0 = LT, where L is unit lower trapezoidal and T is upper triangular. Then there are diagonal matrices D_k with |D_k| = I such that U_k D_k → Q. As in Theorem 2.2, Chapter 2, the convergence of the jth column of U_k is governed by the largest of the ratios |λ_j/λ_{j-1}| and |λ_{j+1}/λ_j| (but only the latter when j = 1). The diagonal elements of R_k converge to the dominant eigenvalues of A.

The importance of this theorem is that the matrices U_k generated by (1.5) converge to the Schur vectors spanning the dominant eigenspaces. When the hypothesis (1.6) is not satisfied, so that some eigenvalues have equal magnitudes, the corresponding columns of Q_k will not settle down. Nonetheless, the space spanned by the corresponding columns of Q_k converges to the corresponding dominant eigenspace.

The convergence of the individual columns governed by this theorem can be slow. But we know that R(Q_k) must contain an approximation to x_j that is converging as |λ_{m+1}/λ_j|^k. We now turn to the problem of how to extract an approximation that converges at this faster rate.

Schur-Rayleigh-Ritz refinement

Given the sequence Q_k, we can use the standard Rayleigh-Ritz method to generate approximations to eigenvectors of A. However, one of the strengths of subspace iteration is that it gives meaningful results when the matrix A is defective or has nearly degenerate eigenvectors. What we really want to do is to rearrange the orthogonal matrix Q_k so that its columns reflect the fast convergence of the dominant subspaces. The following procedure, which is called the SCHUR-RAYLEIGH-RITZ (SRR) method, accomplishes this.

    1. V = A Q_k
    2. B = Q_k^H V
    3. Compute the Schur decomposition B = W T W^H, where the eigenvalues of B are arranged in descending order of magnitude
    4. V = V W
    5. Compute the QR factorization V = Q_{k+1} R_{k+1}

The motivation for this procedure is the following. The matrix B is the Rayleigh quotient B = Q_k^H A Q_k. If we wish to use the Rayleigh-Ritz method to compute an orthonormal approximation to the dominant eigenspace X_j, we would compute an orthonormal basis W_j for the corresponding invariant subspace of B and take Q_k W_j as the approximate basis. But the columns of W_j are just the first j Schur vectors of B. Consequently, the matrix Q_k W contains Ritz bases for all the dominant eigenspaces X_j. Since A Q_k W = V W, we can obtain the iterate following the SRR refinement by orthogonalizing V W.

The full analysis of this method is complicated, and we will not give it here. The basic result is that if |λ_j| > |λ_{j+1}|, so that the dominant eigenspace X_j is well defined, the approximations R[Q_k(:, 1:j)] converge to X_j essentially as |λ_{m+1}/λ_j|^k. Thus the SRR procedure gives properly converging approximations to all the dominant eigenspaces at once.

1.2. IMPLEMENTATION

In this subsection we will sketch at an intermediate level what is needed to produce an implementation of subspace iteration. The reader should keep in mind that actual working code requires many ad hoc decisions, which we do not treat here. For further details, see the notes and references.

A general algorithm

The theory of the last section shows that there are three nested operations that we must perform. The innermost operation is a power step, in which we calculate AU (for notational simplicity we drop the iteration subscript). The next level is the orthogonalization of U. Finally, at the outer level is the performance of an SRR step. The power step must be performed at each iteration. However, since we are only concerned with the subspace spanned by U, we do not need to orthogonalize until there is a danger of a nontrivial loss of orthogonality.
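A sketch of the SRR step described above; for brevity the reordering of the Schur form into descending order of magnitude, which SciPy does not provide directly, is only indicated in a comment.

    import numpy as np
    from scipy.linalg import schur

    def srr_step(A, Q):
        """One Schur-Rayleigh-Ritz step.  Q is n x m with orthonormal
        columns; returns the refined basis whose leading columns
        approximate the dominant Schur vectors of A."""
        V = A @ Q                           # power step
        B = Q.conj().T @ V                  # Rayleigh quotient
        T, W = schur(B, output='complex')   # B = W T W^H
        # reorder so that |T[0,0]| >= |T[1,1]| >= ... (reordering of the
        # Schur form omitted here for brevity)
        V = V @ W                           # A Q W = V W
        Q_new, _ = np.linalg.qr(V)
        return Q_new, T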

Thus we may be able to perform several power steps before we need to orthogonalize U. Similarly, the SRR step is needed only when we expect some of the columns of U to have converged. These observations are summarized in Algorithm 1.1:

    1.  while (the number of converged vectors is less than nsv)
    2.     while (convergence is not expected)
    3.        while (the columns of U are sufficiently independent)
    4.           U = A*U
    5.        end while
    6.        Orthogonalize the columns of U
    7.     end while
    8.     Perform an SRR step
    9.     Test for convergence and deflate
    10. end while

Algorithm 1.1: Subspace iteration

This algorithm is an outline of a general subspace iteration for the first nsv dominant Schur vectors of a general matrix A. The input to the program is an n×m orthonormal starting matrix U, where m > nsv. Because it is so general, the algorithm leaves several important questions unanswered:

    1. How do we test for convergence of Schur vectors?
    2. How do we deflate converged Schur vectors?
    3. How can we tell when to perform an SRR step?
    4. How can we tell when to orthogonalize?

We will treat each of these problems in turn.

Convergence

Convergence is tested immediately after an SRR step. As usual, we will test a residual. To motivate the test, suppose that U is composed of the dominant Schur vectors of A, so that

    AU = UT,

where T is upper triangular with its eigenvalues arranged in descending order of magnitude. In particular, the jth column of U satisfies

    Au_j = U t_j,

where t_j is the jth column of T. Thus we declare that the jth column of U has converged if

    ||Au_j - U t_j||₂ ≤ γ·tol,

where tol is a convergence criterion (e.g., ε_M) and γ is a scaling factor. This criterion is backward stable. Specifically, suppose the first l columns of U pass the test, and let R be the matrix of their residuals, so that AU[:, 1:l] = UT_l + R, where T_l consists of the first l columns of T. Then by Theorem 2.8, Chapter 4, there is a matrix E with ||E||₂ = ||R||₂ such that (A + E)U[:, 1:l] = UT_l. In other words, the first l columns of U are exact Schur vectors of A + E.

Our convergence theory says that the Schur vectors corresponding to the larger eigenvalues will converge more swiftly than those corresponding to smaller eigenvalues. For this reason, we should test convergence of the Schur vectors in the order j = 1, 2, ..., and stop with the first one to fail the test.

A natural choice of γ is an estimate of some norm of A. However, if A has been obtained by shift-and-invert enhancement, then such an estimate might be too large, and a better choice might be γ = |t_jj|, i.e., the magnitude of the jth eigenvalue in the Schur decomposition [cf. (2.21), Chapter 5].

Deflation

Because the columns of U converge in order, deflation simply amounts to freezing the converged columns of U. Specifically, suppose that l vectors have converged, and partition U = (U_1 U_2), where U_1 has l columns. Then when we multiply by A we form the matrix (U_1, AU_2); i.e., we do not multiply U_1 by A. This freezing results in significant savings in the power, orthogonalization, and SRR steps.

At each orthogonalization step we orthogonalize the current U_2 first against U_1 and then against itself. To perform an SRR step, we calculate the Rayleigh quotient

    B = [ T_11  U_1^H A U_2 ]
        [  0    U_2^H A U_2 ].

Here T_11 is the triangular part of the Schur decomposition that corresponds to U_1. Because of the block triangularity of B, the matrix W of Schur vectors of B has the form

    W = [ I  W_12 ]
        [ 0  W_22 ],

and the new U has the form (U_1, U_1 W_12 + U_2 W_22), so that the converged columns are left undisturbed.
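The ordered convergence test and freezing can be sketched as follows, with gamma playing the role of the scale factor discussed above.

    import numpy as np

    def count_converged(A, U, T, tol, gamma):
        """Test the columns of U (approximate Schur vectors, with upper
        triangular Rayleigh quotient T) in order, stopping at the first
        failure.  Columns U[:, :nconv] may then be frozen (deflated)."""
        nconv = 0
        for j in range(U.shape[1]):
            r = A @ U[:, j] - U[:, :j + 1] @ T[:j + 1, j]   # jth residual
            if np.linalg.norm(r) > tol * gamma:
                break
            nconv += 1
        return nconv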

When to perform an SRR step

The theory of the last section suggests that the residuals for the approximate Schur vectors will converge linearly to zero. We can use this fact to make a crude prediction of when we expect a Schur vector to converge. Let us focus on a particular Schur vector, and suppose that at step j it has a residual norm ρ_j and at step k > j it has a residual norm ρ_k. If the residual norms are converging linearly with constant σ, then ρ_k = σ^{k-j}ρ_j, so that we can estimate σ by

    σ = (ρ_k/ρ_j)^{1/(k-j)}.    (1.7)

We can now use σ to predict when the residual norm will converge. Specifically, we wish to determine i so that ρ_k σ^i ≤ γ·tol. Hence

    i = log(γ·tol/ρ_k) / log σ.    (1.8)

The formula (1.8) is a crude extrapolation and must be used with some care, especially in the early stages when linear convergence has not set in. Precautions must be taken against it wildly overpredicting the value of i. One must also take care that the formula does not slightly underpredict i, so that an unnecessary SRR step is performed just before convergence. Finally, if A has nearly equimodular eigenvalues, it will pay to extrapolate the average of their residual norms, so that one SRR step detects the convergence of the entire cluster.
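A sketch of the extrapolation (1.7)-(1.8), with the guards suggested above left to the caller:

    import numpy as np

    def predict_convergence(rho_j, rho_k, j, k, tol):
        """Estimate the linear convergence ratio from two residual samples
        and predict how many more steps are needed to reach tol."""
        sigma = (rho_k / rho_j) ** (1.0 / (k - j))   # estimated ratio
        if sigma >= 1.0:                             # not converging yet
            return None
        steps = int(np.ceil(np.log(tol / rho_k) / np.log(sigma)))
        return max(steps, 0)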

When to orthogonalize

As we have mentioned, the columns of the matrices U, AU, A²U, ... become increasingly dependent. The cure is to orthogonalize the columns of A^l U for some l. If we take l too small, then we perform unnecessary work. On the other hand, if we take l too large, we risk losing information about the subspace spanned by A^l U. To see what that risk is, let X = A^l U, and consider the effects on R(X) of the perturbation X + E, where E is comparable to the error in rounding X on a machine whose rounding unit is ε_M, i.e., ||E||₂ ≤ ||X||₂ ε_M. By Theorem 2.2, Chapter 4, if Q̃ is an orthonormal basis for R(X + E) and Q_⊥ is an orthonormal basis for R(X)^⊥, then ||Q_⊥^H Q̃||₂ is the sine of the largest canonical angle θ between R(X) and R(X + E).

To find an orthonormal basis for R(X + E), let X = QR be the QR factorization of X. Then

    (X + E)R^{-1} = Q + ER^{-1}

has, to first order, orthonormal columns. Hence if Q_⊥ is any orthonormal basis for R(X)^⊥, then

    ||Q_⊥^H(Q + ER^{-1})||₂ = ||Q_⊥^H E R^{-1}||₂ ≤ ||E||₂ ||R^{-1}||₂.

Thus

    sin θ ≤ ||X||₂ ||R^{-1}||₂ ε_M = κ(R) ε_M,

and the degradation in the subspace of X is proportional to κ(R). Since U is orthonormal, the sensitivity of A^l U is proportional to κ(R), where R is the R-factor of A^l U.

To apply this result to subspace iteration, note that once the process starts converging, we will have

    A^l U ≈ U T^l,

where T is the Schur form from the last SRR step. Hence κ(R) ≈ κ(T^l) ≤ κ(T)^l. Thus if we decide that no deterioration greater than 10^t is to be tolerated, we should orthogonalize after at most l steps, where l satisfies

    κ(T)^l ≤ 10^t,  i.e.,  l = ⌊ t / log₁₀ κ(T) ⌋.    (1.10)

An implication of this analysis is that if |λ_1| ≫ |λ_2|, as it might be when we use shift-and-invert enhancement with a very accurate shift, then we will have to reorthogonalize at each step. The reason is that for any matrix T we have κ(T) ≥ |λ_max(T)/λ_min(T)|, where λ_max(T) and λ_min(T) denote the largest and smallest eigenvalues of T in magnitude. Hence we will almost certainly have κ(T) ≥ |λ_1/λ_2|, which by hypothesis is large, so that l in (1.10) will be zero. The common sense of this is that the larger the value of |λ_1/λ_2|, the more all the columns of AU will be contaminated by components of x_1, which we must eliminate by orthogonalization.

It should be stressed that after deflation we must continue to work with the full T in estimating κ(T) in (1.10), not the trailing principal submatrix of T associated with the remaining subspaces. The reason is that in the powering step λ_1 still makes its contribution to nonorthogonality, whether or not its eigenvector has been deflated.

Once again we must treat the crude extrapolation (1.10) with care. However, since we have used several norm inequalities in deriving it, once it settles down it is likely to be pessimistic, and in practice the technique has shown itself to be effective.

Real matrices

If A is real, we will want to work with the real Schur form of the Rayleigh quotient (see Chapter 2). Most of the above discussion goes through mutatis mutandis. The only problem is the ordering of the blocks of the real Schur form, which has been treated in Chapter 2.

1.3. SYMMETRIC MATRICES

When the matrix A is symmetric, we can obtain some obvious economies. For example, the Rayleigh quotient is symmetric, so that we can use the more efficient symmetric eigenvalue routines to perform the SRR step. Moreover, the Schur vectors are now eigenvectors, so that there is no need of a postprocessing step to compute the eigenvectors. There are two economies, however, that are less obvious. First, we can make do with a single array U instead of the two arrays required by the nonsymmetric algorithm. More important, we can speed up convergence by replacing the power step with a Chebyshev iteration.

Economization of storage

In economizing storage it is important to appreciate the tradeoffs. In statement 4 of Algorithm 1.1 we wrote U = AU, indicating that the product AU is to overwrite the contents of U. This can be done in two ways. First, we can use an auxiliary matrix V and proceed as follows:

    1. V = A*U                                                      (1.11)
    2. U = V

Alternatively, we can introduce an auxiliary vector v of order n and write

    1. for j = 1 to m                                               (1.12)
    2.    v = A*U[:,j]
    3.    U[:,j] = v
    4. end for j

The algorithm (1.12) obviously uses less storage than (1.11), significantly less if n is large. The tradeoff is that we must perform m separate multiplications by the matrix A.

In the algorithm (1.11) we need only access A once to compute the entire product AU, but, of course, at the cost of twice the storage. In the algorithm (1.12) that price must be paid m times. If the matrix A is sparse, it will generally be represented by some data structure, and a price must be paid each time that structure is accessed.

So far as computing the product AU is concerned, both algorithms have the option of economizing storage. But in calculating the Rayleigh quotient B = U^H A U, we have no choice when A is non-Hermitian: we must use an extra array V to hold the product AU. If A is Hermitian, on the other hand, then so is B, and we need only compute its upper half. This can be done without an auxiliary array as follows: at the jth stage we compute v = A*U[:,j], form the inner products b_ji = v^H U[:,i] (i = j, ..., m), which by symmetry give the jth row of the upper half of B, and only then overwrite U[:,j] with v. Thus there is a true choice in the non-Hermitian case only for the product AU itself, which brings the tradeoffs full circle. How that choice will be made must depend on the representation of A and the properties of the machine in question.

Chebyshev acceleration

We have noted that if the eigenvalues of A are well separated in magnitude, then convergence will be fast, but we will perform many orthogonalization steps. Conversely, if the eigenvalues of A are clustered in magnitude, then convergence will be slow, but we will need few orthogonalization steps. In the latter case, we can use the power steps to accelerate convergence. Specifically, instead of computing A^k U we will compute p_k(A)U, where p_k is a polynomial of degree k, chosen to enhance the gap between |λ_1| and |λ_{m+1}|. A natural choice is the scaled Chebyshev polynomial

    p_k(t) = c_k(t/s),   where |λ_{m+1}| ≤ s ≤ |λ_m|.

To see why, note that when we evaluate c_k(A/s) we have effectively scaled A by s, so that we may assume that |λ_{m+1}| = 1 and hence |c_k(λ_{m+1})| ≤ 1. For the purposes of argument, assume that we have chosen s = |λ_{m+1}|. From (3.8), Chapter 4, we have

    |c_k(λ_1)| ≥ (1/2) e^{k cosh^{−1}|λ_1|}.

Thus p_k(A)U will contain approximations to the dominant eigenvector that converge at least as fast as e^{−k cosh^{−1}|λ_1|} converges to zero. The analogous result is true for the other eigenvalues.

When |λ_1| is near one, the convergence ratio e^{−cosh^{−1}|λ_1|} can be much smaller than the usual ratio |λ_1|^{−1} obtained by powering. For example, suppose that λ_1 = 1.01, so that ordinarily we could expect a convergence ratio of 1/1.01 = 0.99. If we use Chebyshev acceleration, we get a convergence ratio of 0.87, a great improvement.

In implementing Chebyshev acceleration there are three points to consider.

    1. What value should we use for the scaling factor s?
    2. How do we compute c_k(A/s)U?
    3. How does the acceleration affect our procedure for predicting when
       to perform orthogonalization steps?

Let ν_i (i = 1, ..., m) be the Ritz values computed in the SRR step, arranged in descending order of magnitude. Since by the interlacing theorem (Theorem 3.6, Chapter 1) we know that |λ_{m+1}| ≤ |ν_m| ≤ |λ_m|, it is natural to take s = |ν_m| for the scaling factor. The choice has the drawback that as the iteration converges, ν_m → λ_m, so that we do not get the best convergence ratio. But in subspace iteration we generally choose m somewhat greater than the number of eigenpairs we desire, so that this drawback is of minimal importance.

To evaluate c_k(A/s)U, we can use the three-term recurrence (see item 1 in Theorem 3.7, Chapter 4)

    c_{k+1}(t) = 2t c_k(t) − c_{k−1}(t),   c_0(t) = 1,   c_1(t) = t.

The recurrence can be implemented with two extra n×m arrays, U and V. If instead we generate the final U column by column, we can make do with two extra n-vectors; as above, the tradeoff is that we must call the routine that multiplies A into an array more often.
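A minimal NumPy sketch of the two-array implementation of the recurrence (the function and variable names are ours):

```python
import numpy as np

def chebyshev_step(A, U, s, k):
    """Compute c_k(A/s) @ U by the three-term recurrence
    c_{j+1}(t) = 2 t c_j(t) - c_{j-1}(t), with c_0 = 1 and c_1(t) = t,
    using two n-by-m work arrays."""
    V_old = U.copy()                     # c_0(A/s) U
    if k == 0:
        return V_old
    V = (A @ U) / s                      # c_1(A/s) U
    for _ in range(1, k):
        V_old, V = V, (2.0 / s) * (A @ V) - V_old
    return V                             # c_k(A/s) U
```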

Turning now to orthogonalization, we must modify the formula for the interval between orthogonalizations in two ways. First, since A is symmetric, the Schur form T of the Rayleigh quotient is diagonal. Thus we can avoid the invocation of a condition estimator and compute κ(T) as the ratio ν_1/ν_m of the Ritz values. Second, if l is greater than one, we will be performing Chebyshev iteration rather than simple powering, so that the growth factor κ(T)^l must be replaced by the corresponding ratio of Chebyshev polynomial values, which by (3.9), Chapter 4, grows like e^{l cosh^{−1}(ν_1/ν_m)}. Hence we must replace the formula (1.10) by

    l = ⌊t ln 10 / cosh^{−1}(ν_1/ν_m)⌋.

NOTES AND REFERENCES

Subspace iteration

Subspace iteration originated in Bauer's Treppeniteration [16, 1957], which bears the same relation to Rutishauser's LR algorithm [224, 1955] as the orthogonal version presented here does to the QR algorithm. In The Algebraic Eigenvalue Problem, Wilkinson [300, 1965] introduced the orthogonal variant. For symmetric matrices Jennings [130, 1967] introduced an approximate Rayleigh-Ritz step using first-order approximations to the primitive Ritz vectors. The full Rayleigh-Ritz acceleration was introduced independently by Rutishauser [227, 1969], Stewart [249, 1969], and Clint and Jennings [41, 1970]. For nonsymmetric matrices, Clint and Jennings [42, 1971] compute Ritz vectors to get good approximations to eigenvectors from the U_k. Stewart [257, 1976] introduced and analyzed the Schur-Rayleigh-Ritz method for approximating Schur vectors. Although the hypothesis |λ_m| > |λ_{m+1}| is not necessary for the spaces R(U_k) to contain good approximations to the Schur vectors, it is necessary to prove that the SRR procedure retrieves them. A counterexample is given in [257], where it is argued that the phenomenon is unlikely to be of importance in practice.

The idea of predicting when to orthogonalize is due to Rutishauser [228] for the symmetric case and was generalized to the nonsymmetric case by Stewart [257]. The latter paper also introduces the technique of extrapolating residual norms to predict convergence.

Chebyshev acceleration

The use of Chebyshev polynomials to accelerate subspace iteration for symmetric matrices is due to Rutishauser [228]. When A is positive definite it is more natural to base the Chebyshev polynomials on the interval [0, s] rather than on [−s, s], as is done in the text. Chebyshev acceleration can also be performed on nonsymmetric matrices and can be effective when eigenpairs with the largest real parts are desired. For details and further references see [68].

Software

Rutishauser [228] published a program for the symmetric eigenproblem. The program LOPSI of Stewart and Jennings [270, 271] is designed for real nonsymmetric matrices and uses Rayleigh-Ritz refinement to compute eigenvectors. The program SRRIT of Bai and Stewart [12] uses SRR steps to compute Schur vectors. Duff and Scott [68] describe code in the Harwell Library that includes Chebyshev acceleration for the symmetric and nonsymmetric cases. All these references contain detailed discussions of implementation issues and repay close study.

2. NEWTON-BASED METHODS

One of the most widely used methods for improving an approximation to a root of the equation f(x) = 0 is the Newton-Raphson formula

    x_new = x − f(x)/f′(x).

It is a useful fact that specific instances of this formula can be derived without differentiation by using first-order perturbation expansions. For example, suppose we wish to compute √a for some positive a. This amounts to solving the equation f(x) = x^2 − a = 0, for which the Newton correction is

    d = −f(x)/f′(x) = (a − x^2)/2x.

On the other hand, if we write

    a = (x + d)^2,

where d is presumed small, expand the square, and drop terms in d^2, we get the approximate linear equation x^2 + 2xd ≅ a, from which we find that

    d = (a − x^2)/2x.                                               (2.1)
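In code, the iterated correction looks as follows. This small Python sketch is ours, not part of the original text:

```python
def newton_sqrt(a, x=1.0, tol=1e-12):
    """Newton's iteration for f(x) = x**2 - a, with the correction
    d = (a - x**2)/(2x) obtained by linearizing (x + d)**2 = a."""
    d = (a - x * x) / (2.0 * x)
    while abs(d) > tol * abs(x):
        x += d
        d = (a - x * x) / (2.0 * x)
    return x

print(newton_sqrt(2.0))   # 1.4142135623730951
```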

Thus the Newton correction d and the approximation (2.1) to the true correction are identical.

In this section we are going to explore the consequences of applying this approach to the eigenvalue problem. It turns out that it yields some old friends, like the Rayleigh quotient method and the QR algorithm. However, it also yields some new methods and suggests ways of implementing them for large sparse eigenproblems. In §2.1 we will introduce approximate Newton methods and a class of variants that we will call orthogonal correction methods. When combined with the Rayleigh-Ritz method, the approximate Newton method becomes the Jacobi-Davidson method, which we will treat in §2.2.

2.1. APPROXIMATE NEWTON METHODS

This section concerns the derivation of Newton-based methods and the analysis of their orthogonal variants. We will begin with the general method and then specialize.

A general approximate Newton method

Suppose we are given an approximate eigenpair (μ, z) of A and wish to determine η and v so that (μ+η, z+v) is an improved approximation. For purposes of derivation, let (μ+η, z+v) be an exact eigenpair of A, so that

    A(z + v) = (μ + η)(z + v).

For reasons that will become clear later, we will assume that we have an approximation

    Ã = A + E,                                                      (2.2)

where E is small. If we discard products of small terms, we obtain the following approximate equation in η and v:

    (Ã − μI)v − ηz = −r,   where r = Az − μz.                       (2.3)

We therefore define the approximate corrections to be the solutions of the equation (2.3). Unfortunately, this equation is underdetermined: there are only n relations to determine the n components of v and the scalar η. To determine the system completely, we add the condition

    w^H v = 0,

where w is a suitably chosen vector. For example, we could choose w = e_k, in which case the kth component of z + v remains unchanged. Later we will consider the consequences of choosing w = z.

Now in matrix form our equations become

    ( Ã − μI   −z ) ( v )   ( −r )
    (  w^H      0 ) ( η ) = (  0 ).

By multiplying the first row of this relation by w^H(Ã − μI)^{−1} and subtracting it from the second row, we obtain a triangular system, from which

    η = [w^H(Ã − μI)^{−1} r] / [w^H(Ã − μI)^{−1} z]
and
    v = (Ã − μI)^{−1}(ηz − r).                                      (2.4)

We will call (2.4) the approximate Newton correction formula. If Ã = A, we will simply call it the Newton correction formula.

The correction equation

There is an alternative to the correction formula (2.4) that is of theoretical and practical use. To derive it, write the first equation in (2.4) in the form

    (Ã − μI)v = ηz − r.

The first order of business is to get rid of η, leaving an equation involving only v. Let

    P_r = I − (z r_⊥^H)/(r_⊥^H z),

where r_⊥ is any vector that is orthogonal to r for which r_⊥^H z ≠ 0. In the language of projectors, P_r is an oblique projector onto the orthogonal complement of r_⊥ along z. In particular, P_r z = 0 and P_r r = r. It follows that

    P_r(Ã − μI)v = −r.

Now let

    P_w = I − (u w^H)/(w^H u),

where u is any vector with w^H u ≠ 0, be the projector onto the orthogonal complement of w along u. Since w^H v = 0, we have P_w v = v, and hence

    P_r(Ã − μI)P_w v = −r,   w^H v = 0.                             (2.5)

We will call (2.5) the approximate Newton correction equation.

It actually characterizes the correction vector v, as the following theorem shows.

Theorem 2.1. Let Ã − μI be nonsingular. Then v satisfies (2.4) if and only if v satisfies (2.5).

Proof. We have already shown that if v satisfies (2.4) then v satisfies (2.5). Suppose now that v satisfies (2.5). Since P_r projects along z, the relation P_r[(Ã − μI)v + r] = 0 implies that (Ã − μI)v + r lies in span(z); that is,

    (Ã − μI)v = ηz − r

for some scalar η. The condition w^H v = 0 then determines η, and v satisfies (2.4). ∎

The correction equation says that the correction v to z satisfies a simple equation involving the projected operator P_r(Ã − μI)P_w. Although the equation is pretty, it does not seem to offer much computational advantage over the correction formula, which exhibits the correction explicitly. The correction formula, on the other hand, is cast in terms of inverses, and its implementation requires a factorization of the matrix Ã − μI, which may be too expensive to compute for large sparse problems. If instead we use an iterative method based on Krylov sequences to solve the correction equation, we need only compute matrix-vector products of the form P_r(Ã − μI)P_w y, where y is an arbitrary vector. We will return to this point later.

Orthogonal corrections

The presence of oblique projectors in the correction equation is disturbing. Oblique projectors can be arbitrarily large, and large projections can cause numerical cancellation in expressions that contain them. However, if we restrict μ appropriately, we can replace the projections P_r and P_w by a single orthogonal projection, something that is easily done. We first observe that it is natural to take w = z, that is, to require the correction v to be orthogonal to the approximate eigenvector z. For if v ⊥ z and v is small compared with z, then a simple picture shows that the minimum distance between z and span(z + v) is approximately ||v||_2. In other words, v is close to the smallest possible correction.

If we take w = z (and u = z), then we may take

    P_w = I − zz^H,

which is an orthogonal projection. We should also like P_r to be the same. Since P_r must satisfy P_r z = 0 and P_r r = r, the choice P_r = I − zz^H is possible only if r ⊥ z. Now by Theorem 1.4, Chapter 2, if we choose μ to be the Rayleigh quotient

    μ = z^H A z

(recall that z is normalized), then r is indeed orthogonal to z. In this case the correction equation becomes

    (I − zz^H)(Ã − μI)(I − zz^H)v = −r,   v ⊥ z,

and this equation is consistent precisely because r ⊥ z. Thus we have established the following result.

Theorem 2.2. Let (μ, z) be an approximate normalized eigenpair of A, where

    μ = z^H A z

is the Rayleigh quotient of z, and let

    r = Az − μz.

Let Ã = A + E, and assume that Ã − μI is nonsingular. Then the Newton-based correction v that is orthogonal to z is given by the correction formula

    v = −(Ã − μI)^{−1} r + η(Ã − μI)^{−1} z,
    η = [z^H(Ã − μI)^{−1} r] / [z^H(Ã − μI)^{−1} z].                (2.6)

Equivalently, v is the unique solution of the correction equation

    (I − zz^H)(Ã − μI)(I − zz^H)v = −r,   v ⊥ z.                    (2.7)

Three comments on this theorem.

• We call the vector v of Theorem 2.2 the orthogonal (approximate Newton) correction, dropping the "approximate" when Ã = A. Since we will be concerned exclusively with orthogonal corrections in the sequel, we will usually drop the qualifier "orthogonal" and speak simply of the approximate Newton correction, correction formula, and correction equation. We also call (2.6) and (2.7) the orthogonal correction formula and the orthogonal correction equation.

• We have used the expression "approximate Newton method" rather than simply "Newton method" because both the correction formula and the correction equation contain the approximation Ã. However, even when Ã = A, the iterated method is not, strictly speaking, Newton's method. The reason is that we use the normalized vector (z + v)/||z + v||_2 and its Rayleigh quotient in place of z + v and μ + η. The distinction, however, is in the higher-order terms.

• We have used the term "orthogonal" in more than one sense. The obvious sense is that we require v to be orthogonal to z. But equally important is that the operator Ã − μI is projected onto the orthogonal complement of z in the correction equation.
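For a dense matrix the correction formula (2.6) can be implemented directly with two solves. A minimal NumPy sketch of ours:

```python
import numpy as np

def orthogonal_correction(A_tilde, z, mu, r):
    """One orthogonal Newton correction via (2.6): solve
    (A_tilde - mu*I) y1 = r and (A_tilde - mu*I) y2 = z, then return
    v = eta*y2 - y1 with eta chosen so that z^H v = 0."""
    n = z.shape[0]
    B = A_tilde - mu * np.eye(n)
    y1 = np.linalg.solve(B, r)
    y2 = np.linalg.solve(B, z)
    eta = (z.conj() @ y1) / (z.conj() @ y2)
    return eta * y2 - y1
```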

Analysis

The fact that the operator in the correction equation (2.7) is a projection of Ã suggests that for the purposes of analysis we should rotate the problem into a coordinate system in which the projector projects along the coordinate axes. Specifically, let Z = (z Z_⊥) be a unitary matrix and partition

    Z^H A Z = ( μ   g^H )
              ( h   M   ).

We will also write the corresponding partition

    Z^H Ã Z = ( μ̃   g̃^H )
              ( h̃   M̃   ),

where [see (2.2)] the two partitions differ by Z^H E Z. In the Z coordinate system we have the following correspondences:

    z ~ e_1,   v ~ ( 0 ),   r ~ ( 0 ),
                   ( p )        ( h )

in which p is to be determined. The correction equation can now be written in the form

    (M̃ − μI)p = −h.

Hence we have

    p = −(M̃ − μI)^{−1} h.                                           (2.8)

This expression corresponds to the correction formula.

Thus in the Z coordinate system the correction equation and the correction formula are so simply related that there is no need to prove a theorem to establish their equivalence.

We now turn to the iterative application of the approximate Newton method. A natural quantity to measure the quality of the current approximation z is the residual norm ||r||_2. Thus our problem is to develop a bound for the residual norm corresponding to the corrected vector ẑ = z + v in terms of the original residual norm ||r||_2.

There are three ways to simplify the analysis. First, if v is small, which we may assume it to be in a convergence analysis, then ẑ is essentially normalized; for since v ⊥ z, we have ||ẑ||_2^2 = ||z||_2^2 + ||v||_2^2 = 1 + ||v||_2^2. We may therefore dispense with normalizing ẑ, although in principle we should work with the normalized vector. Second, the next iterate by definition uses the Rayleigh quotient μ̂ corresponding to ẑ, and hence by Theorem 1.4, Chapter 2, its residual is the smallest possible; thus any residual of the form Aẑ − νẑ will serve to bound the next residual norm, and it is not necessary to compute μ̂ itself. Third, in consequence, we may choose ν at our convenience; our only problem is to find a ν giving a reasonably small residual. In the Z coordinate system

    Z^H(Aẑ − νẑ) = ( μ + g^H p − ν )
                   ( h + Mp − νp   ),

and the easiest choice is

    ν = μ + g^H p,

which makes the first component of the residual equal to zero. The remaining component is

    h + (M̃ − μI)p + (M − M̃)p − (g^H p)p.

By the correction formula (2.8) the first two terms in this expression vanish. Hence, writing F = M̃ − M for the corresponding block of Z^H E Z,

    ||Aẑ − νẑ||_2 ≤ (||F||_2 + ||g||_2 ||p||_2) ||p||_2.

Inserting iteration subscripts and using (2.8) to bound ||p||_2, we obtain the following theorem.

Theorem 2.3. Let (μ_k, z_k) (k = 0, 1, ...) be a sequence of normalized approximate eigenpairs produced by the orthogonal approximate Newton method applied to the matrix Ã = A + E, and let

    r_k = A z_k − μ_k z_k.

Further let Z_k = (z_k Z_{k⊥}) be unitary, and, in the notation introduced above (with subscripts k attached), define

    σ_k = ||(M̃_k − μ_k I)^{−1}||_2,   ε_k = ||F_k||_2,   η_k = ||g_k||_2.

Then

    ||r_{k+1}||_2 ≤ σ_k(ε_k + σ_k η_k ||r_k||_2) ||r_k||_2.          (2.9)

Local convergence

The recurrence (2.9) can be made the basis of a local convergence proof for the orthogonal Newton-based method. We will not present the proof here. Briefly, the idea is to obtain uniform upper bounds σ on σ_k and ε on ε_k (the quantities η_k are uniformly bounded by ||Ã||_2). If

    σε < 1,

then for all sufficiently small ||r_0||_2 the quantity σ_k(ε_k + σ_k η_k ||r_k||_2) is uniformly bounded below one, so that the residual norms ||r_k||_2 converge monotonically to zero. It can then be shown that μ_k and M̃_k (suitably defined) approach fixed limits λ and M̃, and the eigenpairs (μ_k, z_k) converge to an eigenpair (λ, z) of A.

If ε > 0, the convergence of the method is linear, with convergence ratios not greater than σε. When ε = 0, the convergence is quadratic. When, in addition, A is Hermitian, we have h_k = ḡ_k, so that η_k = ||r_k||_2, and hence the convergence is cubic.

Moreover, 1/σ_k = sep(μ_k, M̃_k), and by the perturbation theory of Chapter 4 this quantity differs from sep(λ, M) by terms of order ||E||_2. Hence, for E not too large, the bound σε < 1 is essentially

    ||E||_2 / sep(λ, M) < 1.

What this says is that if we are going to use an approximation Ã instead of A in our iteration, the error in the approximation must be reasonably less than the separation of λ from its neighbors in the sense of the function sep. In particular, convergence to a poorly separated eigenvalue requires a correspondingly accurate approximation.

Three approximations

In the approximate Newton method, the rate of convergence depends on the accuracy of the approximation Ã = A + E. Ideally, we would take Ã = A; this is the pure Newton method, in which we solve equations involving the matrix A − μ_k I. Since μ_k changes with each iteration, the pure method may require a matrix factorization at each step, and that is not always possible or affordable. Here we give three situations in which it may be desirable to take Ã ≠ A.

• Natural approximations. Such approximations arise when A can be split into a matrix Ã that is easy to invert (in the sense of being able to solve systems involving Ã − μ_k I) and a small remainder. For example, if A is diagonally dominant with small off-diagonal elements, we might split A = D − E, where D is the diagonal of A and E = D − A. In this case the approximate Newton method can be implemented solving only diagonal systems. This diagonal approximation is associated with the name of Jacobi. Moreover, if E is small enough, the method will converge rapidly.

• Constant shift approximations. If we lack a natural approximation, it is often sufficient to replace A − μ_k I by A − τI, where τ is the center of a focal region containing the eigenvalues of interest. (Note that τ is sometimes illogically called a target in the literature.) If we write Ã_k = A + E_k, where E_k = (μ_k − τ)I, then Ã_k − μ_k I = A − τI. This is an example of an approximation that changes from iteration to iteration; it turns out that the analysis given above covers this case, and the iteration bound (2.9) simplifies, with, in the above notation,

    σ_k = ||(M_k − τI)^{−1}||_2.

This bound is isomorphic to the bound (2.12), Chapter 2, for the local convergence of the QR algorithm; indeed, this variant of the approximate Newton method is closely related to the RQ variant of the QR algorithm. See the notes and references.

• Inexact solutions. In some applications it is impossible to solve the equation involving A − μ_k I directly, and we must use an iterative method to solve the correction equation (2.7). In this case the approximate Newton method consists of outer iterations, the Newton steps, and inner iterations, which solve the correction equations. As with eigenvalue problems, we generally terminate an inner iteration when the residual is sufficiently small. It is natural to ask what effect such a residual-based convergence criterion has on the convergence of the approximate Newton method.

ALTERNATIVES such that Now since (/ . and its presence in a work whose concern is mostly the tried and true is something of an anomaly. Chapter 2. it is reasonable to assume that the iterative method in question produces an approximate solution v that satisfies (/ — zz^)v = v. We will begin with a general formulation of the Jacobi-Davidson algorithm and then introduce the refinements necessary to implement it. .9) can guide our choice of a convergence tolerance for the inner iteration. The method is new and less than well understood. THE JACOBI-DAVIDSON METHOD The Jacobi-Davidson method is a combination of the approximate Newton iteration and Rayleigh-Ritz approximation. Nontheless.zzH)r = r. This implies that we may move E inside the brackets in (2. it produces information that later speeds convergence to other eigenpairs in the focal area. The basic Jacobi-Davidson method The approximate Newton method is suitable for improving a known approximation to an eigenpair. it may converge to eigenpairs whose eigenvalues lie entirely outside the region we are interested in. as we shall see.3. and it is technically interesting— and that is reason enough to include it. it can be combined with an effective deflation technique that prevents it from reconverging to pairs that have already been found. it has two drawbacks.2.10) that (/ . it is promising. Moreover.This fact along with the iteration bound (2. 1. it followsfrom(2.404 From Theorem 1. Moreover.ZZE)S = s. so that Thus the effect of terminating the inner iteration with a residual 5 is the same as performing the outer iteration with an approximation in which ||£||2 = ||s||2. However. it sometimes happens that while the method is converging to one eigenpair. there is a matrix CHAPTER 6. It has the advantage that the Rayleigh-Ritz procedure can focus the Newton iteration on a region containing the eigenvalues we are interested in. Moreover. 2. there is no real control over what eigenvalue the method will converge to.11). If we do not start the method close enough to an eigenpair. it has been used successfully. In particular.

In the approximate Newton method the correction vector v_k is applied once to the current approximation z_k and then discarded, much as past iterates in the power method are thrown away. Yet z_k + v_k is only one vector in the correction space spanned by z_0, v_0, ..., v_k, and it is possible that another vector in this space will provide a better approximate eigenvector. This suggests that we save the vectors z_0, v_0, ..., v_k and use a Rayleigh-Ritz procedure to generate z_{k+1}. Although the Rayleigh-Ritz method is not infallible, the new approximation will generally be better. Moreover, because we can choose from among k+1 Ritz pairs, we are free to choose one near a focal point, so that we can mitigate the effects of wild jumps in the Newton iterates. Algorithm 2.1 implements this idea.

    Given a normalized starting vector v, this algorithm attempts to
    compute an eigenpair whose eigenvalue is near a focal point τ.

    1.  V = v
    2.  for k = 1, 2, ..., kmax
    3.     Using V, compute a normalized Ritz pair (μ, z) such that μ
           is near τ
    4.     r = Az − μz
    5.     if (r is sufficiently small) return (μ, z) fi
    6.     Choose a shift κ near τ or μ
    7.     Solve the system
              (I − zz^H)(Ã − κI)(I − zz^H)v = −r,   v ⊥ z
    8.     Orthonormalize v with respect to V
    9.     V = (V v)
    10. end for k
    11. Signal nonconvergence

    Algorithm 2.1: The basic Jacobi-Davidson algorithm

As stated in the prologue, this algorithm attempts to compute an eigenpair whose eigenvalue is near a focal point τ. It is greedy. By greedy we mean that the method attempts to do as well as possible at each step without any attempt at global optimization.

The algorithm contains three shift parameters: τ, μ, and κ. The point τ is the center of a focal region in which we wish to find eigenpairs. The quantity μ is the Rayleigh quotient corresponding to z. The matrix Ã is a suitable approximation of A. If it is expensive to vary κ in solving the correction equation, κ will generally be chosen equal to τ. However, if it is inexpensive to vary κ and Ã = A, then by the analysis of the last section the choice κ = μ will give quadratic convergence, provided something does not go wrong with the Rayleigh-Ritz procedure.
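For small dense problems the whole iteration can be sketched in a few lines of NumPy. The sketch below is ours: in particular, adding zz^H to the projected matrix is one standard way of making the correction system nonsingular while keeping the computed correction orthogonal to z; the book's own implementations solve the projected system differently (see Algorithm 2.4 below).

```python
import numpy as np

def jacobi_davidson(A, v0, tau, tol=1e-8, kmax=30):
    """Basic Jacobi-Davidson iteration (cf. Algorithm 2.1), dense case,
    with the fixed shift kappa = tau and A_tilde = A."""
    n = A.shape[0]
    I = np.eye(n)
    V = (v0 / np.linalg.norm(v0))[:, None]
    for _ in range(kmax):
        theta, S = np.linalg.eig(V.conj().T @ A @ V)   # Rayleigh quotient
        i = np.argmin(np.abs(theta - tau))             # Ritz value near tau
        z = V @ S[:, i]
        z = z / np.linalg.norm(z)
        mu = z.conj() @ A @ z
        r = A @ z - mu * z
        if np.linalg.norm(r) <= tol:
            return mu, z
        P = I - np.outer(z, z.conj())                  # projector I - zz^H
        M = P @ (A - tau * I) @ P + np.outer(z, z.conj())
        v = np.linalg.solve(M, -r)                     # correction, v orth. to z
        v = v - V @ (V.conj().T @ v)                   # orthogonalize against V
        V = np.hstack([V, (v / np.linalg.norm(v))[:, None]])
    raise RuntimeError("Jacobi-Davidson did not converge")
```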

. in Case 2. To slow convergence so that the process has time to evolve.406 CHAPTER 6. The following example shows that the Rayleigh-Ritz refinement can improve the convergence of the approximate Newton method.952...95".. which is atypical.. where e is a vector of random normal deviates.4. The convergence ratios for the first case decrease from about 0.24 to 0.1. while for the second case they increase from a value of 0.1: Convergence of the Jacobi-Davidson method o the correction equation.) . we begin with a diagonal matrix A with eigenvalues 1. Two cases were considered: Case 1 in which the Rayleigh-Ritz step was included and Case 2 in which it was not.1.95.. (All this excludes the first step. Chapter4. then by the analysis of the last section the choice K = fj.. The Jacobi-Davidson algorithm was used to approximate the eigenvector ei. will give quadratic convergence—provided something does not go wrong with the Rayleigh-Ritz procedure. The first column shows the convergence of the residual norm in Case 1. As in Example 3. the second. However..041. Example 2.67.953. if it is inexpensive to vary K and A — A. it will generally be chosen equal to r.28 to 0. It is seen that the JacobiDavidson method is converging faster than the ordinary approximate Newton method.1 exhibits the results. A starting vector u was obtained by normalizing a vector of the form ei + 0.05e. ALTERNATIVES Figure 2. K was set to // + 0. The table in Figure 2.

It follows that Vol II: Eigensystems .. Algorithm 2...1 is designed to find a single eigenpair... and write Now suppose (/i. We begin by observing that we can extend (2. z] is a normalized eigenpair of B.. Let us begin by assuming that we have an orthonormal matrix U of Schur vectors of A. We wish to extend this partial Schur decomposition to a partial Schur decomposition of order one greater. An algorithm to do this would allow us to build up ever larger partial Schur decompositions a step at a time. that is. If we have found a set ui. we must restrict the iteration to proceed outside the space spanned by the previous eigenvectors. U satisfies the relation for some upper triangular matrix T = U^AU.13) by mimicking the reduction to Schur form. we know that the other Schur vectors lie in the orthogonal complement of span(wi. and let (z Z±) be unitary. Extending Schur decompositions We now turn to implementation issues. up}.SEC. Specifically. They track the residual norms rather closely and show that it is the Rayleigh-Ritz approximation that is driving the convergence. When more than one eigenpair is to be computed.If we decide at the last iteration to declare the iteration has produced a satisfactory approximation to ei and start looking for approximations to e2. Thus..e. we cannot say just how the iteration should be restricted without knowing the unknown eigenvectors. up of Schur vectors. . the restriction can be cast in terms of orthogonal projections involving known vectors. i. 2. The example actually shows more. . let (U t/j_) be unitary. Unfortunately. The first issue is the matter of deflation. Then is block upper triangular. The last column exhibits the tangents of the angles between the correction subspace and the subdominant eigenvector 62. A cure for this problem is to work with Schur vectors. we will begin with a very accurate starting vector. we must take care that the iteration does not reconverge to a vector that has already been found. NEWTON-BASED METHODS 407 The third column shows the tangents of the angles between the correction space and the eigenvector ei.

Hence (2.14) is a partial Schur decomposition of A of order one higher than (2.13). Note that a knowledge of the eigenvector z of B is all that is necessary to extend the partial Schur decomposition. This typically dense matrix process is not of much use in the large sparse problems that are our present concern. However, the following observation allows us to recast the procedure in terms of the original matrix A. Let

    P_⊥ = I − UU^H

be the orthogonal projector onto the column space of U_⊥, and set

    A_⊥ = P_⊥ A P_⊥.

Then it is easily verified that A_⊥ U_⊥ = U_⊥ B. Now suppose that (μ, z) is an eigenpair of B, and let y = U_⊥ z. Then (μ, y) satisfies

    A_⊥ y = μy,   U^H y = 0.                                        (2.15)

Conversely, suppose y satisfies (2.15). Then y = U_⊥ z, where z = U_⊥^H y, and since μy = A_⊥ y = U_⊥ Bz, it follows on multiplying by U_⊥^H that Bz = μz. In other words, there is a one-one correspondence between eigenvectors of A_⊥ lying in R(U_⊥) and eigenvectors of B. It follows from (2.14) that the extended partial Schur decomposition is

    A (U  y) = (U  y) ( T   t )
                       ( 0   μ ),                                    (2.16)

where

    t = U^H A y.                                                    (2.17)

To compute a vector y satisfying (2.15), we will use the Jacobi-Davidson procedure. The ingredients for a step of the procedure are the matrix U of Schur vectors, an orthonormal matrix V of orthogonalized corrections, and a new correction vector v. We will assume that V and v are orthogonal to R(U). We begin by orthogonalizing v with respect to V and incorporating it into V. We next compute the Rayleigh quotient B = V^H A_⊥ V and a Ritz pair (μ, z). Note that because V is orthogonal to R(U), the Rayleigh quotient can also be computed from the original matrix A in the form V^H A V. We now compute the residual

    r = A_⊥ z − μz

and test to see if it is small enough that (μ, z) can be accepted as our eigenpair. If it is not, we solve the correction equation

    (I − zz^H)(Ã_⊥ − κI)(I − zz^H)v = −r,   v ⊥ z,

where v ⊥ U as well, to get another correction to add to V. Algorithm 2.2 implements this scheme. There are several comments to be made.

• The routine EPSD (Extend Partial Schur Decomposition) is designed to be used by a higher-level program that decides such things as the number of Schur vectors desired, the focal point τ (which may vary), and how to restart when the arrays V and W are about to overflow storage.

• We do not compute the residual r directly. Rather we compute Az − μz and orthogonalize it against U. There is a subtle point here. What we really want is a satisfactorily small residual when we substitute z for y in (2.16) and (2.17); that is, we want

    s = Az − Ut − μz,   t = U^H Az,

to be sufficiently small. To assess the size of s, we compute the sizes of its components in R(U) and R(U_⊥). First, if we multiply s by P_⊥ = U_⊥U_⊥^H, we get P_⊥ s = P_⊥(Az) − μz = r. On the other hand, multiplying s by U^H, we get U^H s = U^H Az − t = 0. Hence s = r, and it follows that we may use r in a convergence test.

• In principle we do not have to orthogonalize v against U in statement 3, since by definition U^H v = 0. However, if there is cancellation when v is orthogonalized against V, the results will no longer be orthogonal to U to working accuracy. Thus v should be orthogonalized against both U and V, with reorthogonalization if necessary.

• As V grows, it is not necessary to recompute the entire Rayleigh quotient C, only its southeast boundary. This is done in statement 5.

• Excluding operations with A (e.g., in statements 4 and 15), the storage requirements are up to 2nm floating-point numbers.

• Again excluding operations with A, the main floating-point operations come from matrix-vector products such as Vp. Thus if U is not too large and the algorithm returns at k = m̄ ≤ m, we can expect the operation count to be a multiple of n(m̄−l)^2 flam.
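The residual computation described in the second comment might be coded as follows (a NumPy sketch of ours); it returns both the projected residual r and the coupling vector t that becomes the new column of T in (2.16):

```python
import numpy as np

def schur_residual(A, U, z, mu):
    """Split w = A z - mu z into its components in R(U) and R(U)^perp:
    t = U^H A z extends T, and r = w - U t is the residual tested for
    convergence (cf. (2.16)-(2.17))."""
    w = A @ z - mu * z
    t = U.conj().T @ w            # equals U^H A z, since U^H z = 0
    r = w - U @ t                 # component orthogonal to R(U)
    return r, t
```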

. an additional correction vector v. EPSD returns the next correction vector v.410 CHAPTER 6. Pj_ = / . EPSD extends it to a partial Schur decomposition of the form The inputs. and AX = PL APj. ALTERNATIVES Given a partial Schur decomposition AU = t/T. If convergence does not occur before that happens. in addition to /7.2: Extending a partial Schur decomposition o . W = AV.C/f/ H . an nx (•£—!) matrix of orthonormalized correction vectors. Algorithm 2. C — WHV\ a focal point r. and an approximation A = A + E. are V. It is assumed that the arrays V and W cannot be expanded to have more than m columns. In the following code.

then Vol II: Eigensystems . NEWTON-BASED METHODS 411 • As mentioned above. we noted that an interior eigenvalue could be approximated by a Rayleigh quotient of a vector that has nothing to do with the eigenvalue in question. the Rayleigh-Ritz method may fail. the Jacobi-Davidson algorithm builds up a basis V from which eigenvectors can be extracted by the Rayleigh-Ritz method. V will become so large that it exceeds the storage capacity of the machine in question. for example. and a radius />. Recall (Definition 4. Here we use the notation of Algorithm 2. Harmonic Ritz vectors In §4.4. 2. Ultimately. It is possible. We can. a focal point r.11. detect these imposters by the fact that they have large residuals. we replace V with a basis for the primitive Ritz subspace corresponding to the Ritz values within p of r. But it is better to have a good vector in hand than to know that a vector is bad. it may make the correction equation harder to solve—requiring. Vp) is a harmonic Ritz pair with orthonormal basis V and shift r if Moreover. If such a vector is accepted as a Ritz vector.2. At that point we must restart with a reduced basis. Chapter 4) that (r + £. information that should not be discarded in the process of restarting. but since \JL varies with fc.r!)U. a matrix factorization at each step. Fortunately we can extract that information by using a Schur-Rayleigh-Ritz process. which discourages imposters. however.SEC. Chapter 4. Restarting Like Krylov sequence methods. The basis V contains hard-won information about the vectors near the focal point. Unlike Krylov sequence methods. if WR is the QR factorization of (A . of course. The later choice will give asymptotically swifter convergence. given V". the natural choice of K is either r or /z. Otherwise put. An alternative is to use the harmonic Rayleigh-Ritz procedure. we may proceed as follows. the Jacobi-Davidson method does not have to preserve any special relation in the new basis — we are free to restart in any way we desire. to abuse that freedom. Specifically.

so that we can calculate δ and p from this simpler and numerically more stable generalized eigenproblem. Note that the matrix B = W^H V in (2.19) can be updated as we bring in new correction vectors v.

As noted in §4.4, Chapter 4, we cannot use the harmonic eigenvalue τ + δ in computing the residual, since that value will not make the residual orthogonal to V. Instead, having computed a primitive harmonic Ritz pair (τ + δ, p), we must use the ordinary Rayleigh quotient μ = p^H V^H A V p of the harmonic Ritz vector z = Vp. A little algebra then shows that

    μ = τ + p^H B^H R p,

so that μ can be computed from quantities already at hand.

Algorithm 2.3 implements the harmonic version of partial Schur extension. Although it is more complicated than Algorithm 2.2, it has comparable storage requirements and operation counts. The algorithm for restarting the harmonic extension is also more complicated, owing to the need to keep the columns of W orthogonal. Note the indirect way of updating the QR factorization of the restarted matrix: it is faster than simply forming WRQ[:, 1:l] and computing its QR factorization.

Hermitian matrices

When A is Hermitian, some obvious simplifications occur in Algorithm 2.2. The Schur form is diagonal, so that EPSD need only return the converged Ritz pair (statement 13). Moreover, the Rayleigh quotient C is Hermitian, so that only the upper half of C need be retained. Similar economies can be achieved in the restarting procedure (2.18). The principal change in the harmonic extension (Algorithm 2.3), other than the natural economies listed in the last paragraph, is in how the harmonic Ritz vectors are computed.
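For concreteness, here is a small NumPy sketch of ours for extracting harmonic Ritz pairs from the factorization WR = (A − τI)V via (2.19); a careful implementation would guard against an ill-conditioned B = W^H V:

```python
import numpy as np

def harmonic_pairs(V, W, R, tau):
    """Harmonic Ritz pairs from W R = (A - tau*I) V: solve
    R p = delta (W^H V) p as the ordinary eigenproblem B^{-1} R p = delta p.
    Returns the harmonic values tau + delta and the vectors V p."""
    B = W.conj().T @ V
    delta, P = np.linalg.eig(np.linalg.solve(B, R))
    return tau + delta, V @ P
```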

    Given a partial Schur decomposition AU = UT, HEPSD extends it to a
    partial Schur decomposition of the form

        A (U  y) = (U  y) ( T   t )
                           ( 0   μ )

    using harmonic Ritz vectors. The inputs, in addition to U, are V, an
    n×(l−1) matrix of orthonormalized correction vectors; an additional
    correction vector v; the QR factorization WR of (A − τI)V; the
    current harmonic pair; a focal point τ; and an approximation
    Ã = A + E. It is assumed that the arrays V and W cannot be expanded
    to have more than m columns. If convergence does not occur before
    that happens, HEPSD returns the next correction vector v. In the
    following code, P_⊥ = I − UU^H and A_⊥ = P_⊥ A P_⊥.

    Algorithm 2.3: Harmonic extension of a partial Schur decomposition

As noted in §4.4, Chapter 4, the nonsymmetric eigenproblems (4.22) and (4.23), Chapter 4, can give complex harmonic Ritz pairs even when A is Hermitian. We should therefore use the Hermitian form of the eigenproblem to compute harmonic Ritz pairs, both for residuals for the correction equation and for restarting.

Solving projected systems

We now turn to the problem of solving the correction equation. It will pay to consider this problem in greater generality. Therefore we will consider the general system

    (I − QQ^H) M (I − QQ^H) x = b,   Q^H x = 0,                     (2.21)

in which Q is orthonormal, Q^H b = 0, and we will assume that M is nonsingular and that a solution x exists. To derive the solution of this system, note that since x ⊥ Q we have

    (I − QQ^H) M x = b,

or

    M x = b + Q a,   where a = Q^H M x.

Hence

    x = M^{−1} b + M^{−1} Q a.                                      (2.22)

Since Q^H x = 0, we have

    (Q^H M^{−1} Q) a = −Q^H M^{−1} b,                               (2.23)

a linear system from which we can compute a. It is easy to verify that if x is defined by (2.22), where a is a solution of (2.23), then x satisfies (2.21). Algorithm 2.4 implements this scheme.

the algorithm requires that we solve systems of the form Vol II: Eigensystems .SEC. The mechanics of the algorithm argue strongly against changing the shift K frequently. we must allocate additional storage to contain Q — at least when M is nonsymmetric. then the vector x will not be large. • If Mis ill conditioned but (I-QQH)M(I-QQH) is not. it should be one that can be stably updated without pivoting—e. x is likely to have inaccuracies that will propagate to the solution x. In the first place. In this case. if n is very large. and M is nonsingulai Algorithm 2. For more see the discussion on economization of storage on page 391. But if a factorization is to be updated. if Q is augmented by a single vector z.4 to solve the correction equation. while the vector x can be very large. The correction equation: Exact solution We now return to the solution of the correction equation Since z _L U. we can set Q = (U z) and use Algorithm 2. then one can solve the system Mz z and update C in the form In both cases there is a great savings in work. the savings will be insignificant. Even when there is no cancellation.. 2. However. we can expect cancellation in statement 5. • We could go further and precompute or update a factorization of C by which we could calculate the solution of the system in statement 4. NEWTON-BASED METHODS This algorithm solves the projected equation 415 where Q is orthononnal.g.4: Solution of nroiftcted ennations Likewise. However. a QR decomposition.

Such systems are ordinarily solved by means of a factorization of A − κI, which must be recomputed whenever κ changes. Moreover, our comments on the accuracy of Algorithm 2.4 suggest that we should avoid ill conditioning in A − κI. This means we should avoid a focal point τ that accurately approximates an eigenvalue λ of A: even though such a focal point will speed up convergence to λ, it could retard convergence to other eigenpairs. All this suggests that we use the focal point τ in Algorithms 2.2 and 2.3 as the shift κ.

The correction equation: Iterative solution

An alternative to the exact solution of the correction equation is to use an iterative method to provide an approximate solution. The subject of iterative methods is a large one, far beyond the scope of this volume. Fortunately, to describe how iterative solutions are used in the Jacobi-Davidson method, we need only know the operations we are required to perform with the matrix of our system.

Let us consider a general linear system Ku = b. The most successful of the various iterative methods for solving such a system are based on Krylov sequences. This means that we only need to be able to form the matrix-vector product Kw for an arbitrary vector w. In the Jacobi-Davidson method, the matrix in question is (I − QQ^H)(A − κI)(I − QQ^H), where Q = (U z). Since the Jacobi-Davidson algorithm already requires that we be able to form matrix-vector products in A, it is easy to use Krylov sequence iterations with it.

The convergence of most Krylov sequence methods can be improved, often dramatically, by a process called preconditioning. The idea is to determine a matrix M, usually based on special characteristics of the problem in question, such that M^{−1}K is in some sense near the identity matrix. When we precondition, the system to be solved involves the preconditioner, not the original matrix: if it is cheap to solve systems in M, we can form the matrix-vector product z = M^{−1}Kw by first computing y = Kw and then solving Mz = y. In application to the Jacobi-Davidson algorithm, the preconditioner M is determined from the matrix A − κI. But that is not the whole story: since the preconditioner must approximate the projected system, it itself must be projected. Thus our equivalent of the product above is to compute y = (I − QQ^H)(A − κI)(I − QQ^H)w and then to solve

    (I − QQ^H) M (I − QQ^H) z = y,

which can be done by Algorithm 2.4. Moreover, if κ does not change, we can precompute the matrix C required by Algorithm 2.4 and update it by adding the vector z as described above.

The use of a preconditioned solver gives us more freedom in choosing a shift κ. First, if we can find an effective preconditioner that does not depend on κ, we can vary κ, at least within the limits for which the preconditioner is effective. Second, we have more freedom to place κ near an eigenvalue, since we work only with the projected operator (I − QQ^H)(A − κI)(I − QQ^H) and never have to explicitly solve systems involving A − κI.
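Putting the pieces together, the operator that a Krylov solver would apply might be sketched as follows (ours; it reuses solve_projected from the previous sketch, and K_mul(w) stands for multiplication by A − κI):

```python
import numpy as np

def make_operator(K_mul, M_solve, Q):
    """Return the map w -> M~^{-1} K~ w, where K~ and M~ are the matrix
    A - kappa*I and the preconditioner M, both projected by I - QQ^H."""
    def apply(w):
        y = w - Q @ (Q.conj().T @ w)       # project the input
        y = K_mul(y)                       # multiply by A - kappa*I
        y = y - Q @ (Q.conj().T @ y)       # project the product
        return solve_projected(M_solve, Q, y)
    return apply
```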

Since we are free to stop an iterative method at any time, we can choose to solve the correction equation more or less accurately. From a local point of view, only a certain number of figures of v actually contribute to the improvement of z, and there is no reason to solve the correction equation more accurately than to that number of figures. However, there is a tension between the local convergence driven by the approximate Newton iteration and the broader convergence driven by the correction subspace, and, as we will see, this lack of accuracy can exact a price when we come to compute other eigenpairs. Unfortunately, there is very little hard analysis to guide us in this matter. The following example illustrates the phenomenon.

Example 2.5. The data is the same as in Example 2.4 (which the reader should review). The only difference is that at each stage a normwise relative error of 0.01 is introduced in the solution of the correction equation. The table in Figure 2.2 shows the progress of the iteration (for a different random starting vector).

    Figure 2.2: Convergence of the Jacobi-Davidson method with errors in
    the correction equations

The convergence is satisfactory, though not as fast as in Example 2.4. However, the approximations to e_2 in the correction space stagnate after an initial decline.

If we decrease the error in the solution of the correction equation in the above example to 10^{−3} and 10^{−4}, the convergence to e_1 is marginally faster, but the convergence of the approximations to e_2 stagnates at 5·10^{−3} and 5·10^{−4}, suggesting that the approximations to vectors other than the current vector will stagnate at about the level of the error in the solution of the correction equations. More generally, when the approximate Newton method is used, any convergence of the correction subspace should be toward eigenvectors of Ã = A + E, so that approximations to eigenvectors of A, other than the one currently being found, should stagnate at about the level of ||E||.

2.3. NOTES AND REFERENCES

Newton's method and the eigenvalue problem

According to Kollerstrom [150] and Ypma [307], neither Newton nor Raphson used the calculus in the derivation of their methods. Newton, working with a polynomial equation p(x) = 0, expanded p(x + d) = q(d) and ignored higher terms in d to get a linear equation for d. He then continued by expanding q(d + e) = r(e), linearizing, solving for e, and so on. The scheme involved explicitly computing the polynomials p, q, r, ..., obviously a computationally expensive procedure. (However, Newton was actually interested in computing series expansions for x, in which case the extra work in computing the polynomials is justified.) Simpson showed how to effect the same calculations by working only with p; the formulation in terms of derivatives is due to Simpson. It is ironic that although Newton cannot be credited with the formula that bears his (and Raphson's) name, he seems to have been the originator of the idea of expanding and linearizing to compute corrections to zeros of functions, just the approach we use here.

Many people have applied Newton's method to the eigenvalue problem; for some references, see [4]. The matrix formulation with w = z is due to Peters and Wilkinson [211]. They justify the choice w = z by observing that if we require z and z + w to be normalized, then up to second-order terms in w we must have w^T z = 0. Peters and Wilkinson also explore in detail the relation of Newton's method to the inverse power method. The fact that asymptotically Newton's method and the Rayleigh quotient method produce essentially the same iterates can be established by a theorem of Dennis and Schnabel [64] to the effect that any two superlinearly converging methods launched from the same point must, up to higher-order terms, produce the Newton direction.

The relation of Newton's method to the RQ variant of the QR algorithm is best seen by considering the RQ factorization of A − κI in the Z coordinate system and partitioning the R- and Q-factors conformally; equating the corresponding blocks, we must have equality up to second-order terms (here we are ignoring second-order terms in the normalization of the first row and column of the Q-factor).

It then follows that, up to second-order terms, the correction determined by the Q-factor and the Newton correction are the same.

    This algorithm takes a symmetric diagonally dominant matrix
    A = D + E, where D is the diagonal of A, and an orthonormal matrix V
    of correction vectors, and returns a Ritz pair (μ, z) approximating
    the kth smallest eigenvalue of A and an augmented correction matrix
    (V v). Here P_⊥ is the projection onto R(V)^⊥.

    Algorithm 2.5: Davidson's method

Inexact Newton methods

In 1982, Dembo, Eisenstat, and Steihaug [55] introduced and analyzed inexact Newton methods, in which the Newton equation is solved inexactly. In some sense these methods are related to the approximate Newton methods introduced here, since any residual in the approximate solution of the Newton equation can be thrown back onto the Jacobian, and conversely any perturbation of the Jacobian in the Newton equation corresponds to an approximate solution with a nonzero residual. However, our method differs from the inexact Newton method in that we are working with a special constrained problem and the approximation is not to the Jacobian itself. To honor these distinctions we use the term "approximate" rather than "inexact" Newton method.

The Jacobi-Davidson method

As it has been derived here, the Jacobi-Davidson method might better be called the Newton-Rayleigh-Ritz method, and it is instructive to see how the current name came about.

The story starts with Davidson's algorithm [53, 1975], which is sketched in Algorithm 2.5. It is very close to Algorithm 2.1: it builds up a correction subspace, from which it computes a Ritz vector and a new correction that is folded back into the subspace. The chief difference is in the correction equation, which is replaced by a diagonal correction of the form (D − μI)^{−1}r, essentially an approximate inverse power method. For diagonally dominant matrices the approximation is quite good, so that the asymptotic relation between the inverse power method and Newton's method insures that Davidson's method will work well. But in other, less favorable problems it tends to stagnate.
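A minimal NumPy sketch of ours of one step of Davidson's method as just described (a production code would guard against a nearly zero pivot a_ii − μ in the diagonal solve):

```python
import numpy as np

def davidson_step(A, V, k=0):
    """One Davidson step for symmetric A: Ritz pair from V, then the
    diagonal (Jacobi-style) correction (D - mu*I)^{-1} r, orthonormalized
    against V and appended to it."""
    theta, S = np.linalg.eigh(V.T @ A @ V)   # Ritz values, ascending
    mu = theta[k]
    z = V @ S[:, k]                          # Ritz vector, kth smallest
    r = A @ z - mu * z
    v = r / (np.diag(A) - mu)                # Davidson's correction
    v = v - V @ (V.T @ v)                    # orthogonalize against V
    v = v / np.linalg.norm(v)
    return mu, z, np.hstack([V, v[:, None]])
```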

In 1990, Olsen [186] and his coauthors observed that the lack of orthogonality of v to z was a drawback of Davidson's method. Using first-order perturbation expansions, they rediscovered the matrix formulation of Peters and Wilkinson, but with an approximation as in (2.3), and solved it to get the correction formula (2.6), thus deriving the orthogonal Newton step. Thus Olsen's method is Davidson's method with an approximate Newton correction. It is worth noting that the authors' application was so large that they could not save the corrections, and hence what they actually used was the approximate Newton method.

In 1996, Sleijpen and van der Vorst [240] derived essentially the same algorithm by considering an algorithm of Jacobi [128, 1846] for approximating eigenvectors of diagonally dominant matrices. Specifically, Jacobi partitioned a dominant symmetric matrix A in the form

    A = ( α   b^T )
        ( b   C   ),

and, assuming an approximate eigenvector of the form e_1, he computed a refined eigenvector of the form (1, q^T)^T, where q is obtained from a correction equation in the complement of e_1. This is easily seen to be one step of the approximate Newton method with a diagonal (Jacobi) approximation. Sleijpen and van der Vorst extended this example to general matrices. From the point of view taken here, their major contribution was the introduction of the correction equation, which expresses the correction in terms of a projection of the original matrix. It is the key to making the method effective for large sparse matrices. In particular, they went on to show how to deflate the problem and to incorporate iterative solutions with preconditioning. In later papers [74, 239, 242, 243, 244, 245] Sleijpen, van der Vorst, and their colleagues also adapted the algorithm to harmonic Ritz vectors and the generalized eigenvalue problem. Templates may be found in [241].

Hermitian matrices

The implementation for harmonic Ritz vectors suggested here differs somewhat from the implementation in [243]. There the authors accumulate R^{−1} in the basis V. We have preferred to leave V orthogonal and to work with the harmonic Rayleigh-Ritz equation (2.19).

APPENDIX: BACKGROUND

The reader of this volume is assumed to know the basics of linear algebra and analysis, as well as the behavior of matrix algorithms on digital computers. This background has been presented in full detail in Volume I of this work. Here we confine ourselves to a review of some of the more important notation and facts.

1. SCALARS, VECTORS, AND MATRICES

1. Scalars are real or complex numbers. The set of real numbers will be denoted by R, the set of complex numbers by C.

2. All scalars are denoted by lowercase Latin or Greek letters. Figure A.1 gives the Greek alphabet and its corresponding Latin letters (some of the correspondences are quite artificial).

3. An n-vector is a collection of n scalars arranged in a column. The set of real n-vectors is denoted by R^n, the set of complex n-vectors by C^n.

4. Vectors are denoted by lowercase Latin letters. The scalars that form a vector are called the components of the vector. If a vector is denoted by a letter, its components will be denoted by a corresponding lowercase Latin or Greek letter. Thus the vector a may be written

    a = ( α_1 )
        ( α_2 )
        (  ⋮  )
        ( α_n ).

5. An m×n matrix is a rectangular array of mn scalars having m rows and n columns. The set of real m×n matrices is denoted by R^{m×n}, the set of complex m×n matrices by C^{m×n}.

6. Matrices are denoted by uppercase Latin and Greek letters. The scalars in a matrix are called the elements of the matrix. If a matrix is denoted by a letter, its elements will be denoted by a corresponding lowercase Latin or Greek letter.

    Figure A.1: The Greek alphabet and Latin equivalents

Thus the matrix A may be written

    A = ( α_11  α_12  ...  α_1n )
        ( α_21  α_22  ...  α_2n )
        (  ⋮     ⋮          ⋮   )
        ( α_m1  α_m2  ...  α_mn ).

7. An n×n matrix is said to be square of order n.

8. The matrix sum and matrix product are written as usual as A + B and AB. If A is nonsingular, its inverse is written A^{−1}.

9. The conjugate of a matrix A is written Ā. The transpose of A is written A^T, and the conjugate transpose as A^H.

10. The trace of a square matrix A, written trace(A), is the sum of its diagonal elements.

11. The determinant of A is written det(A). It has the following useful properties.

    1. det(AB) = det(A)det(B).
    2. det(A^H) is the complex conjugate of det(A).
    3. det(A^{−1}) = det(A)^{−1}.
    4. If A is of order n, then det(μA) = μ^n det(A).
    5. If A is block triangular with diagonal blocks A_11, ..., A_kk, then
       det(A) = det(A_11) det(A_22) ··· det(A_kk).
    6. det(A) is the product of the eigenvalues of A.
    7. |det(A)| is the product of the singular values of A.

12. The vector whose ith component is one and whose other components are zero is called the ith unit vector and is written e_i. The vector whose components are all one is written e.

13. Any vector whose components are all zero is called a zero vector and is written 0.

14. Any matrix whose elements are all zero is called a zero matrix and is written 0, or 0_{m×n} or 0_n when it is necessary to specify the dimension.

15. The matrix I_n whose diagonal elements are one and whose off-diagonal elements are zero is called the identity matrix of order n. When the context makes the order clear, an identity matrix is written simply I.

16. A square matrix whose off-diagonal elements are zero is said to be diagonal. A diagonal matrix whose diagonal elements are δ_1, ..., δ_n is written diag(δ_1, ..., δ_n).

17. If i_1, ..., i_n is a permutation of the integers 1, ..., n, the matrix

    P = (e_{i_1}  e_{i_2}  ...  e_{i_n})

is called a permutation matrix. Premultiplying a matrix by P rearranges its rows in the order i_1, ..., i_n. Postmultiplying a matrix by P rearranges its columns in the order i_1, ..., i_n.

18. The matrix whose elements along the cross diagonal (from the lower left corner to the upper right) are one and whose other elements are zero is called the cross operator. Premultiplying a matrix by the cross operator reverses the order of its rows. Postmultiplying a matrix by the cross operator reverses the order of its columns.

19. A matrix A is symmetric if A^T = A. It is Hermitian if A^H = A.

20. A real matrix U is orthogonal if U^T U = I. A matrix U is unitary if U^H U = I.

21. A matrix with orthonormal rows or columns is said to be orthonormal.

22. If A and B are matrices of the same dimension, we write A ≤ B to mean that a_ij ≤ b_ij. Similarly for the relations A < B, A ≥ B, and A > B.

23. The absolute value of a matrix A is the matrix whose elements are the absolute values of the elements of A. It is written |A|.

25. A Wilkinson diagram is a device for illustrating the structure of a matrix. The following is a Wilkinson diagram of a diagonal matrix of order four:

   X 0 0 0
   0 X 0 0
   0 0 X 0
   0 0 0 X

Here the symbol "0" stands for a zero element. The symbol "X" stands for a quantity that may or may not be zero (but usually is not). Letters other than X are often used to denote such elements, but a zero element is always denoted by 0.

26. Lower and upper triangular matrices have Wilkinson diagrams of the form

   X 0 0 0        X X X X
   X X 0 0        0 X X X
   X X X 0        0 0 X X
   X X X X        0 0 0 X

A rectangular matrix whose elements below the main diagonal are zero is upper trapezoidal; one whose elements above the main diagonal are zero is lower trapezoidal. Triangular or trapezoidal matrices whose main diagonal elements are one are unit triangular or trapezoidal.

27. With the exception of the sets R, C, R^n, etc., sets of scalars, vectors, or matrices are denoted by calligraphic letters, e.g., X.

2. LINEAR ALGEBRA

1. Let F stand for R or C, depending on the underlying scalars.

2. The subspace spanned by a set of vectors X is written span(X).

3. The dimension of a subspace X is written dim(X).

4. If X and Y are disjoint subspaces, then dim(X ⊕ Y) = dim(X) + dim(Y).

5. The column space of an m×n matrix A is the subspace R(A) = {Ax : x ∈ F^n}. The null space of A is N(A) = {x : Ax = 0}.

6. The rank of A is rank(A) = dim[R(A)]. The nullity of A is null(A) = dim[N(A)].
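Items 5 and 6 can be illustrated numerically. The sketch below assumes Python with NumPy and SciPy, whose routine scipy.linalg.null_space returns an orthonormal basis for the null space.

   import numpy as np
   from scipy.linalg import null_space

   # A 4x3 matrix whose third column is the sum of the first two
   A = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0],
                 [1.0, 1.0, 2.0],
                 [2.0, 0.0, 2.0]])

   r = np.linalg.matrix_rank(A)              # rank(A) = dim[R(A)] = 2
   N = null_space(A)                         # orthonormal basis for N(A)
   print(r, N.shape[1])                      # rank 2, nullity 1
   print(r + N.shape[1] == A.shape[1])       # rank plus nullity is the number of columns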

7. An n×p matrix X is of full column rank if rank(X) = p. It is of full row rank if rank(X) = n. We say that X is of full rank when the context tells whether we are referring to column or row rank (e.g., column rank when n > p).

3. POSITIVE DEFINITE MATRICES

1. Let A be a Hermitian matrix of order n. Then A is positive definite if

   x ≠ 0  =>  x^H A x > 0,

and positive semidefinite if x^H A x ≥ 0 for all x.

2. Be warned: in the real case some people drop the symmetry condition and call any matrix A satisfying x ≠ 0 => x^T A x > 0 a positive definite matrix.

3. The diagonal elements of a positive definite matrix are positive, and any principal submatrix of a positive definite matrix is positive definite.

4. A is positive (semi)definite if and only if its eigenvalues are positive (nonnegative). In particular, if A is positive definite, its determinant is positive; if A is positive semidefinite, its determinant is nonnegative.

5. If A is positive definite, then A is nonsingular and A^{-1} is positive definite.

6. Let A be positive definite and let X ∈ C^{n×p}. Then X^H A X is positive semidefinite. It is positive definite if and only if X is of full column rank.

7. If A is positive (semi)definite, then there is a positive (semi)definite matrix A^{1/2} such that

   A = A^{1/2} A^{1/2}.

The matrix A^{1/2} is unique.

4. THE LU DECOMPOSITION

1. Let A be of order n and assume that the first n-1 leading principal submatrices of A are nonsingular. Then A can be factored uniquely in the form

   A = LU,

where L is unit lower triangular and U is upper triangular. We call the factorization the LU decomposition of A and call the matrices L and U the L-factor and the U-factor.

2. Gaussian elimination (Algorithm 1:3.1) computes the LU decomposition. Pivoting may be necessary for stability (see Algorithm 1:3.3).
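The following sketch (assuming Python with NumPy and SciPy) checks positive definiteness by the eigenvalue characterization in item 4 and computes a pivoted LU decomposition; note that scipy.linalg.lu returns the factorization in the form A = PLU.

   import numpy as np
   from scipy.linalg import lu

   rng = np.random.default_rng(1)
   X = rng.standard_normal((5, 3))
   A = X.T @ X                               # X has full column rank, so A is positive definite
   print(np.all(np.linalg.eigvalsh(A) > 0))  # all eigenvalues positive (item 4)

   P, L, U = lu(A)                           # partial pivoting: A = P L U
   print(np.allclose(A, P @ L @ U))
   print(np.allclose(np.diag(L), 1.0))       # L is unit lower triangular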

5. THE CHOLESKY DECOMPOSITION

1. Any positive definite matrix can be uniquely factored in the form

   A = R^H R,

where R is an upper triangular matrix with positive diagonal elements. This factorization is called the Cholesky decomposition of A. The matrix R is called the Cholesky factor of A.

2. The Cholesky decomposition can be computed by variants of Gaussian elimination.

3. There is a pivoted version of the decomposition. Specifically, there is a permutation matrix P such that

   P^T A P = R^H R,

where the elements of R satisfy

   ρ_kk^2 ≥ Σ_{i=k}^{j} |ρ_ij|^2   (j = k+1, ..., n).

Algorithm 1:3.2 computes a pivoted Cholesky decomposition.

6. THE QR DECOMPOSITION

1. Let X be an n×p matrix with n ≥ p. Then there is a unitary matrix Q of order n and an upper triangular matrix R of order p with nonnegative diagonal elements such that

   X = Q ( R )
         ( 0 ).

This factorization is called the QR decomposition of X. The matrix Q is the Q-factor of X, and the matrix R is the R-factor.

2. If we partition Q = (Q_X  Q_⊥), where Q_X has p columns, then we can write

   X = Q_X R,

which is called the QR factorization of X. If X is of full rank, the QR factorization is unique.

3. If X is of full rank, then Q_X is an orthonormal basis for the column space of X and Q_⊥ is an orthonormal basis for the orthogonal complement of the column space of X.
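Library routines compute both decompositions. In the sketch below (assuming Python with NumPy), numpy.linalg.cholesky returns the lower triangular factor L, so the upper triangular Cholesky factor of this review is R = L^T; numpy.linalg.qr returns the QR factorization X = Q_X R by default.

   import numpy as np

   rng = np.random.default_rng(2)
   X = rng.standard_normal((6, 3))
   A = X.T @ X                               # positive definite

   L = np.linalg.cholesky(A)                 # numpy returns L with A = L L^T
   R = L.T                                   # the upper triangular Cholesky factor
   print(np.allclose(A, R.T @ R))

   Qx, Rx = np.linalg.qr(X)                  # QR factorization X = Q_X R
   print(np.allclose(X, Qx @ Rx))
   print(np.allclose(Qx.T @ Qx, np.eye(3)))  # Q_X has orthonormal columns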

4. The QR factorization can be computed by the Gram-Schmidt algorithm (Algorithms 1.11-13).

5. The QR decomposition can be computed by orthogonal triangularization (Algorithm 1.2).

6. There is a pivoted version of the QR decomposition. Specifically, there is a permutation matrix P such that XP has the QR decomposition

   XP = Q ( R )
          ( 0 ),

where the elements of R satisfy

   ρ_kk^2 ≥ Σ_{i=k}^{j} |ρ_ij|^2   (j = k+1, ..., p).

Algorithm 1.1 computes a pivoted QR decomposition.

7. If X is of full rank, A = X^H X is positive definite, and R is the Cholesky factor of A.

7. PROJECTORS

1. If P^2 = P, then P is a projector (or projection). I - P is its complementary projector.

2. Any projector P can be written in the form

   P = X (Y^H X)^{-1} Y^H.

The spaces X = R(X) and Y = R(Y) uniquely define P, which is said to project onto X along the orthogonal complement Y_⊥ of Y.

3. If P is a projector onto X along Y_⊥, then its complementary projector I - P projects onto Y_⊥ along X.

4. Any vector x can be represented uniquely as the sum of a vector Px ∈ X and a vector (I - P)x ∈ Y_⊥.

5. If P is a Hermitian projector, then P = X X^H, where X is any orthonormal basis for R(P). Such a projector is called an orthogonal projector.

6. The complementary projector of an orthogonal projector P is written P_⊥.
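These representations translate directly into computation. A sketch assuming Python with NumPy:

   import numpy as np

   rng = np.random.default_rng(3)
   n, p = 6, 2
   X = rng.standard_normal((n, p))
   Y = rng.standard_normal((n, p))           # Y^T X is nonsingular with probability one

   # The projector onto R(X) along the orthogonal complement of R(Y)
   P = X @ np.linalg.inv(Y.T @ X) @ Y.T
   print(np.allclose(P @ P, P))              # idempotent: P^2 = P
   print(np.allclose(P @ X, X))              # acts as the identity on R(X)

   # An orthogonal projector: P = Q Q^T with Q an orthonormal basis for R(X)
   Q, _ = np.linalg.qr(X)
   Porth = Q @ Q.T
   print(np.allclose(Porth, Porth.T))        # Hermitian (here, symmetric)
   print(np.allclose(Porth @ Porth, Porth))  # idempotent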


323 forward instability. 315.316 Arnoldi. 167 inertia of a tridiagonal matrix. 372-373 economizing B products. 336 stability. 94 Householder's reduction to tridiagonal form.265.196. 341 arrowhead matrix. 331 filter polynomial. 192-193 Lanczos algorithm with full orthogonalization. 372-373 maintaining orthogonality. 334. 373 orthonormal matrix. 162. 324-325 example.E. 373 residual norms. 304 reorthogonalization. 183 double shift QR algorithm. 70. 372 symmetry. 320-323 overview. 375 restarted. 147 spectral decomposition updating. 318. 371 Cauchy inequality. 325. W. 102 Hessenberg QR algorithm. 345 convergence criteria. 338-339 divide-and-conquer algorithm for the spectral decomposition. 371 norm.372 orthogonalization. 10 inverse power method.316 stability. 70 convergence criteria. 374 orthogonalization. 220 deflation in Arnoldi method.344 ARPACK. 230. 3J6. 322 truncation index.346-347 band tridiagonalization. Krylov-Schur restarting shift-and-invert enhancement. 334. 304-305. 374 Ritz pairs. 334-335. 320 exact (Raylei gh quotient) shifts. 375 backward error and unitary similarities. 371 INDEX orthogonal matrix. Arnoldi method. 61. 329 the cycle. 69 backward stability and complex eigenvalues of real matrices. 189 bidiagonal QR step. 126 Arnoldi method. 347 stagnation of Krylov subspace. 372 orthogonality. 179 . 373 deflation. 217 reduction to Hessenberg-triangular form.170 implicit tridiagonal QR step. 374-375 periodic reorthogonalization. 325-326. 316 also see Arnoldi method. 371 B-Arnoldi method. 325. 346 exchanging eigenblocks. 337-338 stability. 152 reduction to bidiagonal form. 10 as paradigm.452 residual norm. 62 from residual. 373-374 restarting. 332-333. 126 eigenvector computation. 346-347 shift-and-invert enhancement. 317. 338-339 equivalence of implicit and Kry lov-Schur restarting. 332-333. 318-322 double shift method. 305 convergence criteria. 351 QZ algorithm. 345 also see filter polynomial implicit restarting.253. 305-306. 328-329 loss of orthogonality. 344-345 rationale.369 in convergence testing. me operation count. 318 the cycle. 303 restarted. 347 storage requirements. 117 operation count. 373 5-Lanczos method. 201 B inner product. 346 operation count. 326-328. 320 Krylov-Schur restarting. implicit restarting.. 345 contraction phase.

220 shift computation. 152-153. 394 column space (K(X)). 23 Cayley. 255 Braman.52 bi-Lanczos algorithm. 228 deflation. 388 Wilkinson's contribution. 248. 224 detecting negligible superdiagonal elements. 234 updating. Z. 4. 87 subspace iteration..156 operation count. 4-5 characteristic value. 23 Chatelin. 102.. 55 and terminating Krylov sequences.264 angles between right and left eigenspaces. 110 matrix pencil. 129 453 C (complex numbers). 215 complex bidiagonal matrix to real. R. 421 C m x n (space of complex mxn matrices).. 317 in complex plane. 162 companion matrix. S. 138. 222-223 zero shift. 23. 217.228 operation count. 108 Bartels. 249 computation. 220 stability. 283.. 217 operation count.279 Chebyshev polynomial. 249-250 Cauchy. 23 Bauer. M.and row-oriented algorithms... see Krylov subspace block triangularization of nearly block triangular matrix. 346.264 of a combination of subspaces. 280 in subspace iteration. 55 complete system of eigenvectors. 223-224. 49 characterization of largest angle. 129 Bunch. R. 421 Cn (complex n-space). 264 block Krylov subspace. 157 Cholesky decomposition. 279 in Lanczos's method. 251 between two vectors. H. 217 relative stability of singular values. 394 Bellman. 392 Cholesky algorithm. 224. 226. 107-108.155 Clint. 228 Chandrasekaran. 217 stability. 7 and diagonalization. L. 227 biorthogonal bases.227-228 QR step.. R. 367 look-ahead recurrence.227 first column of the transformation. 217. 424 column. 23 Chan. 24 characteristic polynomial. 44 computation. 395 balancing.227-228 combined with QR decomposition. 215-217.129. 24 Bau. T. 421 canonical angle. E.23. 4. D. K. F.112 and grading. 111 Bai. 217 reduction to... 250-251 of a vector and a subspace. R. 280 as filter polynomial. 231. A. 235 characteristic equation. 27J_. 23 Beltrami.. 171 chordal distance. 226 Bhatia.223 bidiagonal QR algorithm. 24 and companion matrix.426 pivoted.227 graded matrices. 8-9 condition .. J.R. 245 Bjorck. 249..INDEX stability in the ususal sense. 219-220 operation count. A. E. 226 pivoting..E.315. 201 Vol II: Eigensystems Byers. 315 matrix of order two. 367 bidiagonal matrix. 250 subspaces of unequal dimensions. A.

201 Curtis. 3j_ defective.185 error analysis. 183 the effects of deflation. B. W.. 48 condition number. 419 Davis. 261 singular values. normwise and componentwise.!. 344 by inspection. 367 Cuppen. 114 also see Hessenberg QR algorithm. 140 generalized eigenvector. S. 236 deflation. 34 convergence. 185 unsuitability for graded matrices. 206 singular vector.265 defectiveness. 291-292. 15. 7-8. J.38 definite generalized eigenvalue problem. 185 recursive vs. 48 Hermitian matrix. 20 dim(^) (dimension of X). M. 48 eigenvalue. R. 15 sensitivity of eigenvalues. 280. A.227.. 346 Dennis. 7. R. 422 Dhillon. 20 complete system of eigenvectors. E. 34 condition number. 190 conjugate gradient method.296 use by statisticians. direct approach. 107 complex eigenvector.S. dn] (diagonal matrix). 423 diagonalization block. 260 simple eigenspace. 237 Cullum.. 209-210 condition number. see condition congruence transformation. Dembo.. 156 Datta. QR algorithm.20] depth of the recursion. 214. 419 Demmel. 48 simple eigenblock.. 12. 183 stability. see norm convergence ratios.19-20 assessment.51 eigenvector. 48. 129. J. power method. 51-52 also see sep generalized eigenvalue. 183 divide-and-conquer algorithms. E. 106-107 operation count. 210 S/PD generalized eigenvalue. 50 Hermitian matrix.454 INDEX Davidson.. 8-9 distinct eigenvalues.202 diag(<$i. 276 dependence of eigenvectors. 205 CS decomposition.. 181-183. R. 424 divide-and-conquer algorithm for the spectral decomposition. 143 ill conditioning. 3± dominant eigenvalue.R.227 attractive features. . 48-51. 235-236 rotating to make B positive definite. N.53 condition number. 53. 419 . 231 well conditioning. 181-182 operation count. 210-211 assessment.. C. 34 eigenvalue.. 183. 7 matrix.. 279 consistent norm. 42-43. 236-237 algorithms. 211-212 inaccuracy in the left singular vectors. J.I.53 condition number. 265. 201 generalities. 23. 48 limitations. 211 inaccuracy in the singular values. 227 cross-product matrix. J.. 181-182 dominant eigenpair. 23 Davidson's algorithm. C. 212-213 refined Ritz vector.418 det(^l) (determinant of A).. 8 dominant eigenvalues. etc.. 384 and Krylov subspaces. 214 inaccuracy in the right singular vectors. 30 Crawford. 25 uniqueness. 236 cross-product algorithm for the S VD..

I. 262-263. 100 diagonal matrix. 3 skew Hermitian matrix. 240 existence. 2 eigenvector. 241 residual. 12 block triangular matrix.53 real matrix. 242 eigenpair. 4. 2 perturbation theory first-order. E. S. me Hermitian matrix.6. 263-264. 14 triangular matrix.53 simple eigenvalue.201 Duff. 13 history and nomenclature. 2. 240-241 representation of A with respect to a basis. me diagonal matrix. 242 right. 56 Donath. me continuity. 243 nonuniqueness. 3 dominant. 12-13 normal matrix. 45-46. 253. J. 4. 3 unitary matrix. 244 representation by a basis. 251. 23-24 multiple.. me derivatives. 8 singular matrix. 4 left and right. 7. 13 history and nomenclature. 3 triangular matrix. 242 left. 31. 6.264 residual bounds for eigenspaces of Hermitian matrices. 22. 254.53 rigorous bounds. 31. 44. me eigenvalue.. 31.367 Dongarra.53 rigorous bounds. 3 dominant. 265 residual bounds for eigenvalues of Hermitian matrices. 3 dominant. me diagonal matrix. Hermitian matrix. 5-6 right. 2 left. 2 algebraic multiplicity. 242 orthonormal.52 geometric multiplicity. 3 also see eigenspace eigenspace. 23-24 left. 2. 423 eigenbasis.334-335 backward error. 395 ej («th unit vector). 14 perturbation theory. 52 condition.12 Gerschgorin's theorem. 244..265 Vol II: Eigensystems 455 optimal. 37-38 defective.115. 242 behavior under similarity transformations. 127. me Eisner's theorem.. 48. 14 also see eigenpair. 46-47. 45-46. 247 deflation. me Hermitian matrix. 46-47.52-53 . 2 normalization. 23. 8 singular matrix. 252. W. 39. power method. me nilpotent matrix. me computing eigenvectors. 7. 4. 38. 242 as orthogonal complement of right eigenspace. see eigenspace eigenblock. see eigenspace eigenpair. 2 complete system. 2 simple. 47 first-order. 280.242 eigenblock. 7. me characteristic polynomial.INDEX dominant eigenvector. similarity transformation eigenvector.265 simple. 240. 242 examples. J.24 complete system. 5 characteristic equation. 31. 243 eigenbasis. 243 nonorthogonal. 242 left. me also see simple eigenspace eigenvalue. 38 simple.

130 condition. 352. 197 orthogonalization..456 normal matrix.53 rigorous bounds. 130 algebraic multiplicity. 4 simple. C. 200 generalized eigenvalue problem. 129. V.170 Fraysse. me first-order. 130 perturbation of. 346 Francis. 200 loss of orthogonality. 119 plane rotation. 3 also see eigenpair.419 EISPACK 2x2 blocks in QZ algorithm. 198 tradeoffs in clustering. me generalized shift-and-invert transform. 38. 41 fl (floating-point evaluation). 196-197 convergence. 197-198 Gram-Schmidt algorithm. 134. 132 generalized Schur form. 138. 200 starting vector. 133 condition. 347. 132. R.380 exchanging eigenblocks. 157 and inertia. 3 triangular matrix. J. G. 46-47. 140 first-order expansion. see norm Caches. 131 effect on eigenvalues and eigenvectors. 58 in complex arithmetic. similarity transformation eigenvectors of a symmetric band matrix. L. 8 singular matrix.. 67 INDEX Fernando. 14 normalization. 133 chordal distance.. 345-346 stability. T. see Rayleigh-Ritz method Gaussian elimination. 317. 195-196 solution of inverse power equation. Hermitian matrix. 193 eigenvector computation.202 attaining a small residual. 198-200 limitations of the algorithm. 380 Galerkin method. 110-112. 6 perturbation theory condition. R.53 real matrix. 228 filter polynomial. 89 factorization. 156 eigenvalues by bisection. 379. S.52. 91 Fokkema. 201. D. 52. V. 202 elementary reflector. 130 eigenvalue. 2 null space characterization. 380 Freund. 200 Eisenstat. 86-87 flrot. 140 projective representation. 143 first-order expansion. 228... 48-50. 191 band matrix. 111 6M (rounding unit). 365 in Arnoldi method. 142-143 equivalence transformation. 136-137 right. 27 flam (floating-point add and multiply).155 eigenpair. 200 residual. 141-142 perturbation bound. J55 characteristic polynomial. 5-6 right. 317 in Lanczos algorithm. 102 implicit double shift. me . 368. 345 Chebyshev polynomial. 27 Ericsson. 195-200. 130 left. 45^6.. 296 Frobenius norm. 296 background. 328 exponent exception computing bidiagonal QR shift. W. 326-328. 365 Ritz values as roots. 137 perturbation bound.. 317.285 Eisner. 128. K. 223 computing inertia.231 eigenvector. 317 Fischer's theorem. see Householder transformation Eisner's theorem.365 Leja points. R. J.

198. 235 harmonic Rayleigh-Ritz method. H.227. H. 147-148. 155 order of eigenvalues. 112 eigenvalues by bisection.193 diagonal similarities. 303 B variants. 293-294 from Krylov decomposition. 203 types of grading.201 Vol II: Eigensystems 457 Goldstine. M.. A. 40-41 Gerschgorin disks.336 screening of spurious eigenpairs. 365 Greek-Latin equivalents. 133 condition. 203 Golub. R. 134-135 simple eigenvalue. 202 Jacobi's method. 421 Grimes.411 computation.INDEX Hessenberg-tri angular form. 113. 135 shifted pencil. 203 singular value decomposition. 52. 52 Givens rotation. 39. 236-237 computation via the CS decomposition. G. 294. 13.379-380 bounding the residual. M. generalized eigenvalue problem. see Gram—Schmidt algorithm Gu. 140 nonregular pencils. 201. W. 370 generalized singular value decomposition.. 143. 108-110 bidiagonal QR algorithm. 170 also see plane rotation Givens. 108 Gram-Schmidt algorithm.201. 108. 131 residual. 228 Gutknecht. 307. J55 also see generalized eigenvalue problem... 227-228 Jacobi's method.. 372 gsreorthog. 112 balancing for eigenvalue computations. H. 110 and the QR algorithm. 369 S/PD pencils. G. S. 136. 294 Rayleigh quotient. 130-131 pencil. 55. 314-315 Hermitian matrix. 143 generalized shift-and-invert transform. 293. me reduction to ordinary eigenproblem. 367 graded matrix. 304 modified. 144-147. generalized eigenvalue problem. 292 Hermitian matrix. 131 regular pencil. 136. 133 Weierstrass form. 293 Hermitian matrix eigenvalue. me infinite eigenvalue.. 296. 296 harmonic Ritz pair. 294 exact eigenspace. R. 236-237 relation to the S/PD eigenproblem. 380 right and left eigenvectors. eigenvalue. 369 backward error. 370 matrix-vector product.200 and balancing. 23. 133 real generalized Schur form. 367 gsreorthog. 369. 367 Hadamard. 110 Handbook for Automatic Computation: Linear Algebra. M. 170.. 39 Gerschgorin.264. J. me Rayleigh quotient. 132 generalized Schur form. 132. see eigenvalue Gerschgorin's theorem. H. 134. 198. eigenpair. 227 S/PD eigenproblem. 131 algebraic multiplicity. 294 ordinary eigenproblem. 236 geometric multiplicity. 296 Rayleigh quotient preferred to harmonic Ritz value. 293 generalized eigenvalue problem.24.359 Grcar.. J. 368-369 regular pair. 368. 42-43 . 202 eigenvectors by the inverse power method.52 condition. 138 real generalized Schur form. 130 perturbation theory. eigenvector QZ algorithm.

123 applying transformations to deflated problems. 23.156 operation count. 123 detecting negligible subdiagonals. 87. 116. 203 Hotelling. 111. 254. 98-100 convergence. J. 87 operation count. 157 also see symmetric matrix Hessenberg form. 116. 98-100 eigenvalues only. 120-123 stability. L.. D.380 Hilbert. 23 Horowitz. 94 normwise criterion. me invariance of Hessenberg form. 42 eigenvector. 98 forward instability. 95 double shift algorithm. 118-123. 13 residual bounds for eigenspaces. US.112. 92-94 operation count. me nonorthogonalreduction. 86 first column of the accumulated transformation.. 380 Higham.105 applying transformations to deflated problems. D. 279 hump. 126 explicit shift algorithm. 12.B±. 105-106. 94-95 backward stability. N. 128-129 transmission of shifts.265 residual bounds for eigenvalues. 158 uniqueness of reduction. 96-97 stability.H. R.112 aggressive deflation. 82-83 reduction of a vector to multiple of e n .169 Horn. 95-96 double shift algorithm. accumulating transformations. 100 searching for negligible subdiagonal elements. 112 implicit double shift. A. 87 stability. 82-83 reduction of real vectors. 123 QR step. 116 unreduced.458 Fischer's theorem. 81-82 reduction of a vector to multiple of ei. 123 operation count.265 sep.346 Francis double shift. 263-264. 115. 100 operation count. 51 special properties. S. 129 deflation.. 95-96 basic QR step. 83 reduction of row vectors. 92. 115 deflation. 43. 147 stability.148 implementation issues. 42 min-max characterization. 41 Hoffman-Wielandt theorem. 147 Higham. 43 interlacing theorem. 111 implicit Q theorem. 325. 41 perturbation.. 42 principal submatrix. 83-88.. 94 complex conjugate shifts.. 23 Hoffman-Wielandt theorem. A. 69 Householder transformation. 24. 111 of a symmetric matrix. 94-95 relative criterion. J. 118-119 implicit Q theorem. 82 Householder. 118 real Francis shifts. 123-126 INDEX 2x 2 blocks. 87. 112 multishift. 144-147.128 general strategy. 100. 94-95. 92 stability. 262-263. the.110. 111 product with a matrix. 97.. 119 starting. 8! Householder reduction.112 computation. 34 no-ni . 86 overwriting original matrix. P. 97-98 Hessenberg-triangular form. me Hessenberg QR algorithm ad hoc shift. 112 Wilkinson shift.228.

411 with harmonic Ritz vectors.265 Jacobi's method.. 409 partial Schur decomposition. 420 459 Davidson's algorithm. 416-417 Krylov sequence methods.265 Johnson. 396. 21. P.INDEX /. I. 191-193 operation count.405. 416 stopping. S.20 powers of.25 Jordan block.36 assessment. 191 invariance under congruence transformations. In (identity matrix). 7. 22 sensitivity of eigenvalues. u) (Krylov subspace). 267 Vol II: Eigensystems . 347. 394. 202 convergence. 407 residual computation. 423 implicit Q theorem. 170. 415-416 extending partial Schur decompositions. 66-67 symmetric band matrix. Z. 21 Jordan. 203 parallelism. 69 convergence. 42 interval bisection.. 70-71.170 backward error. 226. 201. 412-414. 416 preconditioning..118 inertia.264 Kk(A. E. 56 effects of rounding error. G. 404-405 exact solution of correction equation. 190 of a tridiagonal matrix. 411—412 Hermitian matrices. 195-200 use of optimal residuals. 202-203 and graded matrices. 192-193 inf (X) (smallest singular value of X).202. 419 derivation from approximate Newton method. 296. 38 uniqueness. 67 residual computation. 417-418 Olsen's method. 205 oo-norm. 417-418 correction equation.24-25. 193 invariant subspace. J. 423 f (cross matrix).. 295. 203 quadratic. 23 Jordan block.. 414-415 ill conditioning. 267 fck(A. C. 155. u) (Krylov matrix). see norm interlacing theorem. 404. 395 Jensen. 190. 409 orthogonalization. 407 retarded by inexact solution. 203 Jacobi. JU6.296 Jiang.. 202 cyclic method. 66. see Jordan canonical form Jordan canonical form. 412 shift parameters. 203 rediscovery. A. 419-420 operation count. 203 as preprocessing step. 71 Ipsen. 191 computed by Gaussian elimination. R. 405 harmonic Ritz vectors. 68-69 implementation. 415-416 restriction on choice of shifts. 202 threshold method.419-420 compared with approximate Newton method.420 iterative solution of correction equation. 408-409 restarting. R.420 Jacobi-Davidson method. 406 convergence testing. 405-406 solving projected systems. C. 34-35 principal vector. 407-411 focal region. 409 updating the Rayleigh quotient. 68 dominant eigenvector. 201 and LU decomposition.. 414-415 storage requirements. 409 Jennings. 408-409 convergence to more than one vector. 416-418 free choice of shifts. 192 stability. 415 multiple right-hand sides. C.403. 380 Jia. see eigenspace inverse power method. 22. C. 70.381.

277 non-Hermitian matrices. 273-274 deflation. 266.. 268 Krylov matrix. 349. 298 . 266 degeneracy in Krylov sequence. 351 updating. 53.. 316 computation of harmonic Ritz vectors. A. 268 block. 297.52 Kline. 335 economics of transforming. 312-314 similarity. 280. Lanczos algorithm Krylov. 269-272 square root effect. 356-358 filter polynomial. 277 compared with power method. 269. Krylov decomposition. 268 and similarity transformations. N.307 Rayleigh quotient. 367 computing residual norms.. 278 non-Hermitian matrices convergence. 363 estimating loss of orthogonality.279-280. 279. 348 as iterative method. 272-273 to inferior eigenvector. 275-276 polynomial representation. 267-268 invariance under shifting. 301 also see Arnoldi decomposition. 266 convergence. 365 essential tridiagonality of Rayleigh quotient. 309 Rayleigh quotient. 155 Krylov decomposition. G. 365 block.227. 280 Kato. 2 Lanczos algorithm. 110 Kogbetliantz. V. 277.365 Krylov-spectral restarting. 352.315 assessment of polynomial bounds. 297. L. 310 orthonormal. 267 Krylov sequence. 350 Lanczos decomposition.365 stability. W.J. 315 and Rayleigh-Ritz method. S. 311-312 nomenclature. 278 block Krylov sequence.279 multiple eigenvalues.418 Krause. 273 to the subdominant eigenvector. 23 A(A) (the spectrum of ^4).N..280 polynomial bounds. 276 and scaling. 352 Chebyshev polynomial.264.309. 277 convergence. 365 convergence and deflation... 311 Krylov subspace..201. 314 computation of residuals.. 110 Lagrange. 273 to interior eigenpairs. 268 termination. N. 52 Kronecker product. 312-314 equivalence.280. 268 and characteristic polynomial. 279 and eigenspaces. 23 K6nig. 309 reduction to Arnoldi form. 277 convergence. 279 Kublanovskaya.. 365 Leja points..T. 351 global orthogonality. 314-315 computation of refined Ritz vectors. 351 implicit restarting. 24 Kronecker. 307 Kahan. 267. 351-352. G.156. 362-363 and characteristic polynomial..460 INDEX elementary properties. 274-275. 309 to an Arnoldi decomposition. J.170.3J5 adjusted Rayleigh quotient. 269 uniqueness of starting vector. E.279 and defective matrices. 273-274 polynomial bounds. 306 uniqueness.-L. M. 310 translation. 365 full reorthogonalization. 265 Kaniel. 274 multiple eigenvalues.367 computation. 228 Kollerstrom. 352 error analysis.

362-363 also see Lanczos algorithm. 156 2x2 blocks in double shift QR algorithm. 237 differential qd algorithm. 280 selective reorthogonalization. 365 Lanczos decomposition.365-366 example. 355-356 size of reorthogonalization coefficients. 358-362. Lanczos algorithm. 279 no reorthogonalization.346 Lewis. 228 CS decomposition. 365 and semiorthogonality.365 LAPACK. 228 divide-and-conquer algorithm for the spectral decomposition. 201 storage of band matrices. 350 reorthogonalization. G. 355 residual norms.INDEX Lanczos recurrence. 233 large matrices. 347 contamination by reorthogonalization. 170 Wilkinson's algorithm. 365-366 software. 348 local and global. Lanczos algorithm. 345. 308 storage considerations. 350. 239 storage of eigenvectors. 171 reduction to tridiagonal form. see Lanczos algorithm Lanczos recurrence. see Lanczos algorithm Lanczos. 350. 239 storage of sparse matrices. 2 Lehoucq. 346 multishiftQR algorithm.315 loss of orthogonality. 366 partial reorthogonalization. 235 spectral decomposition downdating. 102 exchanging eigenblocks. 355-356 Ritz vectors. 350 loss of orthogonality. 350 difficulties. 129 ad hoc shift in QR algorithm. 105 bidiagonal QR algorithm. 128. 170 S/PD generalized eigenvalue problem. 307-308. Lanczos algorithm. 350 method of minimized iterations. R. no reorthogonalization. 349. 366 periodic reorthogonalization. 366 software. 349-350 updating adjusted Rayleigh quotient. 328. 366-367 termination and restarting. B. 353 QR factorization. 353. partial reorthogonalization. 239 latent value. 348. full reorthogonalization. J. 350. 353-354 Vol II: Eigensystems 461 Rayleigh quotient.348 alternative.. 112 2x2 blocks in QZ algorithm.365-366 and reorthogonalization. 202 eigenvectors of triangular matrices.129 PWK method for the tridiagonal QR algorithm. 366 Rayleigh quotient. 364 modified Gram-Schmidt algorithm. 24 Le. 359 software.. 187 tridiagonal QR step. 367 . 307. 346 left eigenpair. periodic reorthogonalization. 348-349 general algorithm. Lanczos algorithm. C. 365-366 semiorthogonality.. 308 local orthogonality. 202 eigenvectors by the inverse power method. 308 operation count. 358-359 effects of relaxing constraint. 354 singular value decomposition. 350 thick restarting. 358-359 contamination of the Rayleigh quotient. selective reorthogonalization restarted. 315.. J. 279. 183 eigenvalues by bisection.

36 consistent vector norm. R. 33 the hump.33.. 33-36 behavior when computed. 193. 27 and singular values. 12-13 nonnormal matrix. 27 unitary invariance. 279 Moler. 236 More. 28 Martin. 6. 129. 418 Newton-based methods.. see convergence ratios LR algorithm. 31-33. 30 examples. 29 equivalence of norms. 28 and spectral radius.. 235 local convergence ratio. 26 family of norms. S. 25 operator norm.S.. Y. 28. 110. 424 Nash. K. 36 spectral radius bounds. 424 unit triangular.. 27 co-norm.27 limitations..394 convergence. 418-419 derivation by perturbation expansion. 423 zero. 380 method of minimized iterations. 423 trapezoidal. 36 l-norm. 23 identity. 113 Mathias. 367 INDEX M(X) (null space of X).345 multiple eigenvalue.27 2-norm and Frobenius norm. B. J..36 compatable norm. 27 and largest singular value. 296. 418 relation to the QR algorithm. 423 sum and product.12 multiset. 65 tridiagonal matrix. 423 element. 29. C. 395-396 history. K. 423 permutation.35-36. A. 26 and 2-norm.6. B. 381 Nielsen. 205 trace characterization. 24 Newton's method. 423 history. trapezoidal. 28 examples.43 matrix norm. 27 properties. 418 relation to Rayleigh quotient method. 403. 26 Frobenius norm. 395 application to eigenproblem. 26 absolute norm. 156 Moon. 111 LU decomposition. R. 2 algebraic multiplicity..12 geometric multiplicity. Z. 191 Lui. 36 consistent norm..265 matrix. 422 symmetric.. 25. 418-419 Newton. P. 27 rounding a matrix. 26. 71 norm.. R.425 and inertia. 30 Euclidean norm. 423 diagonal. 27 of orthonormal matrix. 2 Murray. 423 order. 27 vector norm. 423 orthonormal. 423 matrix powers (Ak). 29-30 construction by similarities. S. 26 unitary invariance.26.. 424 unitary. 421 cross. 4.J. 203 .235. 205 matrix norm. 58-59 Meerbergen. 201 nilpotent matrix. C. 421 full rank. 265 Morgan. I.462 Li. 422 orthogonal. R J. 34 matrix-vector product effects of rounding error. C.. 424 triangular. 425 Hermitian.

. 181-182 eigenvectors of a triangular matrix. 88-89 plane rotations Givens' method. 24.296. 26 normal matrix. 217 reduction to Hessenberg-triangular form. see norm. C. see generalized eigenvalue problem pentadiagonal matrix.. 71 overflow. 88. 70 power method. 425 Powell. eigenvalue.280. M. S/PD generalized eigenproblem Peters. 92 bidiagonal QR step. 65 . 380 null space CV(X)).279.169. 91 Arnoldi process. Hermitian matrix. 162 Vol II: Eigensystems 463 spectral decomposition updating. 60-62 deflation. 89-91 operation count. relative and structured perturbation theory. see eigenpair. 69-70. 102 explicit shift Hessenberg QR algorithm. 108 band tridiagonalization.420 Petrov method. 295. 346.. 58. 86 Householder-matrix product. 179-181 Wilkinson's algorithm for S/PD eigenproblem. simple eigenspace. 232 orthogonal similarity. 418.INDEX spectral norm. 147 reduction to tridiagonal form. 380 pencil. 91 application. A..60 convergence. 57 convergence criteria. 424 Oettli. 111 and Wilkinson diagrams.W. 167 implicitly restarted Arnoldi. 14 eigenvectors. E. 88 introducing a zero into a 2-vector. 36. 425 positive semidefinite matrix. 86-87 deflation by inspection.. 26 unitarily invariant norm. 89 in the (e. 89 generation. 189 basic QR step in Hessenberg QR algorithm. 308 QZ algorithm.228. N. M. see exponent exception Paige.91 exponent exception. 2-norm subordinate norm... 70 effects of rounding error. 419 operation count application of a plane rotation. 237. 192 Jacobi-Davidson method. 365. 366 Parlett. 14 eigenvalues.265. see Rayleigh-Ritz method plane rotation. J. 112 Ostrowski. 347. D.296. 409 Krylov-Schur restarting. 320-323 inertia of a tridiagonal matrix. 220 complex arithmetic. eigenvector. generalized eigenvalue problem. see unitary similarity Osborne. 100 Householder reduction to Hesssenberg form. 367.and postmultiplication. 304 balancing. B. 150-152 reduction to bidiagonal form. 185 divide-and-conquer algorithms. 91 pre. 14 Nour-Omid. 70 Olsen. 27 triangle inequality. 424 nullity [null(A)]. 186 perturbation theory. 347.. 226 implicit double shift QR step.170. 82 hybrid QRD-SVD algorithm.266. 170 positive definite matrix. singular value decomposition. 329 Lanczos recurrence. 56. 315. j)-plane. 123 implicit tridiagonal QR step. J. C.381 assessment. 379. B. E. 88 introducing a zero into a matrix. G. 107 divide-and-conquer algorithm for the spectral decomposition. 27 vector norm. 66 choice of starting vector..

73-74 multiple eigenvalues. 421 R n (real n-space).153 eigenvalues only. Hessenberg QR algorithm. 1. 343-344 potential instability. 421 R(X) (column space of A"). 76 convergence. 75 error bounds for a QR step. 421 jpmxn (Space Of reai m x « matrices). 63 shifting. 70 primitive Ritz vector. 347 no free lunch. 80-81 QL algorithm. W. 155 deflation. 424 rank. 78-79 disorder in eigenvalues.170. 111 alternate shifts.464 implementation. 80 rates of convergence. 70 for generalized eigenvalue problem..55. 403. 111 relation to Newton's method. see Jordan canonical form projector.396 quadratic convergence. 152 operation count. 69. 69 relation to Newton's method. subspace iteration. 111. 427 proper value. 426-427 and QR algorithm. see Rayleigh-Ritz method principal vector. W. see QR algorithm quasi-triangular matrix.70 accuracy of. 343 Rayleigh (J. 381-382 Prager. 115 computing eigenvectors. 75 quadratic. 110 impracticality on a full matrix. 150-152 QZ step. 70 with two vectors.156 2x2 blocks.418 INDEX Wilkinson shift. 424 Raphson. 394. QL. 110 QR step. 47. tridiagonal QR algorithm QR decomposition. Strutt). 150. 309 in Lanczos decomposition. 152 starting the double shift.138 for subspaces. 155 choice of double shift. 72. 77-80 history. 110 QL algorithm. 7_I unshifted.156 E (real numbers). 113. 59 operation counts. 24 pseudospectrum. 71 Rayleigh quotient shift. 80 variants (RQ.77. 418-419 relation to power method.63. 72-73. 156 stability.295 Rayleigh quotient. 252 in Arnoldi decomposition. 58 Rayleigh quotients. QZ algorithm. LQ). see QR algorithm . 73 relation to inverse power method. 153-155. 75 also see bidiagonal QR algorithm. 306 Rayleigh quotient method. 77-80 convergence to Schur decomposition. 418 Rayleigh quotient shift. 418 rational Krylov transformation. 70. 342-344. 260 and optimal residuals. 396 as factorization of a polynomial. 64. 126-128 QZ algorithm. 79 equimodular eigenvalues. 75 cubic. 111 shifted.170 QR step. 136. 301 in Krylov decomposition. 147-148. 24.169. 64 Markov chain. 148 computation of eigenvectors. 36-37 qd-algorithm. 169. 75 unshifted algorithm. 111. 148-152 real shifts. see QR algorithm QR algorithm. 148 treatment of infinite eigenvalues. 58-59 rate of convergence. 69 normalization. 71. 113. 80 implications for the shifted algorithm.

285-287 Ritz vector. 282 Ritz value. 281 failure. 123-126 computation by Householder transformation. 344-347 scalar. 43. 295 residual bound. 421 Schnabel. 284 Petrov method. 281 harmonic Rayleigh-Ritz method. 203. 293. 282 projection method. 291 cross-product algorithm. see triangular matrix. 283 optimal residual. 295 rounding unit (eM). 113 relative and structured perturbation theory.. 286-288 residual. B. 284 orthogonal. 156 Reinsch. 288 Ritz basis.128 computation by double shift QR algorithm. 295 oblique.265 computation in the inverse power method. A. 115 computing eigenvectors.. 71 and Jordan form.. 28 in eigenvalues. 284. 279. 38 real Schur form. H. me history. 98-100 computing eigevectors. see column.295 Hermitian matrix. 295-296 computation. 314 convergence. J. 282 Ritz pair. 282-283 Galerkin method. 282 Ritz space. K. 159 packed storage.295. 282 Ritz vector. 126-128 in subspace iteration. Y.253. 288 Ritz value. 264 resolvent. computing eigenvectors partial. 12. 391 refined Ritz vector.and row-oriented algorithms Ruhe. 61_ backward error. 1. 287 exact eigenspace. 295-296 general procedure. 24. 286-287 uniform separation condition. 22. 289-290 use of Rayleigh quotient. 380 right eigenpair. 394. 252. 379. 289 Ritz blocks and bases..C. 63. 252. W. 407 Schur vector. 66-67 convergence testing.. 283 choosing a primitive Ritz basis. 159 in two-dimensional arrays. 5-6 perturbation of simple eigenvalue. 195-196 optimal. 2 Ritz pair.280.231 normwise. 296 from Krylov decomposition. L. 478 Schur decomposition. 280-281 real matrix eigenvectors. 291-292 Vol II: Eigensystems 465 basic algorithm. J. 113. see Rayleigh-Ritz method Ritz. 27 row orientation. 367.295..381. 159 residual. 282 separation condition. 126.55. 289.280. 54 relative error. 28 representation of symmetric matrices..396 and Lanczos algorithm.380 Rutishauser. 282 convergence. 70. 291 compared with Ritz vector. 347. 62. 23. 291-292. 23.INDEX Rayleigh-Ritz method. 290 Reid. 282 primitive Ritz vector. 247 reverse communication. 24 by rowwise deflation. R. 110. 345 Rigal.10-12. 348 as a generalized eigenvalue problem. me also see real Schur form . 283 two levels of approximation. 295-296 primitive Ritz basis. 72 computation by explicit shift QR algorithm. 61. 395 Saad. 282 Ritz block.

378-379.15 Schur.255 and physical separation of eigenvalues.143. 366-367 singular value. 226 Jacobi methods. 206. 157. 204. 259 condition of an eigenblock. 204 perturbation theory. 208 second-order expansion for small singular values. 246.204 and 2-norm.53-54. 365-367 simple eigenspace. 251 biorthogonal basis for left eigenspace. 261 main theorem. 375-379.. 228 Lanczos algorithm. 9 unitary similarity. 265 accuracy of the Rayleigh quotient. 261 bounds in terms of the error. 24 Schwarz. J. 46. 365.. 206 ordering convention. 204. 204 and 2-norm. 260.380 assessment. 244 angles between right and left eigenespaces. 379 Scott. 209 condition. 205 effects of rounding. 204 singular value decomposition.256 Hermitian matrices. 258 normalization of bases. 376-377 growth. 265 INDEX bounds for eigenspace. 66 factorization. I. 379 geometry. 254-255. 257 limitations. 229 two-sided method of Kogbetliantz.205 and Frobenius norm.466 Schur vector. 378.226-227 analogue of Rayleigh quotient. 380 purging. 245 uniqueness. 245 corresponding left eigenspace.. 245-246 complementary eigenspace. 376 defective matrix. 206. 8 eigenvalues and eigenvectors. 201 Scott. 259 residual bounds. 67-68 <ri(X) (the ith singular value of X). R. me first-order expansion. 9 invariance of trace.247 also see eigenspace Simpson. 257 resolvent. 225 downdating. 67 solving linear systems. H. 51 properties. 376 effect on Ritz vectors.380 rounding-error analysis. 418 singular value. H. 228 history. 346. 246 standard representation. 247 spectral projector. 375-376 null-space error. 55. me . D. 245 several eigenspaces. 228-229 one-sided method. 12. \2 relation to eigenvectors. 10 Simon. 24 semidefinite 5-Lanczos and Araoldi algorithms. 211 min-max characterization. 244-246 several eigenspaces. 380 effect on Rayleigh quotient. 50-51. 204 similarity transformation. 245 deflation of an approximate eigenspace.. 228 divide-and-conquer algorithms. 245 block diagonalization.. 204 singular vector. 205 and eigenvalues of the cross-product matrix. 226 <ri(X) (the ith singular value ofX). 260 condition of an eigenspace. 395 secular equation.265 perturbation theory. 205 differential qd algorithm. S. 380 sep. 247 spectral representation. A. 27 and spectral decomposition of cross-product matrix.265 alternate bound. me singular value factorization. 256 shift-and-invert enhancement.

2-norm spectral projector. 296. 176-177 spectral enhancement. 31-33. 231 eigenvector. 231-233. 209-210. G. 87 Stathopoulos. 201. 229 eigenvalue.234-235 product problem (AB). 173-175 tradeoffs. C. 66. 4]9 VolII: Eigensystems . 232 storage and overwriting. 316.. 171. 226-227 condition. 346 Steihaug. 235 Chandrasekaran's method. see singular value decomposition singular vector.. 365 span(^f) (the space spanned by X}. 177-181 stability. 380 SRR. 171-173 accumulating transformations. G. 230 generalized shift and invert transformation. me first-order expansion.205 left. see skew Hermitian matrix Sleijpen. 205 skew Hermitian matrix. 230 banded problems. 234 467 use of spectral decomposition of B. 177-178 deflation. 229 Wilkinson's algorithm. 231 well conditioned eigenvalues not determined to high relative accuracy. 181 basic algorithm. 206-210. 210 right. 424 sparse matrix. 14 skew symmetric matrix.. 346. J_3. 157. 346.229 B-orthogonality. 230 perturbation theory. 27 effects of rounding. 36 spectral representation. 173 secular equation. 234 no infinite eigenvalues. 53 spectral radius. 204 perturbation theory. 231 projective representation. 210 separation of singular values. 414-415 Sorensen. 158 and 2-norm. see norm. D. 229 diagonalization by congruences. 414-415 ill conditioning. A. 2 Spence. me spectral norm. L. 179-181 reduction to standard form. 370 ill-conditioned B. 3^ norm bounds. 64 shift-and-invert enhancement. 232 spectral decomposition. 231 ill conditioned. 379 also see generalized shift-and-invert transformation spectrum. 205 singular value factorization. 345.INDEX uniqueness. A.239 S/PD generalized eigenproblem. 175 solving the secular equation. 179 the naive approach. see simple eigenspace spectral transformation. T. 175 operation count. 204 and eigenvectors of the cross-product matrix. 235 S/PD pencil. 208 left singular vector. 204 and eigenvectors of the cross-product matrix. 210 right singular vector. 230 condition number. 231. see spectral decomposition updating spectral decomposition updating. Schur-Rayleigh-Ritz refinement stability in the ususal sense. 235 congruence transformation. 232-233 operation count.420 solution of projected systems.. 58.235 computation of R~TAR~l. 201 assessment. 233-235 use of pivoted Cholesky decomposition of B. 211-212 updating. 171 computing eigenvectors.. 157. 415 multiple right-hand sides. see subspace iteration. 14 eigenvalues.

157 also see Hermitian matrix symmetric positive definite generalized eigenvalue problem. 316. 382. 186 . 104 tridiagonal matrix.382. 382 freedom in dimension of the basis. 102 stability and residuals. 386-387 orthogonalization. 388 dependence in basis. W. T. 394 real Schur form. D.384-385 frequent with shift and invert enhancement. 201 Taussky. 383 Schur-Rayleigh-Ritz refinement. 389-391. 170.37 triangle inequlity. 158. 52 Taylor. 391-392 Treppeniteration.. 189 stability. 391 Chebyshev acceleration. 265.295.. 395 economization of storage. 386. 201 limitations. J.394 when to perform. J. 394 and QR algorithm. 189 band width. W. 186 computing eigenvalues and eigenvectors. me representation. 380 trace as sum of eigenvalues. 388 convergence theory.y) (canonical angle matrix)..264.394 shift-and-invert enhancement. 159 packed storage. 203 Tang. 384 deflation.156. 10 trace(^) (trace of A). 24. 345. 18.52. 0. 395 Sturm sequence. 189 operation count. see S/PD generalized eigenproblem symmetric tridiagonal matrix.201 symmetric band matrix. 2.128 conditions for solution. 389. J.394 convergence. 16-17 Kronecker product form. 390 software. 390-391 when to perform.202 subspace iteration. 100-104 operation count. 187 storage. 393-394 Rayleigh-Ritz method. 52 SVD. V. 186 eigenvectors. 395 Stewart. 190. see tridiagonal matrix systolic array. J. 159 in two-dimensional arrays. 24. 24 numerical solution. 9 use in debugging. 385 convergence testing. 158 special properties. 381.24 Sylvester operator. N. see norm triangular matrix computing eigenvectors. 248 Toumazou. 16 Sylvester. 102-104 computing left eigenvectors. P. 422 Trefethen.237.-G. 394.155. 23. R. 367 Q(X...346. 186 representation. see eigenvectors of a symmetric band matrix pentadiagonal matrix. 386 superiority to power method..382. G. 384 defectiveness. 394 Sun. 395 symmetric matrix. see singular value decomposition Sylvester's equation. 9 invariance under similarity transformations. 391 Schur-Rayleigh-Ritz refinement. P.. 392-394.386.. 187-189. 186 symmetric matrix inertia. L. 24. 384 to dominant subspaces. 384 general algorithm.16. 387-388 stability.186 Stewart.. 159 simplifications in the QR algorithim.468 INDEX band tridiagonalization.

Z. 423 zero.. 164 graded matrix. K. 167. 168-169. H. 169 local convergence.201-202 accuracy. 345 Zhang. 70 Wilkinson.129. see norm unitary matrix eigenvalues.163.. 423 von Neumann. 23. 170 Householder's reduction to. 203 Vu. 193-195. 169 explicitly shifted. 421 component.170 using symmetry. A...170 Twain. 365 Yang.. 162.24.. see exponent exception Underwood.53. 14 unitary similarity..170 implicitly shifted QR step. 195 use of fast root finders. 160 Wilkinson's contribution. 164-167 operation count. QR algorithm. S.237 vector. 20 underflow. 10 backward stability.R. 170 deflation. 162 stability. 195 Givens' reduction to..235. R. 192 stability. 156 Watkins.365. 264.170 Wilkinson shift.418 Wu.23.189 first column of the transformation. 170-171 combination of Rayleigh quotient and Wilkinson shifts. 23. 280. 111-113.346 Weierstrass form. 159-162. Mark. 155 Wielandt. 421 unit.. D. 132 Weierstrass. 1. 160 tridiagonal QR algorithm. 112. 167 PWK method.170 graded matrix. J.345 469 Ward.H. 169.168.INDEX assumed symmetric. 8J. 162-163 representation. 192-193 making a Hermitian matrix real. R. 167.229. C.. 71. see Hessenberg QR algorithm.F. 158 calculation of a selected eigenvalue. 70. 169. C.. K. 162 operation count. 23.52. 167 variations. 116 van der Vorst. 168-169 detecting negligible off-diagonal elements. 128. 296. 367 unitarily invariant norm. C. 424 Wilkinson shift. 346.346. 191-193 operation count. H. J. 111. 169 QL algorithm. 170. 170 inertia.171 stability.. 10 VolII: Eigensystems unreduced Hessenberg matrix. 420 Wilkinson diagram.156.169.420 Van Loan.394. 167-168 Rayleigh quotient shift.170. 170 . tridiagonal QR algorithm Wilkinson.
