


Matrix Algorithms
Volume II: Eigensystems

G. W. Stewart
University of Maryland College Park, Maryland

Society for Industrial and Applied Mathematics Philadelphia

siam

Copyright © 2001 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB is a registered trademark of The MathWorks, Inc., 3 Apple Hill Dr., Natick, MA 01760-2098, USA, tel. 508-647-7000, fax 508-647-7001, info@mathworks.com, http://www.mathworks.com.

The Library of Congress has catalogued Volume I as follows:

Library of Congress Cataloging-in-Publication Data

Stewart, G. W. (Gilbert W.)
  Matrix algorithms / G. W. Stewart.
    p. cm.
  Includes bibliographical references and index.
  Contents: v. 1. Basic decompositions
  ISBN 0-89871-414-1 (v. 1 : pbk.)
  1. Matrices. I. Title.
QA188.S714 1998
512.9'434-dc21                                                  98-22445

0-89871-414-1 (Volume I)
0-89871-503-2 (Volume II)
0-89871-418-4 (set)

SIAM is a registered trademark.

CONTENTS

Algorithms
Preface

1 Eigensystems
  1 The Algebra of Eigensystems
    1.1 Eigenvalues and eigenvectors
    1.2 Geometric multiplicity and defective matrices
    1.3 Similarity transformations
    1.4 Schur decompositions
    1.5 Block diagonalization and Sylvester's equation
    1.6 Jordan form
    1.7 Notes and references
  2 Norms, Spectral Radii, and Matrix Powers
    2.1 Matrix and vector norms
    2.2 The spectral radius
    2.3 Matrix powers
    2.4 Notes and references
  3 Perturbation Theory
    3.1 The perturbation of eigenvalues
    3.2 Simple eigenpairs
    3.3 Notes and references

2 The QR Algorithm
  1 The Power and Inverse Power Methods
    1.1 The power method
    1.2 The inverse power method
    1.3 Notes and references
  2 The Explicitly Shifted QR Algorithm
    2.1 The QR algorithm and the inverse power method
    2.2 The unshifted QR algorithm
    2.3 Hessenberg form
    2.4 The explicitly shifted algorithm
    2.5 Bits and pieces
    2.6 Notes and references
  3 The Implicitly Shifted QR Algorithm
    3.1 Real Schur form
    3.2 The uniqueness of Hessenberg reduction
    3.3 The implicit double shift
    3.4 Notes and references
  4 The Generalized Eigenvalue Problem
    4.1 The theoretical background
    4.2 Real Schur and Hessenberg-triangular forms
    4.3 The doubly shifted QZ algorithm
    4.4 Important miscellanea
    4.5 Notes and references

3 The Symmetric Eigenvalue Problem
  1 The QR Algorithm
    1.1 Reduction to tridiagonal form
    1.2 The symmetric tridiagonal QR algorithm
    1.3 Notes and references
  2 A Clutch of Algorithms
    2.1 Rank-one updating
    2.2 A divide-and-conquer algorithm
    2.3 Eigenvalues and eigenvectors of band matrices
    2.4 Notes and references
  3 The Singular Value Decomposition
    3.1 Background
    3.2 The cross-product algorithm
    3.3 Reduction to bidiagonal form
    3.4 The QR algorithm
    3.5 A hybrid QRD-SVD algorithm
    3.6 Notes and references
  4 The Symmetric Positive Definite (S/PD) Generalized Eigenvalue Problem
    4.1 Theory
    4.2 Wilkinson's algorithm
    4.3 Notes and references

4 Eigenspaces and Their Approximation
  1 Eigenspaces
    1.1 Definitions
    1.2 Simple eigenspaces
    1.3 Notes and references
  2 Perturbation Theory
    2.1 Canonical angles
    2.2 Residual analysis
    2.3 Perturbation theory
    2.4 Residual bounds for Hermitian matrices
    2.5 Notes and references
  3 Krylov Subspaces
    3.1 Krylov sequences and Krylov spaces
    3.2 Convergence
    3.3 Block Krylov sequences
    3.4 Notes and references
  4 Rayleigh-Ritz Approximation
    4.1 Rayleigh-Ritz methods
    4.2 Convergence
    4.3 Refined Ritz vectors
    4.4 Harmonic Ritz vectors
    4.5 Notes and references

5 Krylov Sequence Methods
  1 Krylov Decompositions
    1.1 Arnoldi decompositions
    1.2 Lanczos decompositions
    1.3 Krylov decompositions
    1.4 Computation of refined and harmonic Ritz vectors
    1.5 Notes and references
  2 The Restarted Arnoldi Method
    2.1 The implicitly restarted Arnoldi method
    2.2 Krylov-Schur restarting
    2.3 Stability, convergence, and deflation
    2.4 Rational Krylov transformations
    2.5 Notes and references
  3 The Lanczos Algorithm
    3.1 Reorthogonalization and the Lanczos algorithm
    3.2 Full reorthogonalization and restarting
    3.3 Semiorthogonal methods
    3.4 Notes and references
  4 The Generalized Eigenvalue Problem
    4.1 Transformation and convergence
    4.2 Positive definite B
    4.3 Notes and references

6 Alternatives
  1 Subspace Iteration
    1.1 Theory
    1.2 Implementation
    1.3 Symmetric matrices
    1.4 Notes and references
  2 Newton-Based Methods
    2.1 Approximate Newton methods
    2.2 The Jacobi-Davidson method
    2.3 Notes and references

Appendix: Background
  1 Scalars, vectors, and matrices
  2 Linear algebra
  3 Positive definite matrices
  4 The LU decomposition
  5 The Cholesky decomposition
  6 The QR decomposition
  7 Projectors

References
Index

ALGORITHMS

Chapter 1. Eigensystems
  1.1 Solution of Sylvester's equation

Chapter 2. The QR Algorithm
  1.1 The shifted power method
  1.2 The inverse power method
  2.1 Generation of Householder transformations
  2.2 Reduction to upper Hessenberg form
  2.3 Generation of a plane rotation
  2.4 Application of a plane rotation
  2.5 Basic QR step for a Hessenberg matrix
  2.6 Finding deflation rows in an upper Hessenberg matrix
  2.7 Computation of the Wilkinson shift
  2.8 Schur form of an upper Hessenberg matrix
  2.9 Right eigenvectors of an upper triangular matrix
  3.1 Starting an implicit QR double shift
  3.2 Doubly shifted QR step
  3.3 Finding deflation rows in a real upper Hessenberg matrix
  3.4 Real Schur form of a real upper Hessenberg matrix
  4.1 Reduction to Hessenberg-triangular form
  4.2 The doubly shifted QZ step

Chapter 3. The Symmetric Eigenvalue Problem
  1.1 Reduction to tridiagonal form
  1.2 Hermitian tridiagonal to real tridiagonal form
  1.3 The symmetric tridiagonal QR step
  2.1 Rank-one update of the spectral decomposition
  2.2 Eigenvectors of a rank-one update
  2.3 Divide-and-conquer eigensystem of a symmetric tridiagonal matrix
  2.4 Counting left eigenvalues
  2.5 Finding a specified eigenvalue
  2.6 Computation of selected eigenvectors
  3.1 The cross-product algorithm
  3.2 Reduction to bidiagonal form
  3.3 Complex bidiagonal to real bidiagonal form
  3.4 Bidiagonal QR step
  3.5 Back searching a bidiagonal matrix
  3.6 Hybrid QR-SVD algorithm
  4.1 Solution of the S/PD generalized eigenvalue problem
  4.2 The computation of C = R^{-T} A R^{-1}

Chapter 5. Krylov Sequence Methods
  1.1 The Arnoldi step
  1.2 The Lanczos recurrence
  1.3 Krylov decomposition to Arnoldi decomposition
  2.1 Filtering an Arnoldi decomposition
  2.2 The restarted Arnoldi cycle
  2.3 The Krylov-Schur cycle
  2.4 Deflating in a Krylov-Schur decomposition
  2.5 The rational Krylov transformation
  3.1 Lanczos procedure with orthogonalization
  3.2 Estimation of u^T_{k+1} u_j
  3.3 Periodic orthogonalization I: Outer loop
  3.4 Periodic reorthogonalization II: Orthogonalization step

Chapter 6. Alternatives
  1.1 Subspace iteration
  2.1 The basic Jacobi-Davidson algorithm
  2.2 Extending a partial Schur decomposition
  2.3 Harmonic extension of a partial Schur decomposition
  2.4 Solution of projected equations
  2.5 Davidson's method

PREFACE

This book, Eigensystems, is the second volume in a projected five-volume series entitled Matrix Algorithms. The first volume treated basic decompositions. The three following this volume will treat iterative methods for linear systems, sparse direct methods, and special topics, including fast algorithms for structured matrices.

My intended audience is the nonspecialist whose needs cannot be satisfied by black boxes. It seems to me that these people will be chiefly interested in the methods themselves—how they are derived and how they can be adapted to particular problems. Consequently, the focus of the series is on algorithms, with such topics as rounding-error analysis and perturbation theory introduced impromptu as needed. My aim is to bring the reader to the point where he or she can go to the research literature to augment what is in the series.

The series is self-contained. The reader is assumed to have a knowledge of elementary analysis and linear algebra and a reasonable amount of programming experience—about what you would expect from a beginning graduate engineer or an advanced undergraduate in an honors program. Although strictly speaking the individual volumes are not textbooks, they are intended to teach, and my guiding principle has been that if something is worth explaining it is worth explaining fully. This has necessarily restricted the scope of the series, but I hope the selection of topics will give the reader a sound basis for further study.

The subject of this volume is computations involving the eigenvalues and eigenvectors of a matrix. The first chapter is devoted to an exposition of the underlying mathematical theory. The second chapter treats the QR algorithm, which in its various manifestations has become the universal workhorse in this field. The third chapter deals with symmetric matrices and the singular value decomposition. The second and third chapters also treat the generalized eigenvalue problem.

Up to this point the present volume shares the concern of the first volume with dense matrices and their decompositions. In the fourth chapter the focus shifts to large matrices for which the computation of a complete eigensystem is not possible. The general approach to such problems is to calculate approximations to subspaces corresponding to groups of eigenvalues—eigenspaces or invariant subspaces they are called. Accordingly, in the fourth chapter we will present the algebraic and analytic theory of eigenspaces. We then return to algorithms in the fifth chapter, which treats Krylov sequence methods—in particular the Arnoldi and the Lanczos methods. In the last chapter we consider alternative methods such as subspace iteration and the Jacobi-Davidson method.

Many methods treated in this volume are summarized by displays of pseudocode (see the list of algorithms following the table of contents). These summaries are for purposes of illustration and should not be regarded as finished implementations. In the first place, they often leave out error checks that would clutter the presentation. Moreover, it is difficult to verify the correctness of algorithms written in pseudocode. Wherever possible, I have checked the algorithms against MATLAB implementations. However, such a procedure is not proof against transcription errors. Be on guard!

A word on organization. The book is divided into numbered chapters, sections, and subsections, followed by unnumbered subsubsections. Numbering is by section, so that (3.5) refers to the fifth equation in section three of the current chapter. References to items outside the current chapter are made explicitly—e.g., Theorem 2.7, Chapter 1. References to Volume I of this series [265] are preceded by the Roman numeral I—e.g., Algorithm I:3.2.1.

With this second volume, I have had to come to grips with how the volumes of this work hang together. Initially, I conceived the series being a continuous exposition of the art of matrix computations, with heavy cross referencing between the volumes. On reflection, I have come to feel that the reader will be better served by semi-independent volumes. However, it is impossible to redevelop the necessary theoretical and computational material in each volume. I have therefore been forced to assume that the reader is familiar, in a general way, with the material in previous volumes. To make life easier, much of the necessary material is slipped unobtrusively into the present volume and more background can be found in the appendix. However, I have not hesitated to reference the first volume—especially its algorithms—where I thought it appropriate.

While I was writing this volume, I posted it on the web and benefited greatly from the many comments and suggestions I received. I should like to thank K. Briggs, Austin Dubrulle, David Dureisseix, Mei Kobayashi, Daniel Kressner, Andrzej Mackiewicz, Nicola Mastronardi, Arnold Neumaier, John Pryce, Thomas Strohmer, Henk van der Vorst, and David Watkins for their comments and suggestions. Special thanks go to the editors of Templates for the Solution of Algebraic Eigenvalue Problems, who allowed me access to their excellent compendium while it was in preparation.

I am indebted to the Mathematical and Computational Sciences Division of NIST for the use of their research facilities. SIAM publications and its people, past and present, have aided the writing and production of this book. Vickie Kearn first encouraged me to start the project and has helped it along too many ways to list. Kelly Thomas and Sam Young have seen the present volume through production. Special mention should be made of my long-time copy editor and email friend, Jean Anderson. As usual, she has turned an onerous job into a pleasant collaboration.

In 1997, as I was preparing to write this volume, Richard Lehoucq invited me to a workshop on large-scale eigenvalue problems held at Argonne National Laboratory. I was stunned by the progress made in the area since the late sixties and early seventies, when I had worked on the problem. The details of these new and improved algorithms are the subject of the second half of this volume. But any unified exposition makes it easy to forget that the foundations for these advances had been laid by people who were willing to work in an unexplored field, where confusion and uncertainty were the norm. Naming names carries the risk of omitting someone, but any list would surely include Vel Kahan, Shmuel Kaniel, Chris Paige, Beresford Parlett, and Yousef Saad. This book is dedicated to these pioneers and their coworkers.

G. W. Stewart
College Park, MD


1
EIGENSYSTEMS

In 1965 James Hardy Wilkinson published his classic work on numerical linear algebra, The Algebraic Eigenvalue Problem. His purpose was to survey the computational state of the art, but a significant part of the book was devoted to the mathematical foundations, some of which had been developed by Wilkinson himself. This happy mixture of the theoretical and computational has continued to characterize the area—computational advances go hand in hand with deeper mathematical understanding.

The purpose of this chapter is to provide the mathematical background for the first half of this volume. This is a large subject, and we will only be able to survey the basics. The first section consists of an exposition of the classic algebraic theory of eigensystems, with an emphasis on the reduction of a matrix by similarity transformations to a simpler form. The second and third sections are devoted to the analysis of eigensystems. The second section concerns norms and their relation to the spectrum of a matrix. Here we will pay special attention to the asymptotic behavior of powers of matrices, which are important in the analysis of many matrix algorithms. The third section is devoted to perturbation theory—how eigensystems behave when the original matrix is perturbed.

The majority of eigenvalue problems involve real matrices, and their reality often results in computational savings. However, the theory of eigensystems is best treated in terms of complex matrices. Consequently, throughout this chapter:

    Unless otherwise stated A is a complex matrix of order n.

1. THE ALGEBRA OF EIGENSYSTEMS

In this section we will develop the classical theory of eigenvalues, eigenvectors, and reduction by similarity transformations. The predominant theme of this section is the systematic reduction of matrices to simpler forms by similarity transformations. One of these forms—the Schur decomposition—is the goal of the QR algorithm, the workhorse of dense eigenvalue computations. We begin with basic definitions and their consequences. We then present Schur's reduction of a matrix to triangular form by unitary similarity transformations—a decomposition that serves as a springboard for computing most others. The reduction depends on our ability to solve a matrix equation called Sylvester's equation, which we investigate in some detail. Further reduction to block diagonal form is treated in §1.5. Finally in §1.6 we present the Jordan canonical form.

1.1. EIGENVALUES AND EIGENVECTORS

Definitions

This volume has two heroes: the eigenvalue and its companion the eigenvector. Since the two are inseparable, we will formalize their pairing in the following definition.

Definition 1.1. Let A be of order n. The pair (λ, x) is called an EIGENPAIR or a RIGHT EIGENPAIR of A if

    x ≠ 0 and Ax = λx.                                                  (1.1)

The scalar λ is called an EIGENVALUE of A and the vector x is called an EIGENVECTOR. The pair (λ, y) is called a LEFT EIGENPAIR if y is nonzero and y^H A = λy^H.

Here are some comments on this definition.

• Perhaps the best way to think of an eigenvector x of a matrix A is that it represents a direction which remains invariant under multiplication by A. The corresponding eigenvalue λ is then the representation of A in the subspace spanned by the eigenvector x. The advantage of having this representation is that multiplication by A is reduced to a scalar operation along the eigenvector. For example, it is easy to show that if Ax = λx then

    A^k x = λ^k x.

Thus the effect of a power of A along x can be determined by taking the corresponding power of λ.

• Eigenvalues can be repeated. For example, it is natural to regard the identity matrix of order n as having n eigenvalues, all equal to one (see Theorem 1.2). To keep track of multiplicities we define the spectrum of a matrix to be a multiset, i.e., a set in which a member may be repeated. The multiset of eigenvalues of A is called the SPECTRUM of A and is denoted by Λ(A).

• If (λ, x) is an eigenpair and τ ≠ 0, then (λ, τx) is also an eigenpair. To remove this trivial nonuniqueness, one often specifies a normalization of x. For example, one might require that x^H x = 1 or that some component of x be equal to one. In any event, when the eigenvector corresponding to an eigenvalue is unique up to a scalar multiple, we will refer to the one we are currently interested in as the eigenvector.

• The unqualified terms "eigenpair" and "eigenvector" refer to right eigenpairs and right eigenvectors. Left eigenpairs and eigenvectors will always be qualified.
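For readers who want to experiment, the definition is easy to check numerically. The following is a minimal sketch (not from the book), assuming NumPy is available; it verifies a computed right eigenpair and obtains a left eigenpair from A^H.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    # Right eigenpairs: each column x of X satisfies A x = lam x.
    lam, X = np.linalg.eig(A)
    x = X[:, 0]
    print(np.linalg.norm(A @ x - lam[0] * x))        # tiny residual

    # A left eigenpair of A is a right eigenpair of A^H (with conjugated eigenvalue).
    mu, Y = np.linalg.eig(A.conj().T)
    y = Y[:, 0]
    print(np.linalg.norm(y.conj() @ A - mu[0].conj() * y.conj()))  # tiny residual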

Existence of eigenpairs: Special cases

In a moment we will show that any matrix has an eigenpair. However, before we formally establish the existence of eigenpairs, it will be convenient to examine three important cases in which a matrix must by its nature have eigenpairs.

• Singular matrices. If A is singular, then it has a null vector, i.e., a vector x ≠ 0 for which Ax = 0. It follows that (0, x) is an eigenpair of A. Conversely, if (0, x) is an eigenpair of A, then A has a null vector x.

• Diagonal matrices. If A is diagonal and e_i denotes the ith unit vector (i.e., the vector whose ith component is one and whose other components are zero), then

    A e_i = a_ii e_i.

Thus (a_ii, e_i) is an eigenpair of A.

• Triangular matrices. Let A be a triangular matrix partitioned in the form

    A = | A_11  a_1   A_13  |
        |  0    a_kk  a_2^H |
        |  0    0     A_33  |,

where A_11 is of order k-1. If no diagonal element of A_11 is equal to a_kk, then a_kk I − A_11 is nonsingular, and it is easily verified that

    x = | (a_kk I − A_11)^{-1} a_1 |
        |            1             |
        |            0             |

is an eigenvector of A corresponding to the eigenvalue a_kk. Since any diagonal element of A must have a first appearance on the diagonal, it follows that the diagonal elements of A are eigenvalues of A. Conversely, if (λ, x) is an eigenpair of A, then x must have a last nonzero component, say ξ_k. It then follows from the equation Ax = λx that a_kk ξ_k = λ ξ_k, or λ = a_kk. Thus the eigenvalues of a triangular matrix are precisely its diagonal elements.
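The triangular case is also easy to see numerically. A small illustrative sketch (mine, not the book's), assuming NumPy:

    import numpy as np

    T = np.triu(np.arange(1.0, 17.0).reshape(4, 4))   # upper triangular; diagonal 1, 6, 11, 16
    print(sorted(np.linalg.eigvals(T).real))          # [1.0, 6.0, 11.0, 16.0]
    print(sorted(np.diag(T)))                         # the same diagonal elements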

The characteristic equation and the existence of eigenpairs

We now turn to the existence of eigenpairs. If (λ, x) is an eigenpair of A, then λx − Ax = 0, or equivalently (λI − A)x = 0. Since x is nonzero, λI − A must be singular. Conversely, if λI − A is singular, then it has a null vector x, and (λ, x) is an eigenpair of A. Thus the eigenvalues of A are the scalars for which λI − A is singular. Since a matrix is singular if and only if its determinant is zero, the eigenvalues of A are the scalars for which

    det(λI − A) = 0.                                                    (1.3)

The function p_A(λ) = det(λI − A) is a monic polynomial of degree n called the characteristic polynomial of A. We have thus shown that the eigenvalues of A are the roots of the characteristic equation (1.3). Since a polynomial of degree n has exactly n roots, a matrix of order n has n eigenvalues. More precisely, from the fundamental theorem of algebra we have the following theorem.

Theorem 1.2. Let A be of order n, and let p_A(λ) = det(λI − A) be its characteristic polynomial. Then p_A(λ) can be factored uniquely in the form

    p_A(λ) = (λ − λ_1)^{m_1} (λ − λ_2)^{m_2} · · · (λ − λ_k)^{m_k},

where the numbers λ_i are distinct and

    m_1 + m_2 + · · · + m_k = n.

The eigenvalues of A are the scalars λ_i, which are said to have ALGEBRAIC MULTIPLICITY m_i.

This theorem and the accompanying definition has two consequences.

• It is important to keep track of the algebraic multiplicity of the eigenvalues. To do so, we have already defined Λ(A) to be a multiset. Thus, counting algebraic multiplicities, every matrix of order n has n eigenvalues. We will also use the phrase "counting multiplicities" as a reminder that multiplicities count.

• It follows from the theorem that we do not need to distinguish between "right" and "left" eigenvalues. For if λI − A is singular, then, in addition to a right null vector, it has a left null vector y^H, i.e., y^H A = λy^H. Thus y^H is a left eigenvector corresponding to λ, and every eigenvalue has both a left and a right eigenvector.

Matrices of order two

In many applications it is useful to have the characteristic polynomial of a 2x2 matrix

    A = | a_11  a_12 |
        | a_21  a_22 |.

For this matrix

    p_A(λ) = λ^2 − (a_11 + a_22)λ + (a_11 a_22 − a_12 a_21).

An easily memorized equivalent is

    p_A(λ) = λ^2 − trace(A)λ + det(A).
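The trace/determinant form gives the familiar closed formula for the two eigenvalues of a 2x2 matrix. The sketch below (an illustration, not part of the text) compares the roots of λ^2 − trace(A)λ + det(A) with a direct eigenvalue computation.

    import numpy as np

    A = np.array([[2.0, 3.0],
                  [1.0, 4.0]])
    t, d = np.trace(A), np.linalg.det(A)

    # Roots of lambda^2 - t*lambda + d = 0; scimath.sqrt handles a negative discriminant.
    disc = np.lib.scimath.sqrt(t**2 - 4*d)
    roots = np.array([(t + disc) / 2, (t - disc) / 2])

    print(np.sort_complex(roots))                 # 1 and 5 for this matrix
    print(np.sort_complex(np.linalg.eigvals(A)))  # the same values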


n). But its eigenvalues are ±V/e. . For example. This represents a dramatic perturbation of the eigenvalue. First. On the other hand.) Then any vector y can be expressed in the form An expansion like this is particularly useful in numerical computations. the perturbed eigenvalues are ±10~8—eight orders of magnitude greater than the perturbation in the matrix. If. let us agree to say that the matrix A has a complete system of eigenvectors if it has n linearly independent eigenvectors Xi corresponding to eigenvalues Az-. Specifically. as the following example shows. it follows that Thus we can replace the process of computing Aky by multiplication of y by A with the less expensive process of taking the powers of the eigenvalues and summing their products with their eigenvectors. The matrix differs from J2(0) by a perturbation ofe in its (2.SEC.7..-.6. The second difficulty with defective matrices is that defective eigenvalues are very sensitive to perturbations in the matrix elements. Unfortunately. It is also an example of a defective matrix. the null space of J2(0) is spanned by ei. l)-element. we say that an eigenvalue whose algebraic multiplicity is less than its geometric multiplicity is defective. Hence the geometric multiplicity of the eigenvalue 0 is one. Example 1. for example. A matrix with one or more defective eigenvalues is also said to be defective.6 shows that not all matrices have complete systems of eigenvectors. 6 — 10~16. 1.. since AkXi — X^Xi (i — 1. THE ALGEBRA OF EIGENSYSTEMS Example 1. Let 7 The characteristic equation of J2(0) is A2 = 0. x z ) form a complete system. a defective matrix has too few eigenvectors. VolII: Eigensystems . Example 1. (We also say that the eigenpairs (A. from which it follows that 0 is an eigenvalue of algebraic multiplicity two. To see why this is a problem. Defectiveness The matrix J2(0) is an example of a Jordan block. Defectiveness introduces two practical difficulties..

SIMILARITY TRANSFORMATIONS A key strategy in matrix computations is to transform the matrix in question to a form that makes the problem at hand easy to solve. To distinguish this latter class of eigenvalues. in transforming the least squares problem of minimizing 11 y—X b \ \ 2 one uses orthogonal matrices because they preserve the 2-norm.3. Xi) (i — 1. which makes the computation of expansions like (1. one of the most successful algorithms for solving least squares problems is based on the orthogonal reduction of X to triangular form. Then the matrices A and B = U~1AU are said to be SIMILAR.9. However.. We also say that B is obtained from A by a SIMILARITY TRANSFORMATION. In fact. We could wish our tools were sharper.n). a matrix that is near a defective matrix is not much better than a defective matrix. In other cases we must come to grips with it. where defectiveness is only incidental to the problem. Thus defectiveness is a phenomenon we will have to live with.can be combined in the matrix equation . For the eigenvalue problem. .12). Moreover.. Example 1. EIGENSYSTEMS Because rounding a defective matrix will generally cause its eigenvalues to become simple.8 CHAPTER 1. For example.6) problematic. Let A be of order n and let U be nonsingular. We will also say the the corresponding eigenvector and eigenpair are simple. It may have a complete system of eigenvectors. In some cases. The class of transformations and the final form will depend on the problem. Let the matrix A have a complete system ofeigenpairs (A. the transformations of choice are called similarity transformations.. by using a Schur decomposition (Theorem 1. are just as sensitive as the original multiple eigenvalues. 1. but those eigenvectors will be nearly linearly dependent. Let The the individual relations Ax{ — A t # t .-. though distinct. Definition 1. it is sometimes argued that defective matrices do not exist in practice. the perturbed eigenvalues. we will say that an eigenvalue is simple if it has algebraic multiplicity one. we can work around it—for example. The following is an important example of a similarity transformation. Simple eigenvalues We have just seen that multiple eigenvalues may be accompanied by theoretical and practical difficulties that do not affect their nonmultiple brethren.8.

THE ALGEBRA OF EIGENSYSTEMS 9 Because the eigenvectors Xi are linearly independent. 1. which is the product of its eigenvalues. Let A be of order n. we see that if A can be diagonalized by a similarity transformation X~l AX. x) is an eigenpair of A. U~lx) is an eigenpair of B. The trace of A. the matrix X is nonsingular. which form a complete system. If (A. then the columns ofX are eigenvectors of A. by reversing the above argument. or) is an eigenpair of A. is clearly unchanged by similarity transformations. VolII: Eigensystems . Theorem 1.10. similarity transformations preserve functions of the eigenvalues. i. • In addition to preserving the eigenvalues of a matrix (and transforming eigenvectors in a predictable manner).11. The following theorem shows that the trace is also invariant. is the sum of the eigenvalues of A counting multiplicities and hence is unchanged by similarity transformations.SEC.U~1AU be similar to A.. U lx) is an eigenpair of B. then so that (A. whose columns are eigenvectors of A. The determinant of a matrix. If (A. Conversely. Let A be of order n and let B . then (A.e. Then the eigenvalues of A and B are the same and have the same multiplicities. A nice feature of similarity transformations is that they affect the eigensystem of a matrix in a systematic way. Thus A and B have the same characteristic polynomial and hence the same eigenval ues with the same multiplicities. Hence we may write Thus we have shown that a matrix with a complete system ofeigenpairs can be reduced to diagonal form by a similarity transformation. as the following theorem shows. Theorem 1.

26). we can hope to reduce a matrix A to triangular form by a unitary similarity.7)].9) that a matrix with a complete system of eigenpairs can be reduced to diagonal form by a similarity transformation. More important. Unitary transformations play a leading role in matrix computations. The characteristic polynomial of A can be written in two ways: By expanding the determinant on the right. we will now investigate the existence of intermediate forms that stop before full diagonality. . Thus we can throw an error made in a computed transformation back on the original matrix without magnification. Chapter 2. The following theorem shows that this hope is realized. Consequently.10 CHAPTER 1. Since a triangular matrix has n(n-l)/2 zero elements. so that they do not magnify errors. and even when one can. there is a bug in the code. Unfortunately. we find that the same coefficient is the negative of the sum of the eigenvalues of A. . From the expression on the left. . Schur form In asking how far one can reduce a matrix by a class of transformations. where U is unitary and we perturb B by an error to get B = B -f F. \\E\\2 = \\F\\2 [see (2. if we set a small element of B to zero—say when an iteration is deemed to have converged—it is equivalent to making a perturbation of the same size in A. its matrix of eigenvectors may be nearly singular. Just print out the trace after each transformation. EIGENSYSTEMS Proof. If it changes significantly. . We have seen (Example 1. see (2. • This result is invaluable in debugging matrix algorithms that proceed by similarity transformations. . Let the eigenvalues of A be A i . A unitary matrix has n2 elements. They have spectral norm one. not all matrices can be so reduced. if. 1. so that a unitary transformation can easily be undone. say. A n . we can write Moreover. This leaves n(nl)/2 degrees of freedom. They are trivial to invert. SCHUR DECOMPOSITIONS Unitary similarities This subsection is devoted to the reduction of matrices by unitary and orthogonal similarities.1. For an example. For example. But these elements are constrained by n(n+l)/2 conditions: n(n-l)/2 orthogonality conditions and n normality conditions. it is useful to ask how many degrees of freedom the transforming matrices possesses. we find that the coefficient of A n 1 is the negative of the sum of the diagonal elements of A.4.

If we set then it is easily verified that which has the required form. .. 1.. we may assume that there is a unitary matrix V such that V^MV — T^ where T^. the eigenvalues of M are A 2 . A n . there may be more Vol II: Eigensystems . is an upper triangular matrix with the eigenvalues A 2 . . Then there is a unitary matrix U such that where T is upper triangular. . which are the diagonal elements of T. Let A be of order n. .. In addition multiple eigenvalues can cause the decomposition not to be unique—e. Then the last equality following from the relation Ax\ — \\x\. • The decomposition A = UTU11 is called a Schur decomposition of A. Hence Now by (1. Because X is unitary we h a v e x ^ x i — 1 a n d x ^ X i — 0.5). Let AI . An appearing on its diagonal in the prescribed order. . may be made to appear in any order.g.. It is not unique because it depends on the order of the eigenvalues in T. the eigenvalues of A. . . THE ALGEBRA OF EIGENSYSTEMS 11 Theorem 1. By appropriate choice ofU. Xn be an ordering of the eigenvalues of A. . By an obvious induction. Let the matrix X — (x\ X^) be unitary. .12 (Schur).SEC. Let x\ be an eigenvector corresponding to AI and assume without loss of generality that Z H X = 1. Proof. • Here are some comments on this theorem.

7. • Schur decompositions can often replace other. EIGENSYSTEMS than one choice of the eigenvector x\ in the proof of Theorem 1. In general. it provides a constructive method for reducing the eigenvalue problem to a smaller one in which the eigenpair has been removed [see (1. The following theorem generalizes the above observations. more complicated forms in theoretical and computational applications. to be treated later. the Schur vectors are the columns of the Q-factor of the matrix of eigenvectors arranged in the prescribed order. if one has an eigenpair at hand. • When A has a complete system of eigenvectors. the columns of U are called Schur vectors. the Jordan canonical form. To prove the same fact via the Schur decomposition.9) is equivalent to The right-hand side of this relation is upper triangular with the eigenvalues of A on its diagonal. Unique or not. shows that the geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity. A nontrivial example is the matrix </2(0) of Example 1. beginning with an analysis of nilpotent matrices. so that all the others are distinct from A. as can easily be verified. • The proof of Theorem 1. Nilpotent matrices We now turn to applications of the Schur decomposition. It follows that the eigenvectors corresponding to A are confined to a subspace of dimension k and cannot span a subspace of dimension greater than k. . In other words. whose square is zero. then the diagonal and first k-l superdiagonals of Tk are zero. Let the first k diagonals of T = UHAU be A. Chapter 2]. Then it is easy to show that the last n—k components of any eigenvector corresponding to A must be zero [see (1. However. if T is an upper triangular matrix of order n with zeros on its diagonal.12. Let be the QR decomposition of X. let A have algebraic multiplicity k. since it requires a ready supply of eigenpairs.2)]. The zero matrix is a trivial example of a nilpotent matrix. • Given the eigenpair (Ai. so that where A is the diagonal matrix of eigenvalues.12 is not constructive. Let X be the matrix of eigenvectors of A. the Schur vectors bear an interesting relation to the eigenvectors. xi). Hence Tn = 0.8)]. Hence UHAU is a Schur decomposition of A. the unitary matrix X = (xj ^2) in the proof can be constructed as a Householder transformation [see (2. A matrix A is nilpotent if Ak = 0 for some fc > 1.19). For example. Then (1.12 CHAPTER 1. This process is known as deflation and is an important technique in eigencalculations.

Since the diagonal elements of a Hermitian matrix are real. x) be an eigenpair of A. its eigenvalues are real. It follows that Hermitian matrices The elementary facts about Hermitian matrices can be obtained almost effortlessly from the Schur form. Thus T is both upper and lower triangular and hence is diagonal. We call the decomposition the SPECTRAL DECOMPOSITION OF A Vol II: Eigensystems . the Az are real. To summarize: If A is Hermitian. Specifically. 1. In this case. suppose that the eigenvalues of A are all zero. ttt-) is an eigenpair of A. A matrix A of order n is nilpotentifand only if its eigenvalues are all zero. THE ALGEBRA OF EIGENSYSTEMS 13 Theorem 1. Conversely. say.1). Let Ak = 0 and let (A. and it follows that A = 0. Then from (1.13. then so that T itself is Hermitian. Proof. and it has a complete orthonormal system of eigenvectors. if A is Hermitian and T = U^AU is a Schur form of A.denotes the ith column of U then (A.-. from the relation we see that if ttt.SEC. Then any Schur decomposition of A has the form where T has zeros on its diagonal. Moreover.

l)-blocks of each side of the relation A H A = AAH we find that Since for any matrix B it follows that If A is a normal matrix and U is unitary. To see this. A key fact about normal matrices is contained in the following theorem. partition By Theorem 1. By computing the (l. It follows that: A normal matrix has a complete orthonormal system ofeigen vectors. It then follows from the relation TET = TTH that ^22^22 = ^22^22.15. EIGENSYSTEMS The key observation in establishing (1. Theorem 1. t12 = 0. the matrix T is normal. and unitary matrices. (1 -11) We add for completeness that the eigenvalues of a unitary matrix have absolute value one. Let the normal matrix A be partitioned in the form Then Proof. Definition 1. so that T22 is normal. Consequently.14 Normal matrices CHAPTER 1. which uses the Frobenius norm || • ||p to be introduced in §2.1. We claim that T is also diagonal.10) is that the triangular factor in any Schur decomposition of a Hermitian matrix is diagonal. skew Hermitian matrices (matrices for which AH = —A).14.15. This property is true of a more general class of matrices. A matrix A is NORMAL if A H A = AA H . The eigenvalues of a skew Hermitian matrix are zero or purely imaginary. if T is the triangular factor in a Schur decomposition of A. then UHAU is also normal. The three naturally occurring examples of normal matrices are Hermitian matrices. . An obvious induction now establishes the claim.

We can make the transformation even simpler by requiring its diagonal blocks to be identity matrices. If we can solve it. in which case L and M would be upper triangular. for example. But only the first Schur vector—the first column of the orthogonal factor—is an eigenvector. BLOCK DIAGONALIZATION AND SYLVESTER'S EQUATION 15 A Schur decomposition of a matrix A represents a solution of the eigenvalue problem in the sense that the eigenvalues of the matrix in question can be obtained by inspecting the diagonals of the triangular factor. Suppose that we wish to reduce A to block diagonal form by a similarity transformation—preferably a simple one. 1. THE ALGEBRA OF EIGENSYSTEMS 1. From block triangular to block diagonal matrices Let A have the block triangular form where L and M are square. be obtained by partitioning a Schur decomposition. then the diagonalizing transformation provides a complete set of eigenvectors. which can be accomplished by making the transformation itself block upper triangular. we must first investigate conditions under which the equation has a solution. But we make no such assumption. which has no complete system of eigenvectors. We have seen (Example 1. such as a block diagonal form. Thus to proceed beyond Schur form we must look to coarser decompositions. we can block diagonalize A.9) that if we can diagonalize A. Vol II: Eigensystems . cannot be diagonalized. However. Such a matrix could. Unfortunately. preserve the block upper triangularity of A. Thus we are led to a transformation of the form whose inverse is Applying this transformation to A we get Thus to achieve block diagonality we must have This equation is linear in the elements of Q. We must. a defective matrix. of course.5.SEC.

say of orders n and m. Then Hence if we can choose A = // — that is. Let X = xyH. we use the fact that an operator is nonsingular if every linear system in the operator has a solution. Specifically. The first step is to transform this system into a more convenient form. The Sylvester operator S is nonsingular if and only if the spectra of A and B are disjoint. EIGENSYSTEMS Let us consider the general problem of solving an equation of the form where A and B are square. y] be a left eigenpair of B. and S is singular. Let T = VHBV be a Schur decomposition of B. If we introduce the Sylvester operator S: defined by then we may write Sylvester's equation in the form 1 he operator ^ is linear in X .. Here we will be concerned with sufficient conditions for the existence of a solution. Then Sylvester's equation can be written in the form If we set and we may write the transformed equation in the form . our problem reduces to that ot determining when S is nonsingular. and it occurs in many important applications. we have the following theorem. let (A. if A and B have a common eigenvalue— then X is a "null vector" of S. It turns out that the nonsingularity depends on the spectra of A and B. Hence.16. Consider the sy stem SX — C. x) be a right eigenpair of A and let (//. To prove the converse. i.16 Sylvester's equation CHAPTER 1. Theorem 1. if and only if Proof. This equation is called a Sylvester equation.e. To prove the necessity of the condition.

12) shows that (A — /*) € A(S).. 1.XB = 0 so that the Sylvester operator X >-> (A — <?I)X — XB is singular.From the fcth column of (1. In general.till is nonsingular. or equivalently a = X — /z.a = //. Thus A — t\\I is nonsingular. so that (<r. . THE ALGEBRA OF EIGENSYSTEMS Let us partition this system in the form 17 The first column of this system has the form Now tn is an eigenvalue of B and hence is not an eigenvalue of A. as argued above.17. We have found a solution Y of the equation Hence is a solution of We may restate this theorem in a way that will prove useful in the sequel. (A — aI)X . yk-\. Vol II: Eigensystems . is well defined. and the first column of Y is well defined as the solution of the equation (A-tuI)yi = di. X) is an eigenpair of S. Moreover. suppose that we have found 3/1. A . Hence 3/2 is well defined. It follows from Theorem 1. Let SX = crX.13) we get The right-hand side of this equation is well defined and the matrix on the left is nonsineular. Thus we have the following corollary. if A € A( A) and \L € A(B).. Conversely. Hence vi. The eigenvalues of S are the differences of the eigenvalues of A and B. Corollary 1.SEC. Equivalently. then (1.13) we get the equation Because y\ is well defined the right-hand side of this system is well defined.16 that this can be true if and only if there is a A € A( A) and \i € A(£) such that A . From the second column of the system (1..

However. But it is instructive to reflect on the the second part of the proof. In the code below all partitionings are by columns. More details are given in the notes and references. the systems have upper triangular matrices of the form S — tkkl. It is summarized in Algorithm 1.1. this operation count is prohibitive. Then we can write Sylvester's equation in the form or equivalently where Y = UHXV and D = UHCV. 1. the procedure of transforming the Sylvester equation into the form AY .16. For a general matrix A these systems would require O(mn3) operations to solve. this algorithm solves Sylvester's equation AX . Unless m is quite small. As it stands. 6. Specifically. 7.r^/).1: Solution of Sylvester's Equation o An algorithm Theorem 1. as in the proof of Theorem 1.YT = D and solving columnwise for Y represents an algorithm for solving Sylvester's equation. 4. which can be solved in 0(n 2 ) operations. EIGENSYSTEMS Given matrices A £ C n x n and B € C m x m . If we now solve columnwise for y. If we have a method for computing Schur decompositions. it involves solving m systems whose matrices are (A . let S = UHAU be a Schur decomposition of A. 2. 3. by a further transformation of the problem we can reduce the count for solving the systems to O(mn2).16 and its corollary are sufficient for our immediate purposes. Compute a Schur decomposition A — USU^ Compute a Schur decomposition B = VTVH D = UHCV for k = 1 to ro Solve the triangular system (S-tkkI)yk k = dk + X^i" tikl/i end for k * = UYVH Algorithm 1.18 CHAPTER 1. . 5. These observations form the basis for one of the best algorithms for solving dense Sylvester equations.XB = C.

and by Theorem 1. to give the following theorem.SEC.U£fc of disjoint multisets. By the Schur decomposition theorem (Theorem 1. Hence we have the following theorem.. . Then the eigenvalues of LI are disjoint from those of M.18 to M to cut out a block L^ corresponding to £2. Vol II: Eigensystems ..18. Suppose. £*:• Partition where A(£i) = LI . a sufficient condition for this Sylvester equation to have a solution is that L and M have disjoint spectra.And so on. . THE ALGEBRA OF EIGENSYSTEMS Block diagonalization redux Let us now return to our original goal of block diagonalizing the matrix 19 We have seen that a sufficient condition for a diagonalizing similarity to exist is that the equation LQ . 1... that A( A) can be partitioned into the union C\ U. Moreover.QM = —H have a solution.18 we can reduce T to the form In the same way we may apply Theorem 1. for example.12) we can find a unitary matrix X such that T = XHAX is upper triangular with the eigenvalues of A appearing in the order of their appearance in the multisets C\. Let and suppose that Then there is a unique matrix Q satisfying such that The importance of this theorem is that it can be applied recursively. Theorem 1.

JORDAN FORM The reports of my death are greatly exaggerated. we may replace X^ Yi. Finally. if Si is nonsingular.-. which is the topic of this subsection... A is diagonalizable and has a complete system of eigenvectors. TJ 1.19 and its corollaries enjoy a certain uniqueness. see p. if all the eigenvalues of A are distinct. We begin with a definition. This lack of uniqueness can be useful. Then the spaces spanned by the Xi and Yi are unique (this fact is nontrivial to prove. that Yi is orthonormal. and S^lL{Si. U Ck of disjoint multisets. For example.Lk)..19. and Li in Theorem 1. The answer is the classical Jordan canonical form. Then there is a nonsingular matrix X such that X-lAX = di&g(Ll. The decompositions in Theorem 1. However. EIGENSYSTEMS Theorem 1.. Corollary 1. .. ra^. \k with multiplicities mi. we obtain the following corollary. Mark Twain We have seen that any matrix can be reduced to triangular form by a unitary similarity transformation. Let A(A) be the union C\ U . say.. correspond to the distinct eigenvalues of A... Let the matrix A of order n have distinct eigenvalues AI. .6. 246). Let X — (Xi • • • Xk) be partitioned conformally with the pardon of A( A) and let X~l = (Yi .20.. . .19 by XiSi. It is natural to ask how much further we can go by relaxing the requirement that the transformation be unitary.20 CHAPTER 1. Then there is a nonsingularmatrix X such that where If the sets £.• Yk) also be partitioned conformally. . where Li is of order raz and has only the eigenvalue A. then the blocks Li in this corollary are scalars. it allows us to assume. . Hence: If A has distinct eigenvalues. YiS~^.

14) where (1. • The proof of the theorem is quite involved.j (i = 1.. The Jordan canonical form—a decomposition of a matrix into Jordan blocks—is described in the following theorem.15)..15) Here are some comments about this theorem. each block corresponding to a distinct eigenvalue of A.22.. The eigenvalue A of the matrix J m (A) has algebraic multiplicity m. .. . THE ALGEBRA OF EIGENSYSTEMS Definition 1. so A has geometric multiplicity one. But rankfAJ — J m (A)] is m-1. For more see the notes and references. • The Jordan canonical form has a limited uniqueness. Let AeCnXn have k distinct eigenvalues AI. The hard part is reducing the individual blocks to the J.SEC. in (1.21. l. Theorem 1. we still speak of the Jordan canonical form of a matrix./. Principal vectors may be chosen in different ways..-j along with their associated eigenvalues are characteristic of the matrix A. . Any reduction of A into Jordan blocks will result in the same number of blocks of the same order having the same eigenvalues. However.. But in spite of this nonuniqueness.Then there are unique integers m. j = 1.14) and (1. . . A JORDAN BLOCK Jm(X) isa matrix of order m of the form 21 From an algebraic point of view a Jordan block is as defective as a matrix can be. The spaces spanned by the Xi are unique. . . mk. &. A^ of algebraic multiplicities m i . Vol II: Eigensystems . . It begins with a reduction to block diagonal form. the columns of the Xi are not.. The multiplicities rat and the numbers m.-) satisfying and a nonsingular matrix X such that (1..
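The defectiveness of a Jordan block is easy to see numerically. In the sketch below (Python/NumPy, illustrative only) the eigenvalue of J_4(2) has algebraic multiplicity four, but 2I - J_4(2) has rank three, so its geometric multiplicity (the dimension of the associated eigenspace) is one.

    import numpy as np

    def jordan_block(lam, m):
        """The m x m Jordan block J_m(lam): lam on the diagonal, ones on the superdiagonal."""
        return lam * np.eye(m) + np.diag(np.ones(m - 1), k=1)

    J = jordan_block(2.0, 4)
    print(np.linalg.eigvals(J))                            # 2, 2, 2, 2
    print(np.linalg.matrix_rank(2.0 * np.eye(4) - J))      # 3, so the null space has dimension 1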

the matrix becomes diagonalizable (Example 1. Note that the first and only the first principal vector associated with a Jordan block is an eigenvector.7). Under the circumstances there is an increasing tendency to turn whenever possible to friendlier decompositions—for example. in principle. The spaces spanned by the columns of the Xi are examples of eigenspaces (to be treated in Chapter 4). EIGENSYSTEMS • We can partition X at several levels corresponding to levels in the decomposition. If we now partition X{ in the form in conformity with (1. It then follows from the relation AX = XJ that i. First partition in conformity with (1. However.. and reports of its death are greatly exaggerated.15). Nonetheless the simple relations (1. the action of A on the matrix Xi is entirely specified by the matrix Jt. The Jordan canonical form is now less frequently used than previously. as a mathematical probe the Jordan canonical form is still useful.e. so that. Perturbations in the matrix send the eigenvalues flying.22 CHAPTER l. The simplicity of its blocks make it a keen tool for conjecturing new theorems and proving them. If we know the nature of the defectiveness in the original matrix.17) is called a sequence of principal vectors of A. Finallv. . to the Schur decomposition.14). if we write then the vectors satisfy A set of vectors satisfying (1. Against this must be set the fact that the form is virtually uncomputable. then it may be possible to recover the original form. But otherwise attempts to compute the Jordan canonical form of a matrix have not been very successful.17) make calculations involving principal vectors quite easy. it can be shown that Thus the actions of A on the matrices Xij are specified by the individual Jordan blocks in the decomposition.

as is Watkins's Fundamentals of Matrix Computations [288]. which was introduced by Hilbert [116. "characteristic" has an inconveniently large number of syllables and survives only in the terms "characteristic Vol II: Eigensystems . The nomenclature for integral equations remains ambiguous to this day. Demmel. There are too many of these books to survey here. The classics of the field are Householder's Theory of Matrices in Numerical Analysis [121] and Wilkinson's Algebraic Eigenvalue Problem [300]. Datta's Numerical Linear Algebra and Applications [52] is an excellent text for the classroom. but they only gained wide acceptance in the twentieth century. He makes the interesting point that many of the results we think of as belonging to matrix theory were derived as theorems on determinants and bilinear and quadratic forms. History and Nomenclature A nice treatment of the history of determinants and matrices. 96]. and van der Vorst [10].) The nontrivial solutions of the equations Ax = Xx were introduced by Lagrange [155. is given by Kline [147. Ch.SEC.119]. 1762] to solve systems of differential equations with constant coefficients. The term "characteristic value" is a reasonable translation of Eigenwert. should be made of Bellman's classic Introduction to Matrix Analysis [18] and Horn and Johnson's two books on introductory and advanced matrix analysis [118. 1. see [213. It is a perfect complement to this volume and should be in the hands of anyone seriously interested in large eigenproblems. Chatelin's Eigenvalues of Matrices [39] and Saad's Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms [233] are advanced treatises that cover both theory and computation. However. NOTES AND REFERENCES General references 23 The algebra and analysis of eigensystems is treated in varying degrees by most books on linear and applied linear algebra. however. Trefethen and Bau's Numerical Linear Algebra [278] is a high-level. More or less systematic treatments of eigensystems are found in books on matrix computations. They contain a wealth of useful information. Special mention should be made of Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide edited by Bai. p. Dongarra. including eigenvalues.7. (Matrices were introduced by Cayley in 1858 [32]. Our term "eigenvalue" derives from the German Eigenwert. At some point Hilbert's Eigenwerte inverted themselves and became attached to matrices. Special mention. Eigenvalues have been called many things in their day. Ruhe. 1904] to denote for integral equations the reciprocal of our matrix eigenvalue." not for the polynomial equation but for the system Ax — Xx. It consists of sketches of all the important algorithms for large eigenproblems with brief treatments of the theoretical and computational background and an invaluable bibliography. insightful treatment with a stress on geometry. THE ALGEBRA OF EIGENSYSTEMS 1. Golub and Van Loan's Matrix Computations [95] is the currently definitive survey of the field. for example. 33]. In 1829 Cauchy [31] took up the problem and introduced the term "characteristic equation.

Thus we can use "eigensystem." or "eigenexpansion" without fear of being misunderstood. The day when purists and pedants could legitimately object to "eigenvalue" as a hybrid of German and English has long since passed.16) and then stated that it easily generalizes. and an extensive bibliography. and it can often replace more sophisticated decompositions—the Jordan canonical form. The customary representation is vec(X) in which the columns of X are stacked atop one another in their natural order. In this representation the Sylvester equation becomes where ® is the Kronecker product defined by Algorithm 1.) Other terms are "latent value" and "proper value" from the French valeur propre. Sylvester considered the homogeneous case AX . Chapter 2). The difference is that a real Schur form is used to keep the arithmetic real. J. The decomposition can be computed stably by the QR algorithm (see §§2-3. Golub. 1884]." (For symmetric matrices the characteristic equation and its equivalents are also called the secular equation owing to its connection with the secular perturbations in the orbits of planets.XB = 0 [273.24 CHAPTER 1.1 is a stripped down version of one due to Bartels and Stewart [14]. Ch. The earliest use I can find is in Parlett's Symmetric Eigenvalue Problem [206]. For A and B of order two he showed that the equation has a nontrivial solution if and only if A and B have a common eigenvalue (essentially Theorem 1. The term "eigenpair" used to denote an eigenvalue and eigenvector is a recent innovation. and Van Loan [90] have proposed a Hessenberg-Schur variant that saves some operations. EIGENSYSTEMS equation" and "characteristic polynomial. The German eigen has become a thoroughly naturalized English prefix meaning having to do with eigenvalues and eigenvectors. Higham [114. Sylvester's equation Sylvester's equation is so called because J. which contains error and perturbation analyses. The linearity of the Sylvester operator means that it has a matrix representation once a convention is chosen for representing X as a vector. Nash. Schur form The proof of the existence of Schur decompositions is essentially the one given by Schur in 1909 [234] — an early example of the use of partitioned matrices. historical comments. . 15] devotes a chapter to Sylvester's equation." "eigenspace. for example—in matrix algorithms.

But once the existence of eigenpairs is established. We begin with a brief review of matrix and vector norms. Consequently. 2.139. NORMS. Example 1. 57. an algebra of eigensy stems can be erected on the definition. Leaving aside the possibility that the smallest cluster may consist of all the eigenvalues. The characteristic polynomials of the Jordan blocks of a matrix A are called the elementary divisors of A. Jordan canonical form The Jordan canonical form was established in 1874 by Camile Jordan [136]. AND MATRIX POWERS Block diagonalization 25 Although rounding error prevents matrices that are exactly defective from arising in practice. Block diagonalization would seem to be an attractive way out of this problem.7 shows that eigenvalues that should be clustered do not have to lie very close to one another. 2. For matrices the metric is generated by a norm. MATRIX AND VECTOR NORMS Definition Analysis happens when a metric meets an algebraic structure. and reduce the matrix to block diagonal form. one frequently encounters matrices with very ill-conditioned systems of eigenvectors. 218] for the issues. We now turn to the analytic properties of eigensystems. NORMS. a topic which will often recur in the course of this volume.99]. which measure the size of matrices and vectors.61. All the above comments about computing block diagonal forms of a matrix apply to the Jordan form—and then some. as the following definition shows. SPECTRAL RADII.140. Vol II: Eigensystems . the problem is to find the clusters. SPECTRAL RADII. It is a generalization of the absolute value of a scalar. For more on norms see §1:4.1.60. 2. one can afford to lavish computation on their subsequent analysis. See [96. For more see [17. a function that in some sense measures the size of a matrix. The first section of this chapter was devoted to that algebra.1. The idea is to generate clusters of eigenvalues associated with nearly dependent eigenvectors.SEC. Thus to say that A has linear elementary divisors is to say that A is nondefective or that A is diagonalizable. this sensible-sounding procedure is difficult to implement. If the blocks are small. AND MATRIX POWERS Eigenpairs are defined by a nonlinear equation and hence are creatures of analysis. The results are then applied to determine the behavior of powers of a matrix. find bases for their eigenspaces. In the next subsection we study the relation of norms to the magnitude of the largest eigenvalue of a matrix—the spectral radius of the matrix. Unfortunately. schemes for automatically block diagonalizing a matrix by a well-conditioned similarity transformation have many ad hoc features.

Examples of norms The norms most widely used in matrix computations are the 1-norm. The third condition—called the triangle inequality— specifies the interaction of the norm with the matrix sum. In particular. EIGENSYSTEMS Definition 2. It is frequently used in the equivalent form In this work we will always identify C n x l with the space of n-vectors C n . for reasons that will become evident shortly.1.26 CHAPTER 1. in which case their definitions take the following forms: The Frobenius norm has the useful characterization which in the case of a vector reduces to The Frobenius vector norm is also called the Euclidean norm because in E2 and R3 the quantity \\x\\p is the ordinary Euclidean length x. A MATRIX NORM ON C mxn is a function \\-\\: C m x n ->• ]R satisfying the following three conditions: The first two conditions are natural properties of any measure of size. and we call a norm on either space a vector norm. . The Frobenius norm of a vector x is usually written \\x\\2. and the oo-norm defined by Each of these norms is actually a family of matrix norms in the sense that each is defined for matrices of all orders. these norms are defined for vectors. the Frobenius norm.

The vector 1-norm generates the matrix 1-norm and the vector co-norm generates the matrix oo-norm. The vector 2-norm generates a new norm called the matrix 2-norm or the spectral norm. Denote by fl( A) the result of rounding the elements of A. NORMS. It is written || • |J2. and v or the matrix norm subordinate to IJL and v. Norms and rounding error As an example of the use of norms.Unlike the 1-norm. it is difficult to calculate. we say that the 2-norm is unitarily invariant. 2. Because of property 7. Specifically. We have already seen examples of operator norms. we have n(A) = A + h. and the Frobenius norm. Then elementwise where In terms or matrices. AND MATRIX POWERS Operator norms 27 Let ^ be a vector norm on C m and v be a vector norm on C n . Here is a list of some of its properties. Nonetheless it is of great importance in the analysis of matrix algorithms. SPECTRAL RADII. suppose that the machine in question has rounding unit eM— roughly the largest number for which fl(l + e) = 1. let us examine what happens when a matrix A is rounded. where the elements ot L are o^ et-j. It is called the operator norm generated by fj. where fl denotes floating point evaluation.SEC. The Frobenius norm is also unitarily invariant. It follows that Vol II: Eigensystems . Then it is easily verified that the function || • ||M)I/ defined on C m x n by is a matrix norm. the oo-norm.
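The norms introduced above are readily computed. The following lines (Python/NumPy, illustrative only) confirm the max-column-sum and max-row-sum characterizations of the matrix 1- and oo-norms and obtain the 2-norm from a singular value decomposition, which is one reason it is more expensive to compute than the others.

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [3.0,  4.0]])

    print(np.linalg.norm(A, 1),      np.abs(A).sum(axis=0).max())   # 1-norm: max column sum
    print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())   # oo-norm: max row sum
    print(np.linalg.norm(A, 'fro'),  np.sqrt((A**2).sum()))         # Frobenius norm
    print(np.linalg.norm(A, 2),      np.linalg.svd(A, compute_uv=False).max())  # 2-norm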

wehave||£|| < \\A\\cM or The left-hand side of (2. On the other hand. What we have shown is that: In any absolute norm. although (2. and 1 for the third. which has no accuracy at all. Let Then it is easily verified that so that x has low normwise relative error. It is 10~5 for the first component. the commonly used one. Unfortunately. In general. as the following example shows. and infinity norms are absolute norms. Unfortunately the converse is not true. 10~2 for the second. then the matrix as a whole has a small normwise relative error. Now in many applications the accuracy of small elements is unimportant. and in such applications one must either scale the problem to fit the norm or use a norm that gives greater weight to small elements. Frobenius. But in others it is. Incidentally. the 2-norm is not. the relative errors in the components vary. a small normwise relative error guarantees a similar relative error only in the largest elements of the matrix.28 CHAPTER l. Limitations of norms We have just shown that if the elements of a matrix have small relative error. Example 2. The property of consistency does the same thing for the product of two matrices. Consistency The triangle inequality provides a useful way of bounding the norm of the sum of two matrices.2) is called the normwise relative error in fl( A). EIGENSYSTEMS Thus in any norm that satisfies 11 \A\ \\ = \\A\\ (such a norm is called an absolute norm). . the smaller elements can be quite inaccurate.2) is still a good approximation. with balanced norms like the oo-norm. rounding a matrix produces a norm wise relative error that is bounded by the rounding unit.2.
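The phenomenon is easy to reproduce. The vector below is a hypothetical stand-in (not the one printed in Example 2.2), chosen so that a perturbation of size 1e-5 in each component gives the componentwise relative errors quoted above (1e-5, 1e-2, and 1) while the normwise relative error stays near 1e-5 (Python/NumPy, illustrative only).

    import numpy as np

    x    = np.array([1.0, 1e-3, 1e-5])      # hypothetical values in the spirit of Example 2.2
    xbar = x + 1e-5

    print(np.linalg.norm(xbar - x) / np.linalg.norm(x))   # normwise: about 1.7e-5
    print(np.abs(xbar - x) / np.abs(x))                   # componentwise: 1e-5, 1e-2, 1.0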

2-. oo. In this case consistency reduces to • Matrix norms do not have to be consistent. For example. In fact there are many such norms. NORMS. But Vol II: Eigensystems . if then • The 1-. the function denned by is a norm.SEC. whenever the product AB is defined. || • ||m.n. But it is not a consistent norm. For example. These norms are CONSISTENT if for all A eC / x m and#e<C m X n we have Four comments on this definition. 2.2. Let \\ . and oo-norms as well as the Frobenius norm are consistent in the sense that for p = 1. • The Frobenius and 2-norms are consistent in the sense that Given a consistent matrix norm || • || on C n x n it is natural to ask if there is a vector norm */ on C n that is consistent with || • ||. For let y 6 Cn be nonzero. C m X n . Then it is easily verified that the function i> defined by is a vector norm. a n d C / x n . AND MATRIX POWERS 29 Definition 2. SPECTRAL RADII. • An important special case is a matrix norm || • || defined on 7£ nxn .\\itTn. and || • ||/tn be matrix norms defined on <C /Xm . F.3.

we write At first glance. 2.. . as a practical matter they may say very different things about a converging sequence. This distance function can in turn be used to define a notion of convergence for matrices. This is an immediate consequence of the following theorem on the equivalence of norms. . .2. For an example. Although all norms are mathematically equivalent. The proof of the following theorem is left as an exercise. Let \JL and v be norms on C m x n .5.6.6f/| -»• 0 (i = 1.. In particular.4.30 Thus we have shown that: CHAPTER 1.. it is easy to see that \\Ak . THE SPECTRAL RADIUS Definition In this subsection we will explore the relation of the largest eigenvalue of a matrix to its norms. if {Ak} is a sequence of matrices. If\\ • || is consistent. we may define the distance between two matrices A and B by \\A — B\\. EIGENSYSTEMS Any consistent matrix norm onCnxn has a consistent vector norm. Then there are positive constants a and T such that for allAe€mxn Thus it does not matter which norm we choose to define convergence. this definition of convergence appears to depend on the choice of the norm || • ||. But in fact. (2-4) Another useful construction is to define a norm by multiplying by a matrix. Thus convergence in any norm is equivalent to the componentwise convergence of the elements of a matrix. j = 1. Theorem 2. Specifically. Theorem 2.IfUis nonsingular.11.. then it converges in any norm. For a proof see Theorem 1:4. then the functions are norms. Norms and convergence Given a matrix norm || • || on C m x n .. m.4. at least in the initial stages. Let \\-\\bea norm onCHXn. see the discussion following Example 2.. then so is Vsim(A). We begin with a simple but far-reaching definition.B\\p -»• 0 if and only if |aj^ . if a sequence converges in one norm. n).

but ||«/o(0)||oo = 1. Thus the spectral radius is the magnitude of the largest eigenvalue of A. It also tracks the powers of a matrix with greater fidelity than a norm. there is no norm that reproduces the spectral radius of J2(0). Let A be of order n. and let \\ • || be a consistent norm on C n x n . there are norms that approximate it. Let A be of order n. In general. as the following theorem shows.8. Let i/ be a vector norm consistent with the matrix norm || • || [see (2.e (depending on A and e) such that Vol II: Eigensystems .7. However.SEC. We begin our investigation of norms and spectral radii with a trivial but important theorem. For example. NORMS. If it is large. any norm of J2(0) must be positive.6. 2. something worse is true. Specifically. then Since A is an arbitrary eigenvalue of A. AND MATRIX POWERS 31 Definition 2. If (A. Then Proof. In fact. Let A be of order n. Then for any e > 0 there is a consistent norm II • IU. p(Ak) = p(A)k. Theorem 2. Norms and the spectral radius The spectral radius has a normlike flavor. a consistent norm will exceed the spectral radius of A. SPECTRAL RADII. the Jordan block has spectral radius zero. the matrix must be large. whereas the best we can say for a consistent norm is that || Ak\\ < \\A\f. Since J2(0) is nonzero. The eigenvalue A is called a DOMINANT EIGENVALUE and its eigenvector is called a DOMINANT EIGENVECTOR.4)]. In other words. Theorem 2. Then the SPECTRAL RADIUS of A is the number We call any eigenpair (A. x) is an eigenpair of A with j/(z) = 1. x) for which |A| = p(A) a DOMINANT EIGENPAIR. It gets its name from the fact that it is the radius of the smallest circle centered at the origin that contains the eigenvalues of A.
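A small numerical check (Python/NumPy, illustrative only): for a random matrix the spectral radius is dominated by each of the consistent norms introduced earlier, while for the Jordan block J_2(0) the inequality is strict in every norm, since its spectral radius is zero but the norm of a nonzero matrix is positive.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 6))

    rho = np.abs(np.linalg.eigvals(A)).max()          # spectral radius
    for p in (1, 2, np.inf, 'fro'):
        print(p, rho <= np.linalg.norm(A, p))         # True in every case

    J2 = np.array([[0.0, 1.0],
                   [0.0, 0.0]])
    print(np.abs(np.linalg.eigvals(J2)).max(), np.linalg.norm(J2, np.inf))   # 0.0 versus 1.0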

For 77 > 0.6). B?). If the dominant eigenvalues of A are nondefective. there is an 77 such that HD"1 JVjD||oo < c.32 CHAPTER 1. where B\ contains the dominant eigenvalues of A and BI contains the others.j)-element of D 1ND is Vijrf *.e by then by Theorem 2. we can reduce A to the form where AI is a diagonal matrix containing the dominant eigenvalues of A. we can choose a diagonal matrix DJJ — diag(/.7) it satisfies (2. Specifically. Since the diagonals of A2 are strictly less than p(A). by Theorem 1. EIGENSYSTEMS If the dominant eigenvalues of A are nondefective. Since B\ is not defective. and by (2. and N is strictly upper triangular. let Then for i < j the (i. D^) as above so that The norm \\-\\A defined by .4 the norm || • \\^t£ is consistent. It follows that If we now define || • \\A. Since for i < j we have TJJ~I ->• 0 as T? -> 0. Let be a Schur form of A in which A is diagonal and N is strictly upper triangular.19 we can reduce A to the form diag(i?i. we can diagonalize it to get the matrix AI . A2 is a diagonal matrix containing the remain eigenvalues of A. then there is a consistent norm \\-\\A (depending on A) such that Proof. We can then reduce BI to Schur form to get the matrix A2 + N.

Let A be of order n and let e > 0 be given. • 33 Theorem 2.3. the errors in certain iterative algorithms for solving linear systems increase decrease as the powers of a matrix called the iteration matrix.« for all B. We will see that this norm may be highly skewed with respect to the usual coordinate system—see the discussion following Example (2. e. Then It follows that || Ak\\2 > p(A)h. AND MATRIX POWERS now does the job. (depending on A. as we shall show in the next subsection. For this reason.8 shows that the spectral radius of a matrix can be approximated to arbitrary accuracy by some consistent norm. By the equivalence of norms. Then Vol II: Eigensystems . SPECTRAL RADII. Then Again by the equivalence of norms. 2. For example. there is a constant a > 0 such that \\B\\ > o-\\B\\2 for all B.Z > 0 such that ||J?|| < TA. Asymptotic bounds The basic result is contained in the following theorem. let || • \\A. we may take e = 0. Then for any norm \ \ . we now turn to the asymptotic behavior of the powers of a matrix. Nonetheless.c||£|U.SEC. It is therefore important to have criteria for when we observe a decrease. Equally important. Hence For the upper bound. it can be used to establish valuable theoretical results. Let (A. NORMS. x) be a dominant eigenpair of A with ||a:||2 = 1. Theorem 2.11). there is a constant TA.9.e be a consistent norm such that || A||4ie < p( A) + e.(. 2.\ \ there are positive constants cr (depending on the norm) and TA. Proof. MATRIX POWERS Many matrix processes and algorithms are driven by powers of matrices. and the norm) such that If the dominant eigenvalues of A are nondefective. we need to know the rate of decrease. We will first establish the lower bound.

we can repeat the proof of the upper bound with a consistent norm || • \\A (whose existence is guaranteed by Theorem 2. • The theorem shows that the powers of A grow or shrink essentially as p(A)k.1 (called the hump) plots these norms against the power k. Limitations of the bounds One should not be lulled by the behavior of J2(0. Instead of decreasing. EIGENSYSTEMS If the dominant eigenvalues of A are nondefective.9) are called LOCAL CONVERGENCE RATIOS because they measure the amount by which the successor of a member of the sequence is reduced (or increased).8) for which \\A\\A = p(A).10. Thereafter they decrease. It is instructive to look at a simple case of a defective dominant eigenvalue. Fork = 1. whose norms decrease monotonically at a rate that quickly approaches 2~k. The co-norm of the powers of J?f 0. It follows immediately from Theorem 2. A matrix whose powers converge to zero is said to be convergent.g.5) are Hence and The ratios in (2. Here is a more extreme example. We have to add "essentially" here because if a dominant eigenvalue is defective. Example 2. The local convergence ratios are never quite as small as the asymptotic ratio one-half. the ratio is about 0. where c can be arbitrarily small.5)fc.99 only when k = 563. For k = 10. the convergence ratio is |..99) not only take a long time to assume their asymptotic behavior.55. e. the norms increase to a maximum of about 37 when k = 99. so that the norms decrease from the outset.99) are The upper graph in Figure 2.99 = -0. which eventually approximate the asymptotic value of log 0. but they quickly approach it. but they fall below their initial value of 1.34 CHAPTER 1. The lower plot shows the natural logarithms of the local convergence ratios.9 that: A matrix is convergent if and only if its spectral radius is less than one.11.01. but they exhibit a hellish transient in the process. Example 2. . From these graphs we see that the powers of J2(0. The powers of the Jordan block J2(0. the bound must be in terms of of p(A) + €.

then the norm || • \\A. is damped by the factor 0. Of course the norm of a general matrix would be unnaturally large. so that />[J2(0. because the (2.001. so that the norm remains small. In matrix analysis one frequently introduces a norm designed to make a construction or a proof easy. There is a lesson to be learned from this.99)fc. Be on guard.1)element of «/2(0.C given by Theorem 2.99)* is zero and hence is not magnified by this factor. AND MATRIX POWERS 35 Figure 2. Many people will go through life powering matrices and never see anything as dramatic as J2(0.99)].99)fc.1: The hump and convergence ratios Which example—2. NORMS.99)] + e is near p[J2(0. SPECTRAL RADII. which accounts for the hump.SEC. If we choose e = 0. The bound (2. this VolII: Eigensystems .2)-element of J2 (0. But the (2.8 has the form The (2.8) gives no hint of the ill behavior of J2(0. such examples do occur.11—is typical? The answer is: It depends. l)-element is multiplied by 1000. Mathematically. 2. and sometimes their effects are rather subtle. However.10 or 2.001. and it is instructive to see why.99).
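The behavior described above is easy to observe. The sketch below (Python/NumPy, illustrative only) powers J_2(0.99) and records the oo-norms of the powers; the hump near k = 99 and the slow approach of the local convergence ratios to the asymptotic value 0.99 are plainly visible.

    import numpy as np

    J = np.array([[0.99, 1.0],
                  [0.0, 0.99]])                       # the Jordan block J_2(0.99)

    norms = []
    P = np.eye(2)
    for k in range(1, 1001):
        P = P @ J
        norms.append(np.linalg.norm(P, np.inf))
    norms = np.array(norms)

    kmax = int(np.argmax(norms)) + 1
    print(kmax, norms[kmax - 1])                      # maximum of about 37 near k = 99

    ratios = norms[1:] / norms[:-1]                   # local convergence ratios
    print(ratios[9], ratios[-1])                      # greater than 1 early on; near 0.99 late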

Ch. Norms and the spectral radius In his Solution of Equations and Systems of Equations Ostrowski [ 194. But unless the norm bears a reasonable relation to the coordinate system one is working in (the usual unit vectors for Example 2. the perturbations Ei may create eigenvalues of magnitude greater than one. though he gives no reference. The existence of the hump illustrated in Figure 2.9. we have Matrix powers For a detailed treatment of matrix powers—exact and computed—see [ 114. the defective eigenvalue near one will." Ostrowski's proof is an elegant and amusing application of the Jordan form. §17. 2. In the example of the last paragraph. EIGENSYSTEMS is sound practice.36 CHAPTER 1. but the gist of what is happening this. for sufficiently large e. A matrix norm and a consistent vector norm are sometimes called compatible norms.11). In the same book Ostrowski gives essentially Theorem 2. There is no one.1960] attributes Theorem 2. If A has a defective eigenvalue near one. NOTES AND REFERENCES Norms For treatments of matrix and vector norms see the general references in §1. the approach has the disadvantage that one must choose a suitable value for the free parameter €—not an easy problem. cause the pseudospectrum to extend outside the unit circle. the powers of a matrix can exhibit erratic behavior.7 to Frobenius. . The computed jth column of Ak satisfies where ||£"j||/m|| is of order of the rounding unit. the results may be disappointing or misleading.4. simple explanation. An important approach to this problem is the pseudospectrum In many instances. and a nominally convergent matrix can even diverge.1 has lead to various attempts to bound maxfc || Ak\\.1 of this series.4. stating that it "is unpublished. Since the only nonzero elements of N are ones on its superdiagonal. When computed in finite precision. in which case the asymptotic behavior will not be described by Theorem 2.8. which are described in [114. However. the location of the pseudospectrum in the complex plane explains the behavior of the matrix powers. The reader should be aware that some authors include consistency in the definition of a matrix norm.1]. Let A be the diagonal matrix of eigenvalues of A and let X~l(t~l A)X = e"1 A + N be the Jordan form of €~l A. 17].7 or in §1:1.

We conclude with results on the perturbation of the eigenvalues of Hermitian matrices and the singular values of general matrices. The first is a bound that establishes the continuity of the eigenvalues of a matrix. PERTURBATION THEORY 37 Although the pseudospectrum has been introduced independently several times. First. Most of the results in §3. and in §3. Trefethen [276. the current high state of the art is largely due to L. We will treat these generalizations in Chapter 4. The continuity of eigenvalues A fundamental difficulty with the perturbation of eigenvalues is that they need not be differentiable functions of the elements of their matrix. the eigenvalues are continuous at € = 0.ac. as the following theorem shows. VolII: Eigensystems . the eigenvalues of any matrix are continuous functions of the elements of the matrix. 3. We will consider three topics. where ||£||/||yl|| is of the order of the rounding unit for the computer in question.SEC. In fact.2 we will consider the perturbation of eigenpairs.2 generalize to eigenspace. PERTURBATION THEORY In this section we will be concerned with the sensitivity of eigenvalues and eigenvectors of a matrix to perturbations in the elements of the matrix.x) that satisfy (A + E)x = Xx. N. The second topic is Gerschgorin's theorem. For more see the Pseudospectra Gateway: http://web.1. the results are often of use in the design and analysis of algorithms. In §3. The perturbation theory of eigenvalues and eigenvectors is a very large subject.uk/projects/pseudospectra/ 3. 277. and we will only be able to skim the most useful results. A knowledge of the sensitivity of the eigenpair will enable us to predict the accuracy of the computed eigenvalue.ox. and this expression is not differentiable at e = 0. THE PERTURBATION OF EIGENVALUES This subsection is concerned with the problem of assessing the effects of a perturbation of a matrix on its eigenvalues. the eigenvalues of the matrix are 1 ± ^/e. Nonetheless. Second.comlab.1 we will consider the perturbation of individual eigenvalues. a useful tool for probing the sensitivity of individual eigenvalues. For example. 3. There are two reasons for studying this problem. many of the algorithms we will consider return approximate eigenpairs (A. 279].

EIGENSYSTEMS Theorem 3.. since if it were its complex conjugate would also be an eigenvalue—a second eigenvalue that satisfies the bound. this does not preclude the possibility of two eigenvalues of A attaching themselves to a simple eigenvalue of A and vice versa. A cannot be complex. jn of the integers 1. to an eigenvalue A of multiplicity m of A there correspond m eigenvalues of A.1 (Eisner). • The theorem establishes the continuity of eigenvalues in a very strong sense. the eigenvalues of are where u.1) is used more in theory than in practice. i_ The exponent £ in the factor \\E\\% is necessary. Thus a perturbation of order e has the potential of introducing a perturbation of order e« in the eigenvalues of a matrix. Moreover. • The theorem has a useful implication for the behavior of the simple real eigenvalues of a real matrix under real perturbations. pairs up the eigenvalues of A and A in such a way that each pair satisfies the bound (3.. n such that • There are three points to be made about this theorem.. However. each satisfying the bound (3.. . the bound (3.1). . . It is fairly easy to show that for every eigenvalue A of A there must be an eigenvalue A of A satisfying the bound (3. Then there is exactly one eigenvalue A of A satisfying the bound (for otherwise there would be too few eigenvalues of A to pair up with the other eigenvalues of A). ... . it is important to stress that this example is extreme—the eigenvalue in question is totally defective. on the other hand.. Hence: The simple real eigenvalues of a real matrix remain real under sufficiently small real perturbations.1). Let the eigenvalues of A — A + E be A j .1) is less than half the distance between A and its neighbors. is the primitive nth root of unity. However. Then there is a permutation j\. and vice versa. For this reason. For example. A n .38 CHAPTER 1. A n .1). In particular. .. though perhaps magnified by a large order constant. Let A be such an eigenvalue and let E be small enough so that the right-hand side of (3. Garden variety eigenvalues are more likely to suffer a perturbation that is linear in the error.1. Theorem 3. ... Let A the (possibly multiple) eigenvalues be A i.

PERTURBATION THEORY Gerschgorin theory 39 The problem with the bound in Theorem 3. and assume without loss of generality that £t = 1. Theorem 3. Proof. if the union ofk of the setsQi are disjoint from the others. Let A be of order n and let the GERSCHGORIN DISKS of A be defined by Then Moreover. The eigenvalues of A(Q) are the diagonal elements a. then that union contains exactly k eigenvalues of A. Let & be a component of x with maximum modulus.-.. Initially the union VolII: Eigensystems . Let (A. From the ith row of the equation Ax = Xx we have or Taking absolute values and remembering that |£j| < 1. (Jt=i Q^ is disjoint from the other disks. It must therefore be large enough to accommodate the worst possible case. which makes it too large for everyday work. and they lie in the the (degenerate) disks ft(0). As t proceeds from zero to one. where D — diag(an. ann).-.SEC. say. Let A = D + B. we find that which says that A 6 (/. Let Qi(t) be the Gerschgorin disks for A(t) = D + tB. . x) be an eigenpair of A. A cure for this problem is to derive bounds for individual eigenvalues.1 is that it bounds the perturbation of all the eigenvalues of any matrix. U(i) = \Ji=i Gji(t) is also disjoint from the other disks Qi(i).. The statement about the disjoint union can be established by a continuity argument. 3. Now suppose that. Then for 0 < t < 1. the radii of the disks Qi(i) increase monotonically to the radii of the Gerschorin disks of A. Gerschgorin's theorem provides a method for doing this.2 (Gerschgorin)..-.

from which we conclude that A has an eigenvalue AI that satisfies\Xi — 1| < 2. but they badly overestimate the deviation of the eigenvalue near one. the actual deviations are 1. U(t) can neither lose nor acquire eigenvalues. U( 1) = U?=i GK contains exactly k eigenvalues of A. as t increases. 1.4-10"3. and A4 that satisfy At first glance this would appear to be a perfectly satisfactory result. and their deviation is approximately the size of the error.7e-03. Let A be the matrix of Example 3. Moreover. and let P = diag(l(T3.0e-03. which violates continuity. The remaining three disks are centered at 2 and are not disjoint. Example 3. which is illustrated in the following example. 1.1). We can remedy this defect by the method of diagonal similarities. A has three eigenvalues \2. • Gerschgorin's theorem is a general-purpose tool that has a variety of uses. it is especially useful in obtaining bounds on clusters of eigenvalues. Hence. EIGENSYSTEMS £/(0) contains exactly k eigenvalues of A(0). Consider the matrix The radii of the Gerschgorin disks for this matrix are 2. Example 3. However. then the above examples show that the eigenvalues of A lie near its diagonal elements. AS. 2. If A is regarded as a perturbation of diag(l.4e-03. In perturbation theory. -7.2. For that would require an eigenvalue to jump between U(i) and the union of the other disks. 1.1.2). -9.6e-04.5e-03. 4.3.4e-04. Thus the Gerschgorin disks locate the eigenvalues near two fairly well. The first disk is disjoint from the others.3e-06. Then .40 CHAPTER 1. Hence.1.2e-03.4.3.
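Gerschgorin's theorem is easily applied in code. The matrix below is hypothetical, in the spirit of Example 3.3 (its entries are not those of the example): diagonal (1, 2, 2, 2) with off-diagonal elements of size about 1e-3 (Python/NumPy, illustrative only).

    import numpy as np

    def gerschgorin_disks(A):
        """Centers and radii of the Gerschgorin (row) disks of A."""
        centers = np.diag(A)
        radii = np.abs(A).sum(axis=1) - np.abs(centers)
        return centers, radii

    rng = np.random.default_rng(3)
    E = 1e-3 * rng.standard_normal((4, 4))
    np.fill_diagonal(E, 0.0)
    A = np.diag([1.0, 2.0, 2.0, 2.0]) + E             # hypothetical, in the spirit of Example 3.3

    centers, radii = gerschgorin_disks(A)
    lam = np.linalg.eigvals(A)
    print(all((np.abs(l - centers) <= radii).any() for l in lam))   # every eigenvalue lies in a disk
    print(np.sum(np.abs(lam - centers[0]) <= radii[0]))             # the isolated disk holds exactly 1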

7e-02.4e-01. Theorem 3. What we have shown—at least for this matrix—is that if a diagonal element is well separated from the other diagonal elements then an off-diagonal perturbation moves the eigenvalue by a small amount that is generally proportional to the square of the perturbation. PERTURBATION THEORY and the radii of the Gerschgorin disks are 41 2.1e-02. For more.(MO~3 of (wo.3) we know that the remaining eigenvalues are within 4. whose eigenvalues are the same as DAD~l. VolII: Eigensystems . see the notes and references. choosing the matrix D by inspection. A trivial but useful consequence is that Another important consequence is the following interlacing theorem.3 is an off-diagonal perturbation of a diagonal matrix. The similarity transformation DAD~l has reduced the radius of the first disk while increasing the radii of the others. Hence we know that A. 3. On the other hand. from (3.5 (Fischer).4) and (3. • Although we have proceeded informally. They have far-reaching consequences. 9. it is possible to formalize the method of diagonal similarities. has an eigenvalue within 2.SEC. 6. • The matrix in Example 3. Let the Hermitian matrix A have eigenvalues Then and The formulas (3. There are two comments to be made about this example. Nonetheless. 8. the first disk remains disjoint from the others. The first result is a far-reaching characterization of the eigenvalues of a Hermitian matrix.4-10"6 of one.5) are called respectively the min-max and the max-min characterizations. Hermitian matrices In this subsection we will state without proof some useful results on the eigenvalues of Hermitian matrices.4e-06.

Hence we have the following corollary. there is a difference. the eigenvalues \i and //. Theorem 3. the pairing is explicitly determined by the order of the eigenvalues. The inequality (3. It is noteworthy that like Theorem 3.. If U consists of m columns of the identity matrix. In the terminology of perturbation theory. A.1 the pairing is not explicitly given. then the eigenvalues m of B satisfy (3.42 CHAPTER 1. The final consequence of Fischer's theorem is the following theorem on the perturbation of eigenvalues. and Ebe Then Consequently. Let the eigenvalues of A. Let A and A = A + E be Hermitian. a perturbation in the matrix makes a .7.1 this theorem is about pairs of eigenvalues. However.e. satisfy i. Corollary 3. In Theorem 3. If B is a principal submatrix of A of order m.8.8. then U^AU is a principal submatrix of A of order m. Let A beHermitian of order n with eigenvalues U Let U e Cnxm be orthonormal and let the eigenvalue ofUHAU be Then The theorem is called the interlacing theorem because when m = n-1.8) says that the eigenvalues of a Hermitian matrix are perfectly conditioned—that is. EIGENSYSTEMS Theorem 3. In Theorem 3. (3.7) gives an interval in which the ith eigenvalue must lie in terms of the smallest and largest eigenvalues of E.6).6. they interlace.
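Both results are easy to confirm numerically. The sketch below (Python/NumPy, illustrative only) checks the interlacing of the eigenvalues of a principal submatrix of order n-1 and the fact that a Hermitian perturbation of 2-norm 1e-3 moves each eigenvalue by no more than 1e-3.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 6
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2        # real symmetric
    E = rng.standard_normal((n, n)); E = (E + E.T) / 2
    E *= 1e-3 / np.linalg.norm(E, 2)                          # ||E||_2 = 1e-3

    lam  = np.linalg.eigvalsh(A)[::-1]                        # lam_1 >= ... >= lam_n
    lamt = np.linalg.eigvalsh(A + E)[::-1]
    print(np.abs(lamt - lam).max() <= np.linalg.norm(E, 2))   # True

    mu = np.linalg.eigvalsh(A[:n-1, :n-1])[::-1]              # principal submatrix of order n-1
    print(all(lam[i] >= mu[i] >= lam[i+1] for i in range(n-1)))   # interlacing holds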

then the quantity is accurate to about k decimal digits. Since the Frobenius norm of a matrix is greater than or equal to its 2-norm.8e-07. 1.OOOOe+00. 1. Thus (3. 1.0002e-01. (We have reported only the first five figures of the element of A. a stronger result is also true. Example 3.9) implies that the smaller th eigenvalue the fewer the accurate digits.9e-05. 3. The perfect condition of the eigenvalues of a Hermitian matrix does not.8 holds for the Frobenius norm. In the notation of Theorem 3. 1.0180e-03. The example not only illustrates the relative sensitivity of the smaller eigenvalues.10~2.8. PERTURBATION THEORY 43 perturbation that is no larger in the eigenvalues. but it shows the limitations of bounds based on norms.8e-07. 1.9. 1. The orem 3. 9. 2.10 (Hoffman-Wielandt). are in general smaller than the bound would lead one to expect. The relative errors in these eigenvalues are 2. 1.SEC.8e-05.9e-04.) If we perturb A by a matrix E of2-norm 10~4 we get a matrix whose eigenvalues are l.6e-05.6e-03. the relative error in At is A rule of thumb says that if the relative error in a quantity is about 10 *. imply that all the eigenvalues ar determined to high relative accuracy.9839e-03. The absolute errors in the eigenvalues. Theorem 3. In fact. andlO"3. It is seen that the relative errors become progressively larger as the eigenvalues get smaller.10"1. The matrix has eigenvalues 1. However. Vol II: Eigensystems . 1.8e-02. unfortunately. as the following example shows.
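For Hermitian A and E the Hoffman-Wielandt bound can be checked with the eigenvalues paired in sorted order: the 2-norm of the vector of paired eigenvalue differences is at most ||E||_F (Python/NumPy, illustrative only).

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
    E = 1e-4 * rng.standard_normal((5, 5)); E = (E + E.T) / 2

    d = np.linalg.eigvalsh(A + E) - np.linalg.eigvalsh(A)     # sorted eigenvalues, paired in order
    print(np.linalg.norm(d) <= np.linalg.norm(E, 'fro'))      # True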

y is the left eigenvector of A corresponding to A. . x). It follows that: If (A. From the Cauchy inequality we know that where 7 < 1 with equality if and only if x and y are linearly dependent.10) above (x X) * = (y YE)H. In this case A corollary of this assertion is that if x is a simple eigenvector then its left eigen vector y is not orthogonal to x. x) is a simple eigenpair of A. z) be a simple eigenpair of A. In particular. EIGENSYSTEMS Up to now we have been concerned with eigenvalues.44 3.18 there is a nonsingular matrix (x X) with inverse (y y)H such that It is easily seen by writing that yRA — Aj/H and Y^A = MYE. Left and right eigenvectors Let (A. and we will actually treat the perturbation of eigenpairs. The angle between x and y will play an important role in the perturbation theory to follow. In general we define the angle between nonzero vectors x and y by Since in (3. It turns out that the perturbation theory for an eigenvector goes hand in hand with that of its eigenvalue. This insures that (up to a scalar factor) we have a unique eigenvector to study. we have yHx = 1. In R2 and R3 it is easy to see that the constant 7 is the cosine of the angle between the lines subtended by x and y. SIMPLE EIGENPAIRS CHAPTER 1. By Theorem 1. In this section we will examine the sensitivity of eigenvectors to perturbations in the matrix.2. then there is a left eigenvector y of A such thatyHx = 1. We will restrict ourselves to a simple eigenpair (A.
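Left eigenvectors are conveniently obtained as eigenvectors of A^H. The sketch below (Python/NumPy, illustrative only) picks one eigenpair of a random matrix (its eigenvalues are simple with probability one), matches it with its left eigenvector, and computes the cosine of the angle between the two, which is nonzero as asserted above.

    import numpy as np

    rng = np.random.default_rng(11)
    A = rng.standard_normal((5, 5))

    w, V = np.linalg.eig(A)               # right eigenpairs: A V = V diag(w)
    wl, U = np.linalg.eig(A.conj().T)     # eigenvectors of A^H are left eigenvectors of A

    i = 0
    x = V[:, i]
    y = U[:, np.argmin(np.abs(wl - w[i].conj()))]     # left eigenvector for the same eigenvalue
    print(np.linalg.norm(y.conj() @ A - w[i] * y.conj()))      # ~1e-15: y^H A = lambda y^H

    cosine = np.abs(y.conj() @ x) / (np.linalg.norm(x) * np.linalg.norm(y))
    print(cosine)                          # cosine of the angle between x and y; not zero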

Thus to get a unique representation we must impose a normalization on x.11 below) that if E is small enough. It can be shown (Theorem 3. The key to a first-order analysis is to note that since (p and p approach zero along with Et products of these terms will become negligible when E is small enough. Since yHX = 0. We will suppose that A has been block-diagonalized as in (3. 3. we get the linearized equation We must now solve the linearized equation for (p and p. This is easily seen to be equivalent to representing x in the form where p is to be determined. and use the facts that Vol II: Eigensystems . or Since Ax = Ax.SEC. we have yHAXp = XyHXp = 0 and yHXp = 0. we will firs address the problem of approximating the perturbed eigenpair. Since yHx = 1. Ignoring such products. we have To solve for p multiply (3. The natural equation is Ax — Xx.10). The problem of representing x is complicated by the fact that x is determined only up to a scalar factor. PERTURBATION THEORY First-order perturbation expansions 45 We now turn to the perturbation theory of simple eigenpairs. we can write where (p is to be determined. Before establishing that fact. Here we will require that yHx = 1. and a small perturbation A = A -f E. we have Unfortunately this equation is nonlinear in the parameters (p and p. x) as E approaches zero. Specifically. we are given a simple eigenpair (A.13) by Y H . then there is a simple eigenpair (A. multiply by yE. To solve for <p. x) that approaches (A. The first order of business is to find suitable representations of A and x. x) of A. Since A is a scalar. We must now set up an equation from which (p and p may be determined.

46 CHAPTER l. x) be a simple eigenpair of A. a. Let (x X) be a nonsingular matrix such that where and set Let || • || be a consistent norm and define If then there is a scalar (p and a vector p such that is an eigenpair of A with . we have not established that a perturbed pair exists to be approximated. Let (A. nor have we shown how good the approximations actually are. EIGENSYSTEMS to get Mp + YHEx = \p. Since A is a simple eigenvalue. The following theorem. For more.11. Hence AJ — M is nonsingular and In summary: The first-order expansion of the eigenpair (A. see the notes and references. However. the eigenvalues of M are not equal to A. fills in these gaps. Theorem 3.) under the perturbation A + E of A is Rigorous bounds We have derived approximations to the perturbed eigenpair. which we give without proof.

normalized so that yHx = 1. Generalizations of this scalar quantity will play an important role in algorithms for large eigenproblems. ||/£||. and ||F||22 are all 0(||£||). 47 and There are many things to be said about this theorem. Specifically. For if we take E = ee^e. Moreover from (3.E||2). PERTURBATION THEORY Moreover.11 say very specific things about how a simple eigenpair behaves under perturbation. But to orient ourselves. In matrix terms: If (A. but they do not exhibit the 0(|| £ ||*) behavior that we have observed in defective matrices [see (3. Turning first to the eigenvalues. enough to give a subsubsection to each. j)-element of A. Qualitative statements The bounds in Theorem 3. we can write The quantity yHAx in (3. we will first describe the qualitative behavior. ||/12||. Since A + yuEx = yEAx.18) it follows that \\p\\ is also O(||E||). the simple eigenvalues are not only continuous.2)] Again by (3.20) that In other words. so that E represents a perturbation of size e in the (i. 3. then From this it follows that dX/daij is y{Xj. it follows from (3.20) implies that A is a differentiable function of the elements of its matrix.. Equation (3. then VolII: Eigensystems . we have that |y?n|. since <p = A — A.21) is called a Rayleigh quotient.SEC.20) our first-order approximation yHEx to (p is accurate up to terms of 0(||. 2) is a simple eigenpair of A and y is the corresponding left eigenvector.

) Thus: The condition number of a simple eigenvalue is the secant of the angle between its left and right eigenvectors. Moreover.11 is useful in establishing the existence of a perturbed eigenpair. (For a general discussion of condition numbers see §1:2. /^ 24) Two caveats apply to condition numbers in general. we have so that the eigenvector is also continuous. In our case. from (3. we see that the secant of the angle between x and y is a constant that mediates the effect of the error E on the eigenvalue A.20) By (3. . But it does not directly provide a condition number for the perturbed eigenvector. First. for most norms in common use the secant gives a reasonable idea of the sensitivity. we have from (3. they are useful only to the extent that the bound that generates them is sharp.12).yHEx). the bound on A was derived by taking norms in a sharp approximation to the perturbed eigenvalue A. If a condition number for a quantity is large. and the condition number will be an accurate reflection of the sensitivity A. the quantity is said to be well conditioned. The reason is that the vector p can be large. On the other hand.M) The condition of an eigenvalue l /2i to be a very good approximation to x. even when the perturbation in the directions between x and x is small. if we had not chosen the 2-norm.4.19) it follows that Thus we can expect x -f X(\I . The condition of an eigenvector Theorem 3.48 CHAPTER 1.Hence Ignoring the 0(|| JE?||2) term in (3. EIGENSYSTEMS Turning now to the eigenvector x. that it is easy to contrive large perturbations that leave A unchanged. In particular. we begin with the eigenvalue A. Such numbers are called condition numbers.23). we would not have ended up with as elegant a condition number as the secant of the angle between the left and the right eigenvectors. and. it provides a meaningful condition number for the perturbed eigenvalue. as we have just seen. Second. Keep in mind. Since A = A + yHEx + (<p. since x — x = Xp. If it is small. however. we know that ||x||2||y||2 = sec ^-(x-> y}. the condition number changes with the norm. the quantity is said to be /// conditioned. Hence the bound will generally be in the ballpark. Turning now to more informative bounds.
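The secant formula is easily tested. In the sketch below (Python/NumPy, illustrative only) the left eigenvector is normalized so that y^H x = 1; the first-order prediction y^H E x is then compared with the actual change in the eigenvalue, and ||x||_2 ||y||_2 gives the secant of the angle between the two vectors, that is, the condition number just derived.

    import numpy as np

    rng = np.random.default_rng(13)
    A = rng.standard_normal((5, 5))
    E = rng.standard_normal((5, 5)); E *= 1e-6 / np.linalg.norm(E, 2)

    w, V = np.linalg.eig(A)
    wl, U = np.linalg.eig(A.conj().T)
    i = 0
    x = V[:, i]
    y = U[:, np.argmin(np.abs(wl - w[i].conj()))]
    y = y / (y.conj() @ x)                               # normalize so that y^H x = 1

    wt = np.linalg.eigvals(A + E)
    lam_pert = wt[np.argmin(np.abs(wt - w[i]))]          # the perturbed copy of w[i]

    print(lam_pert - w[i], y.conj() @ E @ x)             # agree to O(||E||^2)
    print(np.linalg.norm(x) * np.linalg.norm(y))         # sec of the angle: the condition number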

Theorem 3. It then follows that the approximation p to p in (3.15) to block diagonal form. Hence We are now in a position to bound the sine of the angle between x and x. For the rest of this subsection we will assume that this change has been made.11 assume thatY is orthonormal. so that the block diagonal form becomes diag(A. Then we can replace X in the block diagonal reduction (3. We begin by choosing a distinguished basis for the reduction (3. we have Since A 4 M. we can make p as large as we like compared to The cure for this problem is to bound the angle between x and x. 3. and let Y be an orthonormal basis for 7£(x)-1. Then the matrix (x Y) is unitary. Lemma 3.15) by XS and Y by YS~H.SEC. Then \\E\\- Proof. replace X by XS and Y by Y5~~H. The following lemma shows how do do this. In Theorem 3. PERTURBATION THEORY 49 To see this. let S be nonsingular. S~1MS). and that the condition (3. we will.14) becomes S~lp.13. Without loss of generality assume that ||or||2 = \\x\\z = 1.12. From the unitary invariance of || • |J2. we have But \XHX\ = cos(z. as above. with S chosen so that the new Y is orthonormal. VolII: Eigensystems . By choosing S appropriately.17) is satisfied in the 2-norm. Let x and x be nonzero. Then Proof. Specifically.£). We have (A + E)x = \x or Multiplying this relation by Yu and recalling that YHA = MYU. we will actually calculate the sine of the angle. Although we defined the angle between two vectors in terms of the cosine.

Unfortunately. M) can be far smaller than the separation of A from the spectrum of M. M) is a lower bound on the separation of A and the eigenvalues of M. EIGENSYSTEMS and by Lemma 3. sep( A. For if an eigenvalue of M is near A. this is just common sense.25) in the form Consequently.11 it follows that A —»• A as E -*• 0. we have the following theorem.15.50 CHAPTER l. Theorem 3. • Thus sep(A. then a small perturbation of A could make them equal and turn the one-dimensional space spanned by x into a two-dimensional space.12 and the definition (3.14. M) is a condition number for the eigenvector x. as the following example shows.7 The result now follows on inverting this relation. Proof.16) From Theorem 3. Hence we may rewrite (3. We will now consider some of the properties of the functions sep. Example 3. Let A = 0 and let M = Wm be themxm upper triangular matrix . Actually. Properties of sep The first and most natural question to ask of sep is how it got its name. Specifically. Otherwise put if an eigenvalue of M is near A we can expect the eigenvector x to be sensitive to perturbations in A. The answer is that it is a lower bound on the separation of A from the spectra of M. sep l (A. In any consistent norm. By Theorem 2.

It is an instructive exercise to use Wm to construct an example of this phenomenon. [Hint: See (3.15) of Theorem 3. This result. This example suggests the possibility that an eigenvector whose eigenvalue is well separated from its companions can still be sensitive to perturbations in its matrix. The simple eigenvectors of a Hermitian matrix also behave nicely.J . and sec Z(z. It follows that the angle between x and y is zero. On the other hand. the distance between A and the spectrum of Wm is one. is an eigenvector of A and x is the corresponding eigenvector of A = A + E. Hence which. is perfectly conditioned.16. of course.SEC.] Theorem 3. PERTURBATION THEORY 51 Since all the eigenvalues ofWm are one. Then in the 2-norm. Let A be real and M Hermitian. we may take X = Y. whatever its multiplicity. so that M is Hermitian. goes quickly to zero in spite of the excellent separation of A = 0 from the eigenvalues ofWm. as a consequence of the following theorem. which shows that any eigenvalue of a Hermitian matrix. It follows that if a. Thus the simple eigenvalues of a Hermitian matrix are optimally conditioned. Hermitian matrices If A is Hermitian then the right eigenvector x corresponding to the eigenvalue A is also the left eigenvector y.8). 3. with increasing m. VolII: Eigensystems .26). Now in the decomposition (3. whose proof is left as an exercise.11. is weaker than (3. in the co-norm But sothatyw-^loo = 2 m . y) = 1—the smallest possible value. then Thus for Hermitian matrices the physical separation of eigenvalues—as distinguished from the lower bound sep—indicates the condition of its eigenvectors.
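The point of Example 3.15 can also be seen numerically. In the 2-norm, sep(λ, M) reduces to the smallest singular value of M - λI. The matrix below is an assumed form of W_m (unit upper triangular with -1 above the diagonal, consistent with ||W_m^{-1}||_oo = 2^(m-1)); the sketch (Python/NumPy, illustrative only) shows sep(0, W_m) collapsing even though the distance from 0 to the spectrum of W_m remains exactly one.

    import numpy as np

    def W(m):
        """Assumed form of W_m: unit upper triangular with -1 above the diagonal."""
        return np.eye(m) - np.triu(np.ones((m, m)), k=1)

    # W_m is triangular with unit diagonal, so its only eigenvalue is 1 and the
    # distance from lambda = 0 to its spectrum is exactly 1.  Nevertheless:
    for m in (5, 10, 20, 30):
        print(m, np.linalg.svd(W(m), compute_uv=False).min())   # sep_2(0, W_m), roughly 2**(1 - m)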

Eisner.4 of this series. Also see [23.70] as improved by Bhatia. Chapter 4. see Bhatia's Matrix Analysis [23. Kato's Perturbation Theory for Linear Operators [146] presents the theory as seen by a functional analyst (although it includes a valuable treatment of the finite-dimensional case). Simple eigenpairs The results in this subsection were..1 is due to Eisner [69. Continuity of eigenvalues For an extensive survey of general results on the perturbation of eigenvalues of nonnormal matrices.119. 1931] is a consequence of the fact that a strictly diagonally dominant matrix is nonsingular. Theorem 3. In his subsequent Matrix Analysis [23] he treats eigenspaces of Hermitian matrices. Gerschgorin theory Gerschgorin's theorem [82. 269] for further references and historical notes.3. [39] [118]. [177]. There are several books devoted more or less exclusively to perturbation theory. and Krause [24]. For a formalization of the procedure see [175. leading Olga Taussky to write a survey entitled "A Recurring Theorem on Determinants" [274]. NOTES AND REFERENCES General references CHAPTER 1. Gerschgorin also showed that an isolated union of k disks contains exactly k eigenvalues and introduced the use of diagonal similarities to reduce the radii of the disks. developed for eigenspaces. This fact has been repeatedly rediscovered. [18]. Both books are a valuable source of references. Books whose titles include the words "matrix analysis" or "matrix inequalities" usually contain material on perturbation theory—e. Left and right eigenvectors The fact that left and right eigenvectors corresponding to a simple eigenvalue cannot be orthogonal is well known and can be easily deduced from a Schur decomposition.4.118.52 3. [168]. for the most part. .g. as Wilkinson has demonstrated in his Algebraic Eigenvalue Problem [300]. [119]. Eigenvalues of Hermitian matrices Much of the material presented here can be found in §1:1. Ch.285]. which will be treated in §2. EIGENSYSTEMS Most texts on numerical linear algebra contain at least some material on the perturbation of eigenvalues and eigenvectors (especially [121] and [300]). [20]. VIII]. Bhatia [22] treats the perturbation of eigenvalues with special emphasis on Hermitian and normal matrices. The approach taken in this section is along the line of Stewart and Sun's Matrix Perturbation Theory [269]. The technique is quite flexible.

also deducible from a Schur decomposition. Condition numbers Wilkinson [300. Since ||a:yH||2 = IHMMh = sec/(#. the bound becomes where Sep The justification for the introduction of sep to replace the actual separation of eigenvalues in the complex plane is the simplicity of bounds it produces. If Y is not orthonormal. formulas for derivatives can often be read off from a perturbation expansion [the assertion (3. Third.22) is an example]. Its drawback is Vol II: Eigensystems . In fact. We will return to these bounds when we treat eigenspaces in Chapter 4. then the matrix must be near one with a multiple eigenvalue [303]. where x and y are normalized right and left eigenvectors corresponding to a simple eigenvalue A. This result generalizes to representations of a matrix on an eigenspace [see (2. Theorem 3. Chapter 4].13 is a generalization to non-Hermitian matrices of the sin# theorem of Davis and Kahan [54].26). pp. PERTURBATION THEORY 53 The converse is not true—a multiple eigenvalue can fail to have nonorthogonal left and right eigenvectors. in perturbation theory a first-order expansion can yield very sharp asymptotic bounds. although for this to happen the eigenvalue must be defective. the condition of a simple eigenvalue is the norm of its spectral projection.SEC. the matrix xyH is the spectral projector corresponding to A. 3.252]. It is so called because (xy^)z is the projection of z on 'Jl(x) along ^(y)1-. I/ \s\ is. An analytic version of the result. Second. a first-order expansion is often the best way to get a handle on the perturbation theory of a new object. First. the approximations they furnish are often used in deriving and analyzing matrix algorithms. Ifx and y are normalized so that yHx = 1. the secant of the angle between x and y. 68-69] introduces the quantity s = yEx. They have also been used to generate iterations for solving nonlinear equations. and on the basis of a first-order perturbation analysis notes that the sensitivity of A is determined by l/|s|. In matrix analysis first-order expansions are useful in three ways. First-order perturbation expansions First-order perturbation expansions have long been used in the physical sciences to linearize nonlinear differential equations. is that if the left and right eigenvectors of a matrix are nearly orthogonal. Newton's method is the prototypical example. Rigorous bounds Theorem 3. A classroom example is the behavior of the pendulum under small oscillations.11 is a consequence of more general theorems due to the author [250. of course.?/).

The above categories are not disjoint. small eigenvalues may be well conditioned. 3. bidiagonal matrices.e. 2. the matrix and its perturbations must have some special structure.54 CHAPTER 1. tridiagonal matrices. 58. But we can discern some trends. This phenomenon is unexplained by our perturbation theory. if the perturbations are structured. a small perturbation in the arguments of sep makes an equally small perturbation in its value. however. 125. The following references emphasize surveys or papers with extensive bibliographies: [56. 1. for example. the matrix whose eigenvalues are (to five figures) 2. Perturbation theory for matrices with structured nonzero elements: e. even though the perturbation in the (1. as Example 3. Over the last two decades researchers have begun to investigate these and related problems. 5. and it is impossible to survey it here. This is generally true of normwise perturbation. If we replace A by we get eigenvalues 1.0000 and 5. where E and F are small.15 shows. 113.. Relative perturbation of eigenvalues: Determine the relative error in eigenvalues resulting from perturbations in the matrix. .. Perturbation theory for graded matrices. On the other hand it has the nice property that where || • || is the norm used to define sep.9995 and 4. EIGENSYSTEMS that sep is by no means a simple quantity. Componentwise perturbation of eigenvectors: Determine the error (preferably relative) in the individual components of eigenvectors resulting from perturbations in the matrix. 165. Relative and structured perturbation theory Example 3.173.9 suggests that the small eigenvalues of a symmetric matrix are not well conditioned in a relative sense under perturbations of the matrix. perturbations of the form (/ + E) A(I + F). the smallest eigenvalue is insensitive to small relative perturbations in the elements of A. but overlap—often considerably. Thus. The subject is still in flux. etc.995740 '.166. Multiplicative perturbations: i.0000-10 7. 266]. It is also impossible to present a complete bibliography. Consider. 112. In other words.g. l)-element is larger than the eigenvalue itself. Generally. 4.

The algorithm is closely related to two algorithms that are important in their own right—the power method and the inverse power method. can compute a Schur form of a general matrix in 0(n3) operations. Thus the history of eigenvalue computations is the story of iterative methods for approximating eigenvalues and eigenvectors. The purpose of this chapter is to describe the QR algorithm and its variants. And the underlying theory of the method continues to suggest new algorithms. We will therefore begin with an exposition of those methods.2 THE QR ALGORITHM All general-purpose eigenvalue algorithms are necessarily iterative. and in the following section we will present the implicitly 55 . An important theme in this story concerns methods based on powers of a matrix. the singular value decomposition of a general matrix. Specifically. Thus if we had a noniterative general-purpose eigenvalue algorithm. In §2 we will present the general algorithm in the form generally used for complex matrices. and the generalized Schur form of a matrix pencil. to any polynomial p(x) = xn — an-ixn~l — aix — ao there corresponds a companion matrix whose characteristic polynomial is p. This algorithm. with proper attention to detail. It can be adapted to reduce a real matrix in real arithmetic to a real variant of the Schur form. This statement is a consequence of Abel's famous proof that there is no algebraic formula for the roots of a general polynomial of degree greater than four. Other forms of the algorithm compute the eigenvalues of and eigenvectors of a Hermitian matrix. The climax of that theme is the QR algorithm—the great success story of modern matrix computations. we could apply it to the companion matrix of an arbitrary polynomial to find its zeros in a finite number of steps—in effect producing a formula for the roots.

shifted form for real matrices. The symmetric eigenvalue problem and the singular value decomposition will be treated separately in the next chapter. Finally, in §4 we treat the generalized eigenvalue problem.

A word on the algorithms. The QR algorithm is a complicated device that generates a lot of code. Nonetheless, its various manifestations all have a family resemblance. For this reason we will lavish attention on one variant—the explicitly shifted QR algorithm for complex matrices—and sketch the other variants in greater or lesser detail as is appropriate.

1. THE POWER AND INVERSE POWER METHODS

The power and inverse power methods are extremely simple methods for approximating an eigenvector of a matrix. In spite of their simplicity, they have a lot to teach us about the art of finding eigenvectors. For this reason and because of the connection of the methods to the QR algorithm, we devote this subsection to a detailed treatment of the algorithms. We begin with the power method.

1.1. THE POWER METHOD

The basic idea behind the power method is simple. Let A be a nondefective matrix and let (λi, xi) (i = 1, ..., n) be a complete set of eigenpairs of A. Since the xi are linearly independent, any nonzero vector u0 can be written in the form

    u0 = γ1x1 + γ2x2 + ··· + γnxn.

Now A^k xi = λi^k xi, so that

    A^k u0 = γ1λ1^k x1 + γ2λ2^k x2 + ··· + γnλn^k xn.    (1.1)

If |λ1| > |λi| (i ≥ 2) and γ1 ≠ 0, then as k → ∞ the first term in (1.1) dominates, so that A^k u0 becomes an increasingly accurate approximation to a multiple of the dominant eigenvector x1.

Although the basic idea of the power method is simple, its implementation raises problems that will recur throughout this volume. We will treat these problems in the following order.

1. Convergence of the method
2. Computation and normalization
3. Choice of starting vector
4. Convergence criteria and the residual
5. The Rayleigh quotient
6. Shifts of origin
7. Implementation
8. The effects of rounding error

Let UQ be a nonzero starting vector.4). Chapter 1. x\) and let A be decomposed in the form (1. Let A have a unique dominant eigenpair(\i. decomposed in the form (1. x\) in the sense of Definition 2. THE POWER AND INVERSE POWER METHODS 57 Convergence In motivating the power method. It is easy to verify that Let the columns of Y form an orthonormal basis for the subspace orthogonal to x\.SEC. Proof. 1.1. we can find a nonsingular matrix X = (x\ X^) such that Without loss of generality we may assume that Since x and the columns of X form a basis for C n . Theorem 1. there is a constant a such that where p(M) is the spectral radius ofM. all that is needed is that A have a single dominant eigenpair ( AI . we can write any vector UQ in the form In this notation we have the following theorem. Chapter 1. where X = (x\ X^) satisfies (1.3). by Theorem 1.12. Then by Lemma 3. then In particular for any e > 0. Since this eigenvalue is necessarily simple. In fact. But Vol II: Eigensystems .2). 1/71 ^ 0. we made the simplifying assumption that A is nondefective.19.6. Chapter 1.

and

    ‖γ1λ1^k x1 + X2 M^k c2‖2 ≥ |γ1||λ1|^k − ‖M^k‖2‖c2‖2.

Substituting these inequalities in (1.7) and dividing by |γ1||λ1|^k, we get (1.6). •

Three comments on this theorem.

• Equation (1.5) shows that the error in the eigenvector approximation produced by the power method behaves like ‖M^k‖2/|λ1|^k. Equation (1.6) follows directly from Theorem 2.9, Chapter 1.

• By Theorem 2.11, Chapter 1, if M is nondefective then we may take ε = 0 in (1.6). Since ρ(M) < |λ1|, the error converges to zero at an asymptotic rate of [ρ(M)/|λ1|]^k, a statement made precise by (1.6). For example, if A has a complete system of eigenvectors with its eigenvalues ordered so that |λ1| > |λ2| ≥ ··· ≥ |λn|, then the convergence is as |λ2/λ1|^k. However, if the powers of M are badly behaved, we may have to wait a long time to observe this asymptotic convergence (see the discussion of matrix powers in Chapter 1).

• From (1.6) we see that if γ1 is small in comparison to ‖c2‖, then the convergence will be retarded, although its asymptotic rate will be unchanged. In this case we say that u0 is deficient in x1. We will return to this point later.

Computation and normalization

We have written the kth iterate uk in the power method in the form uk = A^k u0. If we were to naively implement this formula, we would end up first computing Bk = A^k and then computing uk = Bk u0. This procedure requires about kn^3 flam to compute uk—a grotesquely large operation count (a flam denotes a pair of floating-point additions and multiplications). An alternative is to observe that uk+1 = Auk. This suggests the following algorithm.

1. k = 0
2. while (not converged)
3.    uk+1 = Auk                                   (1.8)
4.    k = k+1
5. end while

For a dense matrix, each iteration of this program requires about n^2 flam. However, this operation count does not tell the entire story. In many applications the matrix A will be large and sparse—that is, it will have many zero elements. If the matrix is represented by an appropriate data structure, it may take far fewer than n^2 flam to compute its product with a vector. Let us consider the example of a tridiagonal matrix.

Example 1.2. A tridiagonal matrix of order n has the form illustrated below for n = 6:

The cure for this problem is to note that only the direction of the vector Uk matters: its size has no relevance to its quality as an approximate eigenvector. then the following program computes y = Tx. Thus we have reduced the work to form the product Ax from 0(n2) to O(n). k=Q while (not converged) Mfc+i = Auk 4.g.In the sequel we will dispense with such minor economies and write. b. Such a process cannot continue long without overflowing the machine on which it is run.. Uk is scaled so that \\Uk\\ = 1 in some relevant norm. and c. e. We can therefore scale iik to form a new vector Uk that is within the bounds of the machine. Usually. But it serves to show that algorithms based on matrix-vector multiplies can often be economized. 3. i. 5.. 2.1. the vectors Uk will quickly underflow. k = k+l (T = 1/KH Ui — auk end while Since floating-point divides are expensive compared to multiplies. This can be done in the course of the iteration as the following program shows. 4. 7. 3. 1.e. For example. Uk = Uk/'\\Uk\\> Vol II: Eigensystems . suppose that A! = 10. Similarly. As it stands the program (1. the magnitude of Uk increases by a factor of ten for each iteration. 1. Instead we multiply it by l/||tifc||.SEC. we do not divide Uk by \\uk\\. 1.8) will tend to overflow or underflow. y[l] = a[l]*a:[l] + 6[l]*a:[2] for z = 2 to n—1 y[i] — c[i—l]*x[i—1] + a[i]*^[z] + 6[z]*x[z'4-l] end for t y[n] = c[n-I]*x[n-l] + a[n]*z[n] This program has an operation count of In fladd+3n flmlt. Then asymptotically we have Uk — 10*70^0. In most cases the savings will be small compared to the work in forming Auk. 56. THE POWER AND INVERSE POWER METHODS 6: 59 If the nonzero diagonals of T are stored as indicated above in arrays a. 2. if AI = 0. We shall not always get as dramatic improvement in operation counts as we did in the above example.

Such approximations are natural starting values for the power method. The problem is that a highly structured problem may have such a patterned eigenvector. When no approximation to x i is available. then the dominant eigenvector of Ak will furnish a good starting approximation to the dominant eigenvector of Ak+i. For example.5). the structure of the problem may actually dictate such a choice. Thus we cannot simply quit when the norm of error ek = Uk — x\ is sufficiently small.. In fact this ratio appears explicitly in the convergence bound (1. the vector e consisting of all ones is often a good choice for a matrix whose elements are positive. this is about as good as we can expect. then it can be shown that -I TT Since 7j = y^UQ and c^ . then 71 will be zero. it is customary to take UQ to be a random vector—usually a vector of random normal deviates. THE QR ALGORITHM The ratio ||c2||/|7i| from the expansion UQ = jiXi + X^c? is a measure of how far the starting vector UQ is from the dominant eigenvector x\. we may expect the ratio ||c2/7i ||2 to vary about the quantity Lacking special knowledge about x\. In some cases one will have at hand an approximation to the eigenvector x\. where 0 < a < l. On the other hand. if we are trying to find the dominant eigenvectors of a sequence of nearby matrices A\. It is therefore important that the starting vector UQ should have a substantial value of 71.then or . a natural criterion is to stop when \\Uk — Uk-i\\ is sufficiently small. A 2 . For example. For example.. one should avoid starting vectors consisting of small integers in a pattern —e. As a rule.. -. Convergence criteria and the residual The chief problem with determining the convergence of a sequence of vectors u^ is that the limit—in our case x\ —is unknown. if ||eJ| < alle^-ill. It it not surprising that a small value of 71. if UQ is random. If we write (x i X2) = (y\ YI) .g. which makes the ratio large. If that eigenvector is not the dominant eigenvector. If the sequence converges swiftly enough.60 Choice of a starting vector CHAPTER 2. can retard convergence.Y^UQ. The reasoning is as follows. a vector whose elements alternate between plus and minus one.

0.SEC. Vol II: Eigensystems . But if a is near one the error can be large when \\Uk — Uk-i \\ is small.5. Moreover.3. it gives us something almost as good. say. Let Then there is a matrix such that and If A is Hermitian and then there is a Hermitian matrix satisfying (1. Although the residual does not give us a direct estimate of the error. as the following theorem shows (for generality we drop the iteration subscripts). THE POWER AND INVERSE POWER METHODS 61 When a is. and stop when r/t is sufficiently small. 1. Theorem 1.12). Only if we have an estimate of a — which is tricky to obtain on the fly — can we relate the distance between successive approximations to the error. An alternative is to compute the residual where ^ is a suitable approximation to AI. \\ek\\ can be no larger than twice ||Uk — Uk_i ||.

E||p is the sum of the eigenvalues of ERE.12).13) of IJL makes r orthogonal to u. instead A will be contaminated with errors. we naturally take the vector u to be the fcth iterate Uk. In other words. Again.14) follow from the fact that \\E\\\ is the largest eigenvalue of E^E and ||. Optimal residuals and the Rayleigh quotient In computing the residual r = A u—fj.u to determine convergence of the power method. In most problems we will not be given a pristine matrix A.Since the remaining eigenvalues are zero. It suggests that we stop iterating when the residual is sufficiently small. in which case A will have errors that are. As is often the case. THE QR ALGORITHM Proof. in general. The proof in the non-Hermitian case is a matter of direct verification of the equalities (1. there is more than one E that will make (/^. A natural choice is the value of \i that minimizes the residual in some norm. u) an exact eigenpair of A. the problem is most easily solved when the norm in question is the 2-norm. in which case A will be contaminated with rounding error—errors of order of the rounding unit.62 CHAPTER 2. Now if the residual norm is smaller than the error level. . then any errors in the approximate eigenvector can be accounted for by a perturbation of A that is smaller than the errors already in A. note that From this it follows that r and u are orthogonal eigenvectors of EEE with eigenvalues IHH/IHII. from which (1. Thus Theorem 1. For general results on such structured backward perturbations see the notes and references. Now the zeros in a sparse matrix usually arise from the underlying problem and are not subject to perturbation.12) follows immediately. A may be computed. In many applications the matrix A will be sparse. the equalities (1.10) will not generally share the sparsity pattern of A—in fact it is likely to be dense. we have not specified how to choose //. But the matrix E in (1.3 gives a backward perturbation E that is unsuitable for sparse matrices. • The theorem says that if the residual is small. much larger than the rounding unit. The justification for this criterion is the following.However. For example.11) and (1. A may be measured. then the pair (//. For example. if your results are inaccurate. This is not the whole story. Fortunately. the inaccuracy is no greater than that induced by the error already present. For the norm of E. u) is an exact eigenpair of a nearby matrix A + E. we may take so that E alters only the diagonal elements of A. For the Hermitian case we use the fact that the choice (1.
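The backward perturbation of Theorem 1.3 is easy to exhibit numerically. The sketch below uses the rank-one matrix E = -r u^H/(u^H u); this is one standard construction that makes (μ, u) an exact eigenpair of A + E with ‖E‖2 = ‖r‖2/‖u‖2 (whether it is literally the matrix used in the proof above is an assumption on my part).

import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)
mu = (u @ A @ u) / (u @ u)              # Rayleigh quotient (see the next subsection)
r = A @ u - mu * u                      # residual

E = -np.outer(r, u) / (u @ u)           # assumed rank-one backward perturbation
print(np.linalg.norm((A + E) @ u - mu * u))   # ~ 0: (mu, u) is exact for A + E
print(np.linalg.norm(E, 2), np.linalg.norm(r) / np.linalg.norm(u))   # equal norms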

Generalizing from these two Rayleigh quotients. By continuity when u or v is near an eigenvector. is established by direct verification. This quantity is clearly minimized when fj. Then ||rM||2 is minimized when In this case r^ J_ u. l.SEC. we make the following definition. Then the quantity is called a RAYLEIGH QUOTIENT.21). the Rayleigh quotient will approximate the corresponding eigenvalue. The orthogonality of r and fj. Letu ^ 0 and for any yu setr M = Au — ^LU. THE POWER AND INVERSE POWER METHODS 63 Theorem 1. then the Rayleigh quotient reproduces that eigenvalue. We have already seen another Rayleigh quotient in (3. Chapter 4.16).2. then on to the main point. Let u and v be vectors with v^u ^ 0. • A brief comment. The quantity WH Au/u^u is an example of a Rayleigh quotient. This characterization of the optimal residual norm will prove useful in the sequel. where we will treat the approximation of eigenspaces and their eigenblocks. Definition 1. We will return to Rayleigh quotients in §2. Vol II: Eigensystems . There it had the form yH(A + E)x where x and y (normalized so that yHx = 1) were left and right eigenvectors corresponding to a simple eigenvalue Xof A. If either u or v is an eigenvector corresponding to an eigenvalue A of A.5. Without loss of generality we may assume that 11 u \ \2 = 1. — v — WH Au. Proof. Chapter 1.4. the Rayleigh quotients u^Auk/u^Uk provide a sequence of increasingly accurate approximations to A. Thus in the power method. Let (u U) be unitary and set Then Since || • ||2 is unitarily invariant. This Rayleigh quotient provides a highly accurate approximation to the corresponding eigenvalue of A + E. • The proof characterizes the norm of the optimal residual as the norm of g in the partition (1.
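The optimality statement of Theorem 1.4 can be checked numerically: among all scalars, the Rayleigh quotient gives the smallest 2-norm residual, and that residual is orthogonal to u. A minimal sketch, assuming real data:

import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)

mu = (u @ A @ u) / (u @ u)              # Rayleigh quotient
r = A @ u - mu * u
print(abs(r @ u))                       # ~ 0: the optimal residual is orthogonal to u

# any other value of the scalar gives a residual at least as large
for shift in mu + np.linspace(-1.0, 1.0, 5):
    print(np.linalg.norm(A @ u - shift * u) >= np.linalg.norm(r) - 1e-12)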

Given matrix A, a shift κ, a convergence criterion ε, and a nonzero starting vector x, this algorithm iterates the shifted power method to convergence, returning the approximate eigenpair (μ, x).

1. Anorm = ‖A‖F
2. x = x/‖x‖2
3. while (true)
4.    y = Ax
5.    μ = xᴴy
6.    r = y − μx
7.    x = y − κx
8.    x = x/‖x‖2
9.    if (‖r‖2/Anorm < ε) leave while fi
10. end while

Algorithm 1.1: The shifted power method

Shifts and spectral enhancement

If A is nondefective and the eigenvalues of A satisfy

    |λ1| > |λ2| ≥ ··· ≥ |λn|,

then the power method almost always converges to the dominant eigenvector x1 with rate |λ2/λ1|. If this ratio is near one, the convergence will be slow. Sometimes we can improve the rate of convergence by working with the matrix A − κI, whose eigenvalues λ1 − κ, ..., λn − κ are shifted by the constant κ. For example, if the eigenvalues of A are 1, −0.99, and −0.5, then the power method converges with ratio 0.99. If we shift A by κ = −0.5, the eigenvalues become 1.5, −0.49, and 0, and the convergence ratio becomes about 1/3—a dramatic improvement.

This shifting of the matrix to improve the disposition of its eigenvalues is an example of spectral enhancement. Shifting has the advantage that it is cheap: we can calculate (A − κI)u in the form Au − κu at little additional cost. But it can only improve performance by so much. In the above example, 1/3 is nearly the best convergence ratio we can obtain by shifting. In the next subsubsection we will consider a very powerful spectral enhancement—the shift-and-invert operation.

Implementation

Algorithm 1.1 implements the shifted power method. There are three things to say about this straightforward program.

• Both the residual and the next iterate are computed from the intermediate vector y = Ax to avoid an extra multiplication by A. Similarly, the norm of A is computed outside the loop. This parsimonious style of coding may seem natural, but it is easy to forget when one is writing in a high-level language like MATLAB.

• In a working implementation one would specify a maximum number of iterations, and the code would give an error return when that number was exceeded.

• On termination the algorithm returns an approximate eigenpair (μ, x). From Theorem 1.3 we know that (μ, x̄), where x̄ is the previous iterate, is an exact eigenpair of A + E, where the relative error ‖E‖F/‖A‖F is less than or equal to the convergence criterion ε. Strictly speaking, our convergence test is based on the pair (μ, x̄), and we cannot say the same thing about (μ, x). However, we return the vector x because it is almost certainly more accurate.
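A minimal NumPy rendering of Algorithm 1.1 follows, assuming real data; the function name is mine, and, as the second comment above suggests, a maximum iteration count stands in for a proper error return.

import numpy as np

def shifted_power_method(A, kappa, x, eps=1e-8, maxiter=1000):
    """Shifted power method: iterate with A - kappa*I, return (mu, x)."""
    Anorm = np.linalg.norm(A, 'fro')
    x = x / np.linalg.norm(x)
    for _ in range(maxiter):
        y = A @ x
        mu = x @ y                  # Rayleigh quotient (x has unit norm)
        r = y - mu * x              # optimal residual for the current x
        x = y - kappa * x           # (A - kappa*I)x computed as Ax - kappa*x
        x = x / np.linalg.norm(x)
        if np.linalg.norm(r) / Anorm < eps:
            break
    return mu, x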

The effects of rounding error

We must now consider the effects of rounding error on the power method. We will suppose that the computations are performed in standard floating-point arithmetic with rounding unit εM. For definiteness we will assume that ‖uk‖2 = 1 and ‖A‖F = 1 in the following discussion.

The major source of error in the method is the computation of the product Auk. An elementary rounding-error analysis shows that for a dense matrix the computed product has the form (A + E)uk [see (1.17)], where the backward error E is bounded as in (1.18). This is only n times as large as the error we could expect from simply rounding the matrix (see the treatment of norms and rounding error in Chapter 1).

Equation (1.17) and the bound (1.18) imply that at each stage the power method is trying to compute the dominant eigenvector of a slightly perturbed matrix. This perturbation, which is of the order of the rounding unit, induces an error in the eigenvector x1, and we cannot expect the power method to compute x1 to an accuracy greater than that error. In particular, by Theorem 3.13, Chapter 1, we can expect no greater accuracy in the angle between the computed vector and x1 than roughly nεM‖A‖F/sep(λ1, M). The limiting accuracy of the power method is therefore quite satisfactory.

But can the limit be attained? The answer is not easy—surprisingly, in view of the simplicity of the convergence proof for the power method. Nonetheless, we can say that rounding error not only changes the vector Ax to (A + E)x, but it also changes the residuals corresponding to these vectors. However, the change is very small. Thus, if the residual corresponding to Ax is reduced far enough (and that need not be very far), then the computed residual corresponding to (A + E)x will also be reduced. Thus we can expect to see a steady decrease in the residual until the limiting accuracy is attained.

.K!)~I. and the power method cannot converge. the power method in its natural form is little used these days. and its combination with the power method is called the inverse power method. and the convergence can be very slow if the dominant eigenvalue is near in magnitude to the subdominant one. Thus by choosing K sufficiently near A! we can transform AI into a dominant eigenvalue. However.. We now turn to that variant. provided the matrix has a single.2. given a vector x and a shift K we generate a new iterate in the form In principle. THE QR ALGORITHM The advantages of the power method are that it requires only matrix-vector multiplications and that it is essentially globally convergent. which are finite quantities. and that dominance can be made as large as we like. 1.66 Assessment of the power method CHAPTER 2. we could use Algorithm 1.. It has the disadvantages that it only computes the dominant eigenvector. . The inverse power method The inverse power method is simply the power method applied to the enhanced matrix (. However.. The traditional enhancement for doing this is called the shift-and-invert enhancement. /^i —> oo. simple dominant eigenvalue.4 . we can compute a residual almost for free. On the other hand. then the eigenvalues of the matrix (A — At/)"1 are Now as K. owing to the nature of the shift-and-invert enhancement. A n in no particular order. This spectral enhancement is called the shift-and-invert enhancement. Shift-and-invert enhancement Let A have eigenvalues AI . If we have an approximation K to AI. for i > 1 the m approach (A. Moreover. and suppose that Aj is simple. it plays an implicit role in the QR algorithm. We begin with the enhancement. THE INVERSE POWER METHOD The difficulties with the power method can be overcome provided we can find a spectral enhancement that makes an eigenvalue of interest the dominant eigenvalue of the matrix. —*• AI. Specifically. — A) . then its conjugate is also an eigenvalue with the same absolute value. For these reasons. and a variant—the inverse power method—is widely used. if a real matrix has a complex dominant eigenvalue.1 to carry out the inverse power method.

Given a matrix A, a shift κ, a convergence criterion ε, and a nonzero starting vector x, this algorithm iterates the inverse power method to convergence, returning an approximate eigenpair (μ, x).

1. while (true)
2.    Solve the system (A − κI)y = x
3.    x̂ = y/‖y‖2
4.    w = x/‖y‖2
5.    ρ = x̂ᴴw
6.    μ = κ + ρ
7.    r = w − ρx̂
8.    x = x̂
9.    if (‖r‖2/‖A‖F < ε) leave while fi
10. end while

Algorithm 1.2: The inverse power method

Specifically, let

    x̂ = y/‖y‖2.

Since (A − κI)y = x, we have

    (A − κI)x̂ = x/‖y‖2 ≡ w.

If we set

    ρ = x̂ᴴw,

then the optimal residual for x̂ is (see Theorem 1.4)

    r = w − ρx̂.

Note that as x̂ converges to an eigenvector, the Rayleigh quotient μ = κ + ρ = x̂ᴴAx̂ converges to the corresponding eigenvalue of A. Algorithm 1.2 implements this scheme.

It is evident that the major work in the algorithm consists in solving the system (A − κI)y = x in statement 2, and some care must be taken not to inadvertently increase this work. Most linear system solvers first compute a decomposition and then use the decomposition to solve the system. With dense systems, the decomposition is usually an LU decomposition computed by Gaussian elimination with partial pivoting, which costs about (1/3)n^3 flam. We will say that this process requires a factorization. The subsequent solution of the linear system, on the other hand, requires only n^2 flam. It is therefore important that the computation of the decomposition be placed outside the while loop.
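A minimal sketch of Algorithm 1.2 in Python follows. In line with the remark above, the LU factorization of A − κI is computed once, outside the loop; scipy.linalg.lu_factor and lu_solve are used for that purpose, and the function name is mine.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power(A, kappa, x, eps=1e-10, maxiter=50):
    """Inverse power method with a fixed shift kappa; returns (mu, x)."""
    n = A.shape[0]
    Anorm = np.linalg.norm(A, 'fro')
    lu, piv = lu_factor(A - kappa * np.eye(n))   # factor once, outside the loop
    x = x / np.linalg.norm(x)
    for _ in range(maxiter):
        y = lu_solve((lu, piv), x)               # solve (A - kappa*I)y = x in O(n^2)
        xhat = y / np.linalg.norm(y)
        w = x / np.linalg.norm(y)                # equals (A - kappa*I)*xhat
        rho = xhat @ w                           # use np.vdot for complex data
        mu = kappa + rho                         # Rayleigh quotient
        r = w - rho * xhat                       # optimal residual
        x = xhat
        if np.linalg.norm(r) / Anorm < eps:
            break
    return mu, x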

6e-09 7.814e+00 7.079e-01 1.4e-16 It is seen that each iteration adds about two significant figures of accuracy to the eigenvalue until the eigenvalue is calculated to working accuracy (£M = 10~16).765e+00 -2. On the other hand if we take K = 1 + 10~8.022e+00 6. In particular.095e-01 4. This behavior is typical of the inverse power method applied to a well-conditioned eigenpair. one should avoid software that solves a linear system by invisibly computing a decomposition. For example.2 were 9. THE QR ALGORITHM loop in Algorithm 1.209e+00 -1.5. we took K = 1.2.2e-03 7.606e+00 (shown only to four figures) has eigenvalues 1. if not dominant. The effects of rounding error The nearer the shift K is to an eigenvalue. An example The convergence of the inverse power method can be stunningly fast when the shift is near the eigenvalue in question.4e-04 3. Nonetheless.907e+00 3.2. We will return to this point repeatedly in the second half of this volume.510e+00 2.906e-01 -5.307e+00 -1.776e-01 1.7e-ll 7.513e+00 -6.833e+00 -3. the matrix 3.3e-08 3.274e+00 8.4e-10 1. But if K is near an eigenvalue.399e+00 4.4e-14 3.184e+00 2.298e-01 -3. In the following experiment.699e+00 -2. the iteration converges in one iteration.1e-18 The iteration converges to working accuracy in two iterations.475e+00 -1.01 and took x to be a normalized random vector. If K approximates AI to working accuracy.68 CHAPTER 2.877e-01 1.3. then A — K! is nearly singular. When A is sparse.820e-02 4. The errors in th Rayleigh quotients and the residual norms for several iterations of Algorithm 1. 9.158e-01 -1.4e-12 3. the faster the inverse power method converges.7e-05 7.487e-01 9.2e-06 3. part of any calculation involving shift-and-invert enhancement. we get the following results.5e-07 7. where the concern is with large eigenproblems.3e-16 3.2e-09 4.470e+00 2. and the system .4e-10 3. the operations counts for computing a decomposition or factorization may be less than O(n3). they are usually an important.8e-13 8.4.4e-15 4.

the power method goes back at least to 1913 [182]. Thus for full. However. we can improve convergence by replacing AC wit ju.g. In fact if AC approximates A to working accuracy. 1. Gaussian elimination with partial pivoting) is used to solve the system (A — AC/)?/ — x. To put the matter in another way. the solution will be completely inaccurate. then the computed solution will satisfy where ||£||F/||A||F is of the order of the rounding unit.1933] used it to compute his numerical results. The main difficulty with the Rayleigh quotient method is that each time the shift changes. If a stable algorithm (e. It can be shown that once it starts converging to a simple eigenvalue. Yet the method still converges. it produces Rayleigh quotients \i that are generally more accurate than the shift AC. this perturbation in A .4).1937] gave an extensive analysis of the method. which is the subject of the rest of this chapter. Since the rate of convergence depends on the accuracy of AC. Since we are only concerned with the direction of the eigenvector. Hotelling in his famous paper on variance components [120. This amounts to adding the statement AC = /z after statement 8 in Algorithm 1.2 The result is called the Rayleigh quotient iteration.. which accounts for the fact that the Rayleigh quotient iteration in its natural form is not much used. Aitken [2. For example. THE POWER AND INVERSE POWER METHODS 69 (A — K. The Rayleigh quotient method We have seen that as we proceed with the inverse power method. 1. Many people who use the power method do so unknowingly.3. it converges at least quadratically. when AC is near an eigenvalue. this error in y does not introduce any significant errors in the approximate eigenvector. a simulation of a discrete-time discrete-state Markov chain amounts to applying the power method to the matrix of transition probabilities. dense matrices the Rayleigh quotient method takes O(n3) operations for each iteration. but most of the inaccuracy is in the direction of the eigenvector being approximated. Thus the inverse power method moves toward an approximate eigenpair that is for practical purposes indistinguishable from the exact eigenpair. the vector y is indeed computed inaccurately. Vol II: Eigensystems . one must decompose the matrix A — AC/ to solve the system (A — AC/)?/ = x. NOTES AND REFERENCES The power method As an explicit method for computing a dominant eigenpair. If the eigenpair we are looking for is well conditioned.AC/ changes it only insignificantly.SEC. Why? The answer to this question is furnished by the rounding-error analysis of the solution of linear systems (see §1:3. in a disguised form it reappears in the QR algorithm.I)y — x in statement 2 will be solved inaccurately.
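The modification just described—updating the shift to the current Rayleigh quotient after each step—can be sketched as follows. This is a hedged illustration, not code from the text: since the shift changes, the matrix must be refactored at every step, here implicitly through numpy.linalg.solve at O(n^3) per iteration.

import numpy as np

def rayleigh_quotient_iteration(A, kappa, x, eps=1e-14, maxiter=20):
    """Inverse power method in which the shift is replaced by the Rayleigh quotient."""
    n = A.shape[0]
    Anorm = np.linalg.norm(A, 'fro')
    x = x / np.linalg.norm(x)
    mu = kappa
    for _ in range(maxiter):
        y = np.linalg.solve(A - mu * np.eye(n), x)   # a new factorization every step
        xhat = y / np.linalg.norm(y)
        w = x / np.linalg.norm(y)
        rho = xhat @ w
        mu = mu + rho                # kappa <- mu: the Rayleigh quotient becomes the shift
        r = w - rho * xhat
        x = xhat
        if np.linalg.norm(r) / Anorm < eps:
            break
    return mu, x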

1944]. The fact that the Rayleigh quotient gives an optimal residual is due to Wilkinson [300. 95. 577].g. [62. Even in this case one can apply a very general theorem of Oettli and Prager [185]. The justification of a convergence criterion by appeal to a backward error result is part of the paradigm of backward rounding-error analysis. which says that a small residual implies a small backward error. A particularly important problem was to deflate eigenvalues as they are found so that they do not get in the way of finding smaller eigenvalues. working at the National Physical Laboratory in England. 570-599] and [295]. honed this tool to as fine an edge as its limitations allow. The first part is due to Wilkinson—e. Wilkinson. who defined it for symmetric matrices. The inverse power and Rayleigh quotient methods The inverse power method is due to Wielandt [294. before the advent of the QR algorithm. pp.g. Strutt). For further discussion of this matter see any general textbook on numerical linear algebra—e. The Rayleigh quotient The Rayleigh quotient is due to Lord Rayleigh (J. Ipsen [124] gives a survey of the method and its properties.3. along with an extensive list of references. p. Residuals and backward error The use of a residual to determine convergence is inevitable in most problems. 172]. The . see [299. Whether this result can be generalized to Hermitian matrices is an open question..70 CHAPTER 2. since seldom will we have the wherewithal to determine the accuracy of the current approximation to an eigenvalue or eigenvector. 278. The shift was actually entered by machine operators in binary by switches at the console before each iteration [300. which allows one to specify the relative sizes of the elements in the backward error.262. Theorem 1.. Shift of origin Wilkinson used shifting to good effect at the National Physical Laboratory. 300].3 to respect the zero structure of a sparse matrix is a drawback only if the eigenpair in question is especially sensitive to perturbations in the zero elements. 299. The failure of the backward perturbation in Theorem 1. 141] The result for Hermitian matrices in a more general form is due to Powell [214]. Some of its many generalizations will appear throughout this volume. W. THE QR ALGORITHM For a time. provides a justification for tests based on the size of a suitably scaled residual. whose application was optimization. p. p. the power method was one of the standard methods for finding eigenvalues and eigenvectors on a digital computer. For more see The Algebraic Eigenvalue Problem [300.

method is chiefly used to compute eigenvectors of matrices whose eigenvalues can be approximated by other means (e.g., see Chapter 3). The fact that A − κI is ill conditioned when κ is near an eigenvalue of A makes certain that the vector in the inverse power method will be inaccurately calculated. The analysis given here, which shows that the inaccuracies do no harm, is due to Wilkinson (e.g., see [300, pp. 619-622]).

It is important to use the Rayleigh quotient to compute the optimal residual in Algorithm 1.2. An obvious reason is that if we use the constant shift κ, then the residual can never converge to zero. A more subtle reason is the behavior of the residual when the matrix is strongly nonnormal—that is, when the off-diagonal part of its Schur form dominates the diagonal part [111]. In that case, the first residual will generally be extremely small and then will increase in size before decreasing again. The use of the Rayleigh quotient to compute the optimal residual mitigates this problem.

The Rayleigh quotient iteration has been analyzed extensively in a series of papers by Ostrowski [188, 189, 190, 191, 192]. For additional results on the symmetric case, see [206, §4.6-4.9].

2. THE EXPLICITLY SHIFTED QR ALGORITHM

The QR algorithm is an iterative method for reducing a matrix A to triangular form by unitary similarity transformations. Its shifted form can be written simply—and cryptically—as follows.

1. Let A0 = A.
2. for k = 0, 1, 2, ...
3.    Choose a shift κk
4.    Factor Ak − κkI = QkRk, where Qk is orthogonal and Rk is upper triangular     (2.1)
5.    Ak+1 = RkQk + κkI
6. end for k

In other words, to get from one iterate to the next—i.e., to perform one QR step—subtract the shift from the diagonal, compute a QR factorization, multiply the factors in reverse order, and add back the shift to the diagonal. The method is called the explicitly shifted QR algorithm because the shift is explicitly subtracted from the diagonal of A and added back at the end of the process.

The matrices produced by the algorithm (2.1) satisfy

    Ak+1 = QkᴴAkQk.

Thus Ak+1 is unitarily similar to Ak. However, this fact alone is not sufficient reason for performing this sequence of operations. Why, then, is the algorithm worth considering?
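Before turning to that question, here is a minimal numerical rendering of one step of (2.1), which may help fix the mechanics. numpy.linalg.qr supplies the factorization, the shift is taken to be the trailing diagonal element, and the unitary similarity is checked at the end; the function name is mine.

import numpy as np

def qr_step(A, kappa):
    """One explicitly shifted QR step: factor A - kappa*I = QR, return RQ + kappa*I and Q."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A - kappa * np.eye(n))
    return R @ Q + kappa * np.eye(n), Q

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
kappa = A[-1, -1]                            # Rayleigh quotient shift
A1, Q = qr_step(A, kappa)
print(np.allclose(A1, Q.conj().T @ A @ Q))   # A_{k+1} is unitarily similar to A_k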

72 CHAPTER 2. however. These relationships go far toward explaining the remarkable effectiveness of the algorithm. A step of the QR algorithm can be regarded as an approximation to such a deflation. This relation allows us to investigate the convergence of the algorithm. For brevity we will drop the iteration subscripts and denote the current matrix by A and the next QR iterate by A. then It follows that Thus we have deflated the problem.1.12. B.4). The QR algorithm implicitly chooses g to be a vector produced by the inverse power method with shift K. The remaining subsections are devoted to the practical implementation of the algorithm. If. and the next two subsections are devoted to laying them out in detail. we choose q to be an approximate eigenvector of A.ftq11 (see Theorem 1. Specifically. . we obtain an alternate proof of Schur's theorem—one in which the reduction is accomplished from the bottom up by left eigenvectors. THE QR ALGORITHM AND THE INVERSE POWER METHOD In this subsection we will show how the QR algorithm is related to the inverse power method. let Q — (Q* q) be unitary and write If (A. etc.2) will be small because ||#||2 is the norm of the optimal residual qu A . The QR choice The above reduction is not an effective computational procedure because it requires that one be given eigenvectors of A. q) is a left eigenpair of A. then g in (2. At the same time it is a variant of the power method. Schur from the bottom up In establishing the existence of Schur decompositions (Theorem 1. If we repeat the procedure inductively on B. Chapter 1) we showed that if we have knowledge of an eigenvector we could deflate the corresponding eigenvalue of the matrix by a unitary similarity transformation. THE QR ALGORITHM The answer is that the algorithm is an elegant way of implementing the inverse power method with shift K^. with the difference that the deflation occurs at the bottom of the matrix. 2.

Hence if g is small. Since Q is unitary. then we can expect q to be a good approximation to a left eigenvector of A and hence #H in (2.AC/) = rnnej. the norms of its last row and last column must be one.6) converges to zero. Hence Equation (2. If AC is close to an eigenvalue of A.SEC.5) shows that the last column of the matrix Q generated by the QR algorithm is the result of the inverse power method with shift K applied to the vector ej. 2. We begin by giving a bound that relates the norms of g and g in successive iterates. \i should approximate an eigenvalue of A and is a natural choice for a shift. Partition A in the form Then (//.4) is qH(A . e n ) is an approximate left eigenpair whose residual norm is \\g\\2.2) to be small. is the Rayleigh quotient eT A e n -^-"n • Error bounds To show that the QR algorithm converges to a deflated matrix. let us write and also write the relation A — AC/ = RQ in the form From these partitions we proceed in stages. This shift is called the Rayleigh quotient shift because fj. Specifically. THE EXPLICITLY SHIFTED QR ALGORITHM 73 To see this. The key to the derivation is to work with a partitioned form of the relation of A — AC/ = QR. Hence and Vol II: Eigensystems . we must show that the vector g in (2. let us rewrite the QR factorization of the shifted matrix A — AC/ in the form where because of the triangularity of R we have The last row of (2. The above analysis also suggests a choice of shift.

If we assume that S is nonsingular and set then We now need a bound on p. from (2.K). Rates of convergence Let us restore our iteration subscripts and write the bound (2. Hence from (2. We will therefore assume that the iteration converges and determine the rate of convergence.13) by . We may now replace (2. However.74 CHAPTER 2.12) in the form This bound can be used to show that if go is sufficiently small and //o is sufficiently near a simple eigenvalue A of A then the iteration with Rayleigh quotient shift converges in the sense that (/jt -> 0 and fik -> A.7) in the form then we find that p = ftth + x((j.10) and (2. We will assume that we have constants T? and a such that Since the A* are orthogonally similar.1)-block of (2.11) we get This is our final bound. Computing the (2. THE QR ALGORITHM Next we want to show that e is small when g is small.9) and (2.8) we find that #H = peH. The fact that ak can be bounded is a nontrivial corollary of the formal proof of convergences. . If we use the fact that Q is unitary to rewrite (2.7). Hence from (2. we can simply take rj = ||A2||2. the proof is tedious and not particularly informative. we get #H — eH S.10) and the fact that Finally.

any shift for which \pk . General comments on convergence Our convergence analysis seems to depend on the hypothesis that //^ is converging to a simple eigenvalue A. which we turn to now.Q is sufficiently near a simple eigenvalue of A. For example.Kk\ = O(||0||fc) will give us quadratic convergence. the convergence of the QR algorithm remains quadratic. if a2?/ = 1 and ||0o||2 = 10"1. which removes the term 0k\Hk . If A is defective on the other hand. if A is nondefective.We say that \\gk\\2 converges at least quadratically to zero. it assumes that fj. The complete convergence analysis is local. In addition.Thus when the QR algorithm converges to a simple eigenvalue.Thus our bound becomes This type of convergence is called cubic.2. 2. so that we may replace 77 by \\gk\\2. this is true of the Wilkinson shift defined by Algorithm 2. (It is a good exercise to see why this happens. we have hk = gk. The typical behavior seems to be that the algorithm spends its first few iterations floundering and then homes in on an eigenvalue. We have used the Rayleigh quotient shift in our analysis. For more on these points see the notes and references. Quadratic convergence. so that our uniform upper bound a no longer exists. In practice.Kjb|||flr/b||2 in (2.In particular. once it sets in. however. However. There are no theorems for the global convergence of the QR algorithm applied to a general matrix. and it is blindingly fast. In particular. 2. that is. is very fast. the matrices S^1 can be unbounded. the convergence may be only linear. A is necessarily nondefective. THE EXPLICITLY SHIFTED QR ALGORITHM 75 (the second term vanishes because the Rayleigh quotient shift takes Kjt = Hk). Subsequent eigenvalues are found with fewer and fewer iterations.) Nonetheless. the norm of gk+i is bounded by a quantity proportional to the square of the norm of gk.SEC. if A is Hermitian. and the convergence remains cubic. then so are the iterates Ak. For example. if A is multiple. Why this happens is partly explained by the relation of the QR algorithm to the power method. the algorithm also Vol II: Eigensystems .7 below. THE UNSHIFTED QR ALGORITHM The relation of the QR algorithm to the inverse power method explains its ability to converge swiftly to a particular eigenvalue. whatever its multiplicity. If AQ is Hermitian. then Thus four iterations are sufficient to reduce the error by a factor corresponding to the IEEE double-precision rounding unit.13).

We begin with a far-reaching characterization of the ArthQR iterate.This can be accomplished by multiplying the transformations together as we go along. The reason for this is that the QR algorithm is also related to the power method. The QR decomposition of a matrix polynomial Each step of the QR algorithm consists of a unitary similarity transformation Ajt+i = Q^AQk.. ..1. The explication of this relation is the subject of this subsection. We have Postmultiplying this relation by Rk-i. a process called accumulating the transformations. we get Hence An obvious induction on the product completes the proof. .76 CHAPTER 2. . . Kk starting with the matrix A. we will need a single transformation that moves us from the initial matrix A = AQ to the final matrix Ak+i .. The result is a matrix satisfying The matrix Qk has another useful characterization Theorem 2.Qk and RQj. Let Q0... THE QR ALGORITHM produces approximations to the other eigenvalues. Rk be the orthogonal and triangular matrices generated by the QR algorithm with shifts KQ. Let Then Proof.. albeit more slowly..When the algorithm has finished its work. .
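The content of the theorem is easy to check numerically: accumulating the orthogonal factors and the (suitably ordered) triangular factors of a few shifted QR steps reproduces a QR factorization of p(A), where p is the polynomial whose zeros are the shifts. A small sketch, with arbitrary shifts:

import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
shifts = [0.1, -0.3, 0.7]

Qhat, Rhat, Ak = np.eye(n), np.eye(n), A.copy()
for kappa in shifts:
    Q, R = np.linalg.qr(Ak - kappa * np.eye(n))
    Ak = R @ Q + kappa * np.eye(n)
    Qhat = Qhat @ Q              # accumulate the orthogonal factors
    Rhat = R @ Rhat              # accumulate the triangular factors in reverse order

pA = np.eye(n)
for kappa in shifts:
    pA = (A - kappa * np.eye(n)) @ pA   # p(A); the factors commute, so order is immaterial

print(np.allclose(Qhat @ Rhat, pA))     # a QR factorization of p(A)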

It follows that if we partition then gk —»• 0 and /^ —> AI. The QR algorithm and the power method Theorem 2...SEC. we have Thus the first column of Qk is the normalized result of applying k iterations of the power method to ei.. The result is the QR factorization QkRk ofp(A). Vol II: Eigensystems . Kk. the matrices Qk and Rk from the unshifted QR algorithm satisfy Multiplying this relation by ei and recalling that Rk is upper triangular. accumulating the transformations Q j and the product of the triangular matrices Rj. Now if the conditions for convergence of the power method are met. 2. Convergence of the unshifted QR algorithm The relation of the QR algorithm to the power method is a special case of a more general theorem—under appropriate conditions the unshifted QR algorithm actually triangularizes A. where AI is the dominant eigenvalue of A. say by Householder triangularization.. If we use the first formula. we are interested in what it says about the relation of the QR algorithm to the power method. For the present. Let us say that a sequence of QR steps is unshified if the shifts Kk are zero. THE EXPLICITLY SHIFTED QR ALGORITHM Theorem 2.. of p(r). if we know the zeros AC. then we can simply apply the shifted QR algorithm with shifts KI .1 has more than one application.. However. the vectors q[ ' approach the dominant eigenvector of A.1 can be interpreted as follows.) These formulas give us two ways of evaluating the QR factorization of p( A). Kk. Let 77 be a polynomial of degree k whose zeros are «i.15)...We can then define p( A) by either of the formulas (It is a good exercise to verify that these forms are equivalent. we simply formp(A) and calculate the QR decomposition of the result. Then by (2.

16) approaches zero — i. 6nt and set Then the triangular factor in the decomposition has positive diagonal elements. We have Now for i > j the (z. Let CHAPTER 2... Proof.9).. Write A*IA-fc = I + Ek. If Ak has the QR decomposition Ah = Qk&k. It follows by the uniqueness of the QR decomposition that Qk = QQkD~\and Four comments on this theorem.78 Theorem 2. • We know [see (1. with .kLA~k approaches the identity matrix... and letX = QR be the QR decomposition o f X . where L is unit lower triangular. Consequently. by the continuity of the QR factorization Qk —>• /. Then Let QkRkbe the QR decomposition of / + REkR 1 .16). which by (2.e.Then Since / -f REkR 1 —> /. .2. Let the diagonals of RRAkU be 61. A. then there are diagonal matrices Dk with \Dk\ = I such that QkDk -> Q. Chapter 1 ] that the columns of the Q-factor of X are the Schur vectors of A corresponding to the eigenvalues in the order (2..)-element of A fc £A k is lij(\i/Xj)h. where Ek -> 0. THE QR ALGORITHM where Suppose that X l has anLU decomposition X l = LU.

i. Rounding error will generally cause the eigenvalues to be found in descending order. But the QR algorithm can be very slow in extricating itself from disorder. This phenomenon is called disorder in the eigenvalues. if A — diag(0.SEC. which is what we set out to show. but the latter need not. For example. and hence its Q-factor has the form Vol II: Eigensystems . The former decomposition always exists. • We can get further information about the behavior of the unshifted algorithm by partitioning the matrices involved in the proof of Theorem 2. It follows that the QR iterates Ak = Q^AQk must converge to the triangular factor of the Schur decomposition of A. and has no LU decomposition. then so that the matrix X that orders the eigenvalues in descending order. write If 4fc)H> £J+i.5. and ^ . It is easily verified that the unshifted QR algorithm converges — actually it is stationary in this example — to a matrix whose eigenvalues are not in descending order. 2.0). THE EXPLICITLY SHIFTED QR ALGORITHM 79 its columns suitably scaled the matrix Q k converges to the orthogonal part of the Schur decomposition of A.1. It should be kept in mind that disorder is a creature of exact computations.2. Thus under the conditions of Theorem 2. Specifically.are zero.2 the unshifted QR algorithm produces a sequence of matrices tending to a triangular matrix. • We have assumed that the matrix X of right eigenvectors has a QR decomposition and that the matrix X~l of left eigenvectors has an LU decomposition. the product R(AkLA~k)(AkU) has the form This product is block upper triangular.

the QR iterates Ak tend to a block triangular form. in fact. l:i] converge to zero. so that the last row is going swiftly to zero. the results on the unshifted algorithm give us insight into how the shifted algorithm behaves. Specifically. By inspection. the (i. the proof of the theorem can be adapted to show that if there are groups of eigenvalues of equal magnitude. | \i+i/ \i \}k to zero. The first point to note is that one step of the algorithm requires the computation of a QR decomposition . its cumulative effect may be significant. we are in effect applying the unshifted algorithm to the matrix A — K&/. the convergence can be slow. Hence we may expect some reduction in the subdiagonal elements of A other than those in the last row. i)-element of Ak converges to A. The foregoing does not make an impressive case for the unshifted QR algorithm. 2.80 CHAPTER 2. while the other elements of the submatrix Ak[i:n. • Theorem 2. whose diagonal blocks—which themselves may not converge — contain the equimodular eigenvalues._i |.2 explicitly assumes that the eigenvalues of A are distinct and have distinct absolute values. and i\^ i in r (2.18) occurs only whenilL' . This.18) at the same rate as these matrices approach zero. and even when the conditions of Theorem 2.17) are exactly zero. on which we must perform many iterations. This reduction may not be much.. but for a large matrix. However. Now in the unshifted algorithm these matrices tend to zero. THE QR ALGORITHM NowA f c = Q^AQk = Q%QHAQQk = Q^TQk. HESSENBERG FORM We now turn to the practical implementation of the QR algorithm. Ak decouples at its ith diagonal element. and by continuity Ak will tend toward a matrix of the form (2. It follows that: Under the hypotheses of Theorem 2. The block triangularization in (2. is what happens in practice. which converge to zero as \\i/\i_i\k and \Xi+i/\i\k respectively. L\^ l.e. by considering a suitable partitioning of the matrices in question.2. Nonetheless. we see that the most slowly converging elements are l\ ^ and i\^ t-. It follows that i. Its convergence is unreliable. The rate of convergence is at least as fast as the approach of max{ | A. suppose that the shifted algorithm is converging to an eigenvalue A.2 are satisfied.3./ A. where T is the upper triangular factor of the Schur decomposition of A. As the shifts «& converge.

) Such an operation count is far too high for the algorithm to be practical. This brings the total operation count down to an acceptable 0(n3). the operation count to find all the eigenvalues is O(n4). The cure to this problem is to first reduce the matrix A to a form in which the algorithm is cheaper to apply. which overwrites A with HA. Here we will give a brief treatment of the basics. THE EXPLICITLY SHIFTED QR ALGORITHM 81 of Ak. 2. we work with ever smaller submatrices. is called a Wilkinson diagram. in §1:4. that is. a matrix of the form illustrated below forn = 6: (This representation. A Householder transformation or elementary reflector is a matrix of the form where It is easy to see that a Householder transformation is Hermitian and unitary. This equation suggests the following algorithm. The natural form. Since at least one QR step is needed to compute a single eigenvalue. Specifically. a QR step preserves Hessenberg form.SEC. in which an 0 stands for a zero element and an X stands for an element that may be nonzero.The computation of this decomposition requires 0(n3) floating-point operations. Householder transformations The reduction to Hessenberg form is traditionally effected by a class of transformations called Householder transformations. Moreover. emphasizing the complex case. The product of a Householder transformation and a matrix A can be cheaply computed. it turns out. is upper Hessenberg form'. Real Householder transformations have been widely treated in the literature and in textbooks (in particular.} As we shall see later.1 in this series). a QR step for a Hessenberg matrix requires only 0(n 2 ) operations. Vol II: Eigensystems . (This is true even if we take advantage of the fact that as the matrix deflates.
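The cheap product alluded to above comes from writing HA = (I − uuᴴ)A = A − u(uᴴA), which requires only two sweeps of matrix-vector work rather than an explicit matrix-matrix product. A minimal sketch in the real case (the helper name is mine):

import numpy as np

def apply_householder_left(u, A):
    """Compute H A with H = I - u u^H, at the cost of roughly 2mn flam."""
    v = u.conj() @ A              # v^H = u^H A
    return A - np.outer(u, v)     # H A = A - u v^H

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
u = rng.standard_normal(6)
u = u * np.sqrt(2) / np.linalg.norm(u)     # ||u||_2^2 = 2 makes H Hermitian and unitary
H = np.eye(6) - np.outer(u, u)
print(np.allclose(apply_householder_left(u, A), H @ A))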

If A is m×n, this algorithm requires about 2mn flam. A similar algorithm will compute the product AH.

Householder transformations are useful because they can introduce zeros into a matrix. We begin with the special case of a vector.

Theorem 2.3. Let a be a vector such that ‖a‖_2 = 1 and a_1 is real and nonnegative. Let

    u = (a + e_1)/sqrt(1 + a_1).

Then

    (I − uu^H)a = −e_1.

The proof of this theorem is purely computational and is left as an exercise.

The condition that the vector a have norm one and that its first component be nonnegative is rather restrictive. However, to reduce a general vector a to a multiple of e_1 we can replace a by

    ρa/‖a‖_2,

where ρ is a scalar of absolute value one that makes the first component of a nonnegative. This observation leads to Algorithm 2.1, which generates a Householder transformation that reduces a vector a to a multiple of e_1. Five comments.

• If a = 0, the routine returns u = √2, which makes H the identity with its (1,1)-element changed to −1.

• There is no danger of cancellation in statement 11 because the preceding statement forces u[1] to be nonnegative.

• If a is a row vector, the algorithm returns a row vector u such that a(I − u^H u) = ν e_1^T. Since we generally represent row vectors as a^H, we will always invoke housegen in the form housegen(a^H, u^H, ν), so that our Householder transformation has, as usual, the form I − uu^H.

• If a is of order n and we replace the index 1 in housegen with n, we obtain a new function that produces a Householder transformation H = I − uu^H such that Ha = ν e_n.
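For readers who want to experiment, the computation formalized in Algorithm 2.1 below can be sketched in a few lines of NumPy. The function name and interface here are illustrative only (not the book's code); it follows the conventions u^H u = 2 and Ha = ν e_1, and it handles real or complex vectors alike.

```python
import numpy as np

def housegen(a):
    """Return (u, nu) with H = I - u u^H Hermitian and unitary and H a = nu*e_1.
    A minimal sketch of the computation described in Algorithm 2.1."""
    u = np.array(a, dtype=complex)
    nu = np.linalg.norm(u)
    if nu == 0.0:
        u[0] = np.sqrt(2.0)          # H is the identity with its (1,1)-element negated
        return u, 0.0
    rho = np.conj(u[0]) / abs(u[0]) if u[0] != 0 else 1.0
    u = (rho / nu) * u               # now u[0] = |a[0]|/nu >= 0, so no cancellation below
    u[0] = 1.0 + u[0]
    u = u / np.sqrt(u[0].real)
    return u, -np.conj(rho) * nu

# Quick check that (I - u u^H) a is a multiple of e_1.
a = np.array([3.0 + 4.0j, 1.0, -2.0])
u, nu = housegen(a)
Ha = a - u * (np.conj(u) @ a)
print(np.round(Ha, 12), nu)          # all but the first component should vanish
```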

This algorithm takes a vector a and produces a vector u that generates a Householder transformation H = I − uu^H such that Ha = ν e_1.

 1. housegen(a, u, ν)
 2.    u = a
 3.    ν = ‖a‖_2
 4.    if (ν = 0) u[1] = √2; return; fi
 5.    if (u[1] ≠ 0)
 6.       ρ = ū[1]/|u[1]|
 7.    else
 8.       ρ = 1
 9.    end if
10.    u = (ρ/ν)*u
11.    u[1] = 1 + u[1]
12.    u = u/sqrt(u[1])
13.    ν = −ρ̄*ν
14. end housegen

Algorithm 2.1: Generation of Householder transformations

• When a is real, so is u, and in this case such constructions as ρ = ū[1]/|u[1]| are unnecessarily elaborate; the generation of Householder transformations for real vectors is usually done somewhat differently (see Algorithm 1:4.1). Nonetheless, in this volume we will regard housegen as a generic program, applicable to both real and complex vectors.

o For an example of housegen applied to a row vector, see Algorithm 1.3, Chapter 5.

Reduction to Hessenberg form

We have already mentioned that the principal use of Householder transformations is to introduce zeros into a matrix. We will illustrate the technique with the reduction by unitary similarities of a matrix to Hessenberg form.

Let A be of order n and partition A in the form

    A = ( α_11   a_1^H )
        ( b_1    A_1   ).

Let Ĥ_1 be a Householder transformation such that

    Ĥ_1 b_1 = ν_1 e_1.

Figure 2.1: Reduction to upper Hessenberg form o If we set HI = diag(l. Hk-i such that where AH is a Hessenberg matrix of order k — 1.84 CHAPTER 2. For the general step. suppose that we have generated Householder transformations HI . . . .1 illustrates the course of the reduction on a matrix of order five. Let Hk be a Householder transformation such that A If we set Hk = diag(/A. THE QR ALGORITHM Figure 2. JYjt).. then Thus we have annihilated all the elements in the first column below the first subdiagonal. then which reduces the elements below the subdiagonal of the fcth column to zero. . The elements with hats are the ones used to generate the Householder transformation that . H I ) .

SEC. 5. k+l:n] . 11.U*VH ^[^+2:71. 2. 1. 9. u = Q[k+l:n.Q) Reduce A H =A for k = I to ra-2 Generate the transformation housegen(H[k+l:n. THE EXPLICITLY SHIFTED QR ALGORITHM 85 This algorithm reduces a matrix A of order n to Hessenberg form by Householder transformations.n] = e n . 10.k+l:n]*u H[l:n.W*VH 17.k+l:n] H[k+l:n. 13. The reduced matrix is returned in the matrix H.k] 15. k]) Q[k+l:n.^ = 0 Postmultiply the transformation v = H[l:n. and the transformations are accumulated in the matrix Q. 8. end hessreduce Algorithm 2. Ar].l 14.2: Reduction to upper Hessenberg form o VolII: Eigensystems . hessreduce( A. u. k+l:n] = H[l:n. 2.H.k] = ek 18.e n _i 4. 6.Q[:. Q[:. 12. forfc = n . k+l:n] . k+l:n] = H[k+l:n. 1.k+l:n] .Q[k+l:n. k+l:n] . H[k+l. Q[k+l:n.2 t o l b y . 3. end for fc 19.n-l]=. k+l:n] 16.k] = u Premultiply the transformation VE = uH*H[k+l:n. VH = «H*Q[fc+l:n.V*UH end k Accumulate the transformations Q[:.
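As a concrete illustration of Algorithm 2.2, here is a hedged NumPy sketch of the reduction. It reuses the housegen sketch given earlier (convention u^H u = 2) and, unlike the book's routine, returns the accumulated Q explicitly rather than storing the generating vectors in Q; names and interfaces are assumptions, not a transcription.

```python
import numpy as np

def housegen(a):
    """(I - u u^H) a = nu*e_1 with u^H u = 2; cf. Algorithm 2.1."""
    u = np.array(a, dtype=complex)
    nu = np.linalg.norm(u)
    if nu == 0.0:
        u[0] = np.sqrt(2.0)
        return u, 0.0
    rho = np.conj(u[0]) / abs(u[0]) if u[0] != 0 else 1.0
    u = (rho / nu) * u
    u[0] += 1.0
    return u / np.sqrt(u[0].real), -np.conj(rho) * nu

def hessreduce(A):
    """Reduce A to upper Hessenberg form H = Q^H A Q by Householder similarities."""
    n = A.shape[0]
    H = np.array(A, dtype=complex)
    Q = np.eye(n, dtype=complex)
    for k in range(n - 2):
        u, _ = housegen(H[k+1:, k])
        # Premultiply the transformation into rows k+1..n-1.
        v = np.conj(u) @ H[k+1:, k:]
        H[k+1:, k:] -= np.outer(u, v)
        H[k+2:, k] = 0.0                     # enforce the zeros just introduced
        # Postmultiply the transformation into columns k+1..n-1.
        v = H[:, k+1:] @ u
        H[:, k+1:] -= np.outer(v, np.conj(u))
        # Accumulate the transformation in Q.
        v = Q[:, k+1:] @ u
        Q[:, k+1:] -= np.outer(v, np.conj(u))
    return H, Q

A = np.random.randn(6, 6) + 1j * np.random.randn(6, 6)
H, Q = hessreduce(A)
print(np.allclose(np.tril(H, -2), 0))        # Hessenberg structure
print(np.allclose(Q.conj().T @ A @ Q, H))    # the similarity holds
```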

However. n—2) transformations. to the array Q[l:n. the term flam is an abbreviation for "floating-point addition and multiplication.The natural way to accumulate the transformations is to set Q = /." When A is real. An alternative is to save the vectors generating the Householder transformations and accumulate them after the reduction in the order #l(---(#n-4(#n-3(#n-2g)))--')- It is easy to see that this order fills up Q from the southeast to the northwest. There are several comments to make about it. When A is complex.2 implements the reduction to upper Hessenberg form. Thus the count for the premultiplication is about The postmultiplication at the k stage requires about 2n(n—k) flam. which gives a total of n3 flam. and a complex multiplication requires two additions and four multiplications. a complex addition requires two ordinary floating-point additions. Specifically. THE QR ALGORITHM produces the next matrix. The operation count for the accumulation of Q is the same as for the premultiplication. k+l:n]. however. only affects Q[k+l:n. and then accumulate the transformations in the order (•••(((g#i)#2)#3)--0#n-2. n]. At the Arth stage the premultiplication requires about (n-fc) 2 floating-point additions and multiplications (flams) to compute v and a like number to update H. additions and multiplications are composite operations. fc+1. This has the advantage that the reduction and the accumulation can be done in a single loop. we will have to apply the subsequent Hf.86 CHAPTER 2. the flam actually represents a floating-point addition and multiplication on the machine in question. Hence: The* /^r\f»rotif^n fnimf fr\r A Irmnthm 1 O ic flam for the redcatiof of a flam for the accumulatiof oif Q • As mentioned above. Algorithm 2. In this implementation we have saved the generating vectors in Q itself and initialize Q to the identity column by column as the loop regresses. the reduction is complete.e.. so that the application of Hj. since HI is almost full. Note that after three (i. • The matrix Q is the product H\H2 — -Jfn-2. • An operation count for the algorithm can be obtained as follows. Thus in complex arithmetic the flam is . The generation of the transformations is negligible.

The stability result just cited is typical of many algorithms that proceed by a sequence of orthogonal transformations. A^. we make the following definition. For the implications of this result see the discussion following Theorem 1..2 is e1.3. Definition 2. THE EXPLICITLY SHIFTED QR ALGORITHM 87 equivalent to four ordinary flams. If we replace H by A.. Am be generated by the recursion where Rk and Sk are unitary matrices generated from Ak.Set P = Rm-i'"Ri and Q = Si~-Sm-i. • An extremely important observation is the following. Specifically. • The algorithm is backward stable. For later reference.• • #n-2 • Then there is a matrix E satisfvine such that the computed Hessenberg matrix H satisfies Here 7n is a slowly growing function of n.SEC. Here fl denotes floatingpoint computation with rounding unit eM and includes any errors made in generating and applying Rk and Sk. the first column of the orthogonal kmatrix ! generated by Algo rithm 2. 2. then the algorithm overwrites A with its Hessenberg form.4. In other words. This fact can be verified directly from the algorithm. In the sequel we will report operation counts in terms of the arithmetic operations—real or complex—in the algorithm and leave the conversion to machine operations to the reader. the computed Hessenberg form is the exact Hessenberg form of A + E. Then we say that the sequence (2.. Let the sequence AI.. and the operation counts above must be adjusted by a factor of four. let Hk be the exact Householder transformation generated from the computed Ak and let Q = HI . Most packages choose this option to save storage. We will use it later in deriving the implicit shift version of the QR algorithm.22) is STABLE IN THE USUAL SENSE if there is a matrix E and a slowly growing function 7m such that Vol II: Eigensystems . • The algorithm generates H in situ. where ||E\\ is of the order of the rounding unit compared to \\A\\.

    ‖E‖ ≤ γ_m ε_M ‖A_1‖

and A_m = P(A_1 + E)Q.

Plane rotations

We have mentioned that a QR step leaves a Hessenberg matrix invariant. The proof is constructive and involves the use of 2×2 unitary transformations. We could easily use Householder transformations for this purpose, but it is more efficient to use another class of transformation—the plane rotations.

A plane rotation (also called a Givens rotation) is a matrix of the form

    P = (  c    s )
        ( −s̄    c ),

where

    c is real  and  c^2 + |s|^2 = 1.

The transformation is called a rotation because in the real case its action on a 2-vector x is to rotate x clockwise through the angle θ = cos^−1 c.

Plane rotations can be used to introduce zeros into a 2-vector. Specifically, let a and b be given with a ≠ 0. If we set

    ν = sqrt(|a|^2 + |b|^2),  c = |a|/ν,  and  s = (a/|a|)·(b̄/ν),                  (2.23)

then (note that c is real)

    (  c    s ) (a)   ( ν·a/|a| )
    ( −s̄    c ) (b) = (    0    ).                                                  (2.24)

We can apply plane rotations to a matrix to introduce zeros into the matrix. Specifically, a plane rotation in the (i,j)-plane is a matrix of the form

Because our pseudocode can alter the arguments of routines this means that if a and 6 are elements of a matrix then those elements are reset to the values they would assume if the rotation were applied. • For the sake of exposition we have stayed close to the formulas (2. then the routine returns the identity rotation. which saves work when we apply the rotation to a matrix. . • The constant r defined in statement 9 is a scaling factor designed to avoid overflows in statement 10 while rendering underflows harmless. we could replace r by | Re(a)| + | Im(a)| + | Re(a)| +1 Im(a)|. the routine returns the rotation that exchanges two components of a vector. 2. j)-plane is an identity matrix in which a plane rotation has been embedded in the submatrix corresponding to rows and columns i and jTo see the effect of a rotation in the (i. If a is zero. and we can introduce a zero anywhere in the it\\ or jth column. These formulas can be considerably economized (for example. a simple routine to combine two vectors will serve to implement the general application VolII: Eigensystems .3 generates a plane rotation satisfying (2. Four comments on this algorithm. They can also be simplified when a and 6 are real.24).23).j)-plane combines rows i and j and leaves the others undisturbed.. Since the application of a rotation to a matrix alters only two rows of the matrix. • The final values of the application of the rotation to a and 6 are returned in a and 6. By an appropriate choice of c and s we can introduce a zero anywhere we want in the zth or jth row. Algorithm 2. • A simpler and in many ways more natural definition of c and s is which makes the final value of a equal to v.SEC. Postmultiplication by P^ affects the columns in a similar way. a rotation in the (i. thus avoiding some multiplications and a square root in statement 9). then Thus premultiplication by a rotation in the (i.)-plane on a matrix. The definition we have chosen makes c real. THE EXPLICITLY SHIFTED QR ALGORITHM 89 In other words. let X be a matrix and let If X and Y are partitioned by rows. • If b is zero.

The following algorithm generates a plane rotation satisfying (2.24) from the quantities a and b. The program overwrites a with its final value and b with zero. The cosine of the rotation is real.

 1. rotgen(a, b, c, s)
 2.    if (b = 0)
 3.       c = 1; s = 0; return
 4.    end if
 5.    if (a = 0)
 6.       c = 0; s = 1; a = b; b = 0; return
 7.    end if
 8.    μ = a/|a|
 9.    τ = |a| + |b|
10.    ν = τ*sqrt(|a/τ|^2 + |b/τ|^2)
11.    c = |a|/ν
12.    s = μ*b̄/ν
13.    a = ν*μ
14.    b = 0
15. end rotgen

Algorithm 2.3: Generation of a plane rotation

This algorithm takes a plane rotation P specified by c and s and two vectors x and y and overwrites them with the result of applying P to the pair:

1. rotapp(c, s, x, y)
2.    t = c*x + s*y
3.    y = c*y − s̄*x
4.    x = t
5. end rotapp

Algorithm 2.4: Application of a plane rotation
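Both routines translate almost line for line into NumPy. The following sketch is illustrative (the tuple-returning interface is not the book's); it keeps the cosine real, as in Algorithm 2.3, and applies the rotation in place through array views.

```python
import numpy as np

def rotgen(a, b):
    """Return (c, s, a_new) with c real, c^2 + |s|^2 = 1, and
    [c s; -conj(s) c] @ [a, b] = [a_new, 0]; cf. Algorithm 2.3."""
    if b == 0:
        return 1.0, 0.0, a
    if a == 0:
        return 0.0, 1.0, b
    mu = a / abs(a)
    tau = abs(a) + abs(b)                       # scale to avoid overflow
    nu = tau * np.hypot(abs(a) / tau, abs(b) / tau)
    return abs(a) / nu, mu * np.conj(b) / nu, nu * mu

def rotapp(c, s, x, y):
    """Overwrite the vectors x and y with the rotated pair, as in Algorithm 2.4."""
    t = c * x + s * y
    y[:] = c * y - np.conj(s) * x
    x[:] = t

# Zero the (2,1)-element of a 2x4 complex matrix.
X = np.array([[1 + 2j, 3, 0, 5], [2 - 1j, 1, 4, 0]])
c, s, X[0, 0] = rotgen(X[0, 0], X[1, 0])
X[1, 0] = 0
rotapp(c, s, X[0, 1:], X[1, 1:])
print(np.round(X, 12))
```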

Thus our transformed pair of rows is 1 2 3 4 5 X 0 0 X X X X 0 X X In this example we have actually lost a zero element. Transformations that can introduce a zero into a matrix also have the power to destroy zeros that are already there. We will often refer to these counts collectively as a flrot. 4.. 2. In the real case the count becomes 4 flmlt + 2 fladd. X [ : .^X(j. THE EXPLICITLY SHIFTED QR ALGORITHM 91 of plane rotations.SEC. A practical implementation would do something (e. A pair of X's remains a pair of X's (column 1). 1.4 is such a routine. we must multiply real rotations by complex vectors.-s. • Because the cosine is real. 2. In particular. it is useful to think of the transformations as a game played on a Wilkinson diagram. Vol II: Eigensystems .X[:. in which case the count for each pair of elements is 8 flmlt -f 4 fladd.j]) • The routine rotapp uses a vector t to contain intermediate values. Suppose that P is a rotation in the (i. • The routine rotapp has many applications. t]. j)plane defined by c and s. Here are the rules for one step of the game. leaving context to decide the actual count. Then the following table gives the calling sequences for four distinct applications of P to a matrix X.X[j.X[:.']. PX :rotapp(c. The element of your choice becomes zero (X in column 2).:}) XP : wtapp(c.X(i.X{i. Here are some comments on the routine. write out the operations in scalar form) to insure that a storage allocator is not invoked every time rotapp is invoked. Algorithm 2. the cost of combining two components of x and y is 12 flmlt + 8 fladd. 3. A pair of O's remains a pair of O's (column 3). In some algorithms. j ] ) XPR:rotapp(c. -s. Begin by selecting two rows of the matrix (or two columns): 1 2 3 4 5 X X 0 X X 0 X 0 0 X Then after the transformation the situation is as follows.s. plane rotations are frequently used to move a nonzero element around in a matrix by successively annihilating it in one position and letting it pop up elsewhere—a process called bulge chasing.s. A mixed pair becomes a pair of X's (columns 4 and 5).g. X[:. In designing algorithms like this.i].']} pKX:wtapp(c.
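The four calling sequences in the table above are easy to check numerically. The following small, purely illustrative NumPy snippet (the helper name is an assumption, not the book's) verifies that pre- and postmultiplication by a rotation in the (i,j)-plane combine only the two chosen rows or columns.

```python
import numpy as np

def plane_rotation(n, i, j, c, s):
    """Identity with the 2x2 rotation [c s; -conj(s) c] embedded in rows/columns i, j."""
    P = np.eye(n, dtype=complex)
    P[i, i], P[i, j], P[j, i], P[j, j] = c, s, -np.conj(s), c
    return P

n, i, j = 5, 1, 3
c, s = 0.6, 0.8j                          # c real, c^2 + |s|^2 = 1
X = np.random.randn(n, n) + 1j * np.random.randn(n, n)
P = plane_rotation(n, i, j, c, s)

# Premultiplication combines rows i and j and leaves the others alone.
Y = P @ X
assert np.allclose(Y[i], c * X[i] + s * X[j])
assert np.allclose(Y[j], c * X[j] - np.conj(s) * X[i])
assert np.allclose(np.delete(Y, [i, j], axis=0), np.delete(X, [i, j], axis=0))

# Postmultiplication by P^H combines columns i and j in the same way.
Z = X @ P.conj().T
assert np.allclose(Z[:, i], c * X[:, i] + np.conj(s) * X[:, j])
assert np.allclose(Z[:, j], c * X[:, j] - s * X[:, i])
print("rotation application patterns verified")
```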

.n with Pk. after the secon ..2: Reduction to triangular form in QR step Invariance of Hessenberg form The reason for reducing a matrix to Hessenberg form before applying the QR algorithm is that the form is invariant under a QR step and the operation count for performing a QR step is 0(n2). The QR step is divided into two parts.TQ = TP^ • • • P*J_i. n » so that H is the result of the single. • The pre. Specifically. then Q is the Q-factor of H — i. Thus we have shown that a QR step preserves upper Hessenberg form and requires 0(n 2 ) operations. we consider only the unshifted step. Two comments are in order.5 implements these two steps. Pn-i. Algorithm 2. In the first part. Note that if <2H = -Pn-i. the unitary Q of the QR algorithm. It is easy to see that the algorithm requires about n2 flrot. Figure 2. The point to observe is that the postmultiplication by P^-+1 introduces a nonzero in the (i+1. we will use plane rotations to implement the basic QR step. Since shifting the step consists only of altering the diagonal elements at the beginning and end of the step. • The application of plane rotations dominates the calculation. This step is illustrated by the Wilkinson diagrams in Figure 2. fc-|-l)-plane so that T = Pn-i. t)-element but leaves the other zero elements undisturbed. THE QR ALGORITHM Figure 2. this translates into O(n2) operations.t is upper Hessenberg.and postmultiplications can be interleaved.2. Thus the final matrix H is in Hessenberg form.. the resul..3 illustrates this postmultiplication step.k+i acting in the (fc. and we have shown: If a Qr step is performed on an upper GHessenberg matrix. Let H be the upper Hessenberg matrix in question.n • • -Pi2. Whether the matrix is real or complex. To establish these facts. unshifted QR step. The second step is to form H ..n ' • • Pi^H is upper triangular. we determine rotations Pi2.e.92 CHAPTER 2.

5. 7.k]. 6. k+l:n]) end for k fork = lton-l rotapp(ck. THE EXPLICITLY SHIFTED QR ALGORITHM 93 Figure 2. for k = 1 ton—1 rotgen(H[k.SEC. ck. 3. H[k+l. k]. H[k+l. H[l:k+l.5: Basic QR step for a Hessenberg matrix o Vol II: Eigensystems . $k) rotapp(ck. k].3: Back multiplication in QR step /s This algorithm applies one unshifted QR step to an upper Hessenberg matrix H. 2. sk. 2. k+l:n]. 4. H[k. H[l:k+ltk+l]) end for k Algorithm 2. 1. sk.

After the second rotation (P_23 in Figure 2.2) has been premultiplied, we can postmultiply by P_12^H; the process continues, the generation and application of P_{k,k+1} being followed by the application of P_{k−1,k}^H.

• The method is stable in the usual sense (Definition 2.4).

2.4. THE EXPLICITLY SHIFTED ALGORITHM

Algorithm (2.5) exists primarily to show the invariance of upper Hessenberg form under a QR step. In this subsection we will work our way to a practical algorithm for reducing an upper Hessenberg matrix H to Schur form.

Detecting negligible subdiagonals

From the convergence theory in §2.1 we have reason to expect that when the shifted QR algorithm is applied to an upper Hessenberg matrix H the element h_{n,n−1} will tend rapidly to zero. At the same time, the results in §2.2 suggest that other subdiagonal elements may also tend to zero, albeit slowly. If any subdiagonal element can be regarded as negligible, then the problem deflates, and we can save computations. However, to claim this bonus, we must first decide what it means for a subdiagonal element to be negligible.

A natural criterion is to choose a small number ε and declare h_{i+1,i} negligible provided

    |h_{i+1,i}| ≤ ε‖A‖_F,                                                         (2.26)

where A is the matrix we started with. To see why, let us examine the consequences of setting h_{i+1,i} to zero. To simplify the discussion, we will assume that the current value à of A has been computed without error, so that there is a unitary matrix Q such that Q^H A Q = Ã. Now setting h̃_{i+1,i} to zero amounts to replacing à by à − h̃_{i+1,i} e_{i+1} e_i^T. If we set

    E = −h̃_{i+1,i} Q e_{i+1} e_i^T Q^H,

then

    Q^H(A + E)Q = Ã − h̃_{i+1,i} e_{i+1} e_i^T.

Moreover, if h̃_{i+1,i} satisfies (2.26), then

    ‖E‖_F ≤ ε‖A‖_F.                                                               (2.27)

In other words, setting h̃_{i+1,i} to zero amounts to making a normwise relative perturbation in A of size ε. When ε equals the rounding unit ε_M, the perturbation E is of a size with the perturbation due to rounding the elements of A. If the elements of A are roughly the same

SEC. A typical situation is illustrated by the following Wilkinson diagram: Here the seventh.>t-| + |/i.3.26) does not insure. something the criterion (2.26) becomes less compelling. As the algorithm progresses. In practice. For more on this point see the discussion following Theorem 1. however. THE EXPLICITLY SHIFTED QR ALGORITHM 95 size. 2.26). revealing three eigenvalues at the eighth. so that the remaining eigenvalues are contained in the leading principal submatrix of order three and in the principal submatrix of order four whose elements are labeled 6. we can deflate the problem. and tenth diagonal elements. In that case we need only transform Vol II: Eigensystems . then we need only perform QR steps between rows and columns four and seven. we must distinguish two cases. one can replace \\A\\p by any reasonably balanced norm—say the 1.27) easy to derive. ninth. however. In applying the rotations. In the shifted QR algorithm subdiagonal elements will usually deflate beginning with the (n. eighth. subdiagonal elements in other positions may also deflate.26) only because its unitary invariance made the bound (2. The third subdiagonal element has also become negligible. Deflation Once a subdiagonal element has been deemed insignificant. one can revert to (2. n—1)-element and proceeding backward one element at a time. It is worth observing that we used the Frobenius norm in (2. The usual cure for this problem is to adopt the criterion If by chance |/i. If we decide to treat the latter first. and ninth subdiagonal elements have become negligible.+i)Z-+i| is zero._i>z to zero only if it is small compared to the nearby elements of the matrix. the matrix frequently has small eigenvalues that are insensitive to small relative perturbations in its elements. then this is as small a perturbation as we can reasonably expect. In this case it is important that we set /i.or oo-norm. The first is when we only wish to find the eigenvalues of A. In such cases. When the elements of A are not balanced—say when the matrix shows a systematic grading from larger to smaller elements as we pass from the northwest corner to the southeast—the backward error argument for the criterion (2.

the elements labeled b in (2.29). On the other hand, if we wish to compute a Schur decomposition of A, we must apply the transformations to the entire matrix. In particular, the transformations must be premultiplied into the elements labeled b and c and postmultiplied into the elements labeled b and d. In this case the transformations must also be accumulated into the matrix Q from the initial reduction to Hessenberg form. For details see Algorithm 2.8.

Back searching

In (2.29) we can determine by inspection the rows—call them i1 and i2—that define where the QR step is to be applied. In practice, we must determine them on the fly by applying whatever criterion we choose to decide if a subdiagonal element is zero. This is usually done by searching back along the subdiagonal from the current value of i2. Algorithm 2.6 implements such a loop. It either returns distinct indices i1 and i2, in which case the QR step is to be taken between these indices, or it returns i1 = i2 = 1, in which case the matrix is completely deflated and the QR algorithm is finished.

This algorithm takes a Hessenberg matrix H of order n and an index ℓ (1 ≤ ℓ ≤ n) and determines indices i1, i2 ≤ ℓ such that one of the following two conditions holds:

1. 1 ≤ i1 < i2 ≤ ℓ, in which case the matrix deflates at rows i1 and i2;
2. 1 = i1 = i2, in which case all the subdiagonal elements of H in rows 1 through ℓ are zero.

Negligible subdiagonal elements are set to zero as they are detected.

 1. backsearch(H, ℓ, i1, i2)
 2.    i1 = i2 = ℓ
 3.    while (i1 > 1)
 4.       if (H[i1, i1−1] is negligible)
 5.          H[i1, i1−1] = 0
 6.          if (i1 = i2)
 7.             i2 = i1 = i1−1
 8.          else
 9.             leave while
10.          end if
11.       else
12.          i1 = i1−1
13.       end if
14.    end while
15. end backsearch

Algorithm 2.6: Finding deflation rows in an upper Hessenberg matrix
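Here is a hedged NumPy rendering of the back search. It uses the relative deflation criterion (2.28) discussed above, zero-based indices, and an illustrative function name; it is a sketch of the logic of Algorithm 2.6, not the book's code.

```python
import numpy as np

def backsearch(H, ell, eps=np.finfo(float).eps):
    """Search back along the subdiagonal of H from row ell (0-based, inclusive).
    Returns (i1, i2): if i1 < i2, a QR step should be applied to rows i1..i2;
    if i1 == i2 == 0, the matrix is completely deflated down to row ell.
    Negligible subdiagonals are set to zero as they are found (cf. Algorithm 2.6)."""
    i1 = i2 = ell
    while i1 > 0:
        # relative criterion (2.28): small compared to the neighboring diagonal elements
        if abs(H[i1, i1-1]) <= eps * (abs(H[i1-1, i1-1]) + abs(H[i1, i1])):
            H[i1, i1-1] = 0.0
            if i1 == i2:
                i1 = i2 = i1 - 1        # a 1x1 block has deflated at the bottom
            else:
                return i1, i2           # decoupled block found: rows i1..i2
        else:
            i1 -= 1
    return i1, i2

# A Hessenberg matrix whose trailing subdiagonal is negligible.
H = np.triu(np.random.randn(5, 5), -1)
H[4, 3] = 1e-20
print(backsearch(H, 4))   # expect (0, 3): the last eigenvalue deflates; work on rows 0..3 next
```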

The Wilkinson shift

Once we have found a range [i1, i2] within which to perform a QR step, we must choose a shift. The natural candidate is the Rayleigh-quotient shift κ = H[i2, i2], since it gives local quadratic convergence to simple eigenvalues. However, if H is real, the Rayleigh-quotient shift is also real and cannot approximate a complex eigenvalue. The cure is to use the Wilkinson shift, which is defined as the eigenvalue of the matrix

    ( H[i2−1, i2−1]  H[i2−1, i2] )
    ( H[i2, i2−1]    H[i2, i2]   )

that is nearest H[i2, i2]. This shift can move the QR iteration into the complex plane. Moreover, as the method converges—that is, as H[i2, i2−1] approaches zero—the shift approaches the Rayleigh-quotient shift, so that we retain the quadratic convergence of the latter.

The computation of the shift requires the solution of a 2×2 eigenvalue problem. However, some care must be taken to insure the stability of the resulting algorithm. To simplify our notation, we will consider the matrix

    B = ( a  b )
        ( c  d ).

Thus we want the eigenvalue of B nearest to d. The problem simplifies if we work with the shifted matrix

    B − dI = ( a−d  b )
             ( c    0 ).

The Wilkinson shift then becomes κ = λ + d, where λ is the smallest eigenvalue of B − dI. The characteristic equation of B − dI is

    λ^2 − 2pλ − q = 0,   where p = (a−d)/2 and q = bc.

The roots of this equation are

    λ = p ± sqrt(p^2 + q).

To avoid cancellation we should not compute the smallest root directly from this formula. Instead we should compute the largest root λ_max first and then compute the smallest root λ_min from the relation λ_min λ_max = −q. The procedure is to compute the eigenvalues from the characteristic polynomial using the usual formula for the roots of a quadratic equation. To determine the largest root, note that

    |p ± r|^2 = |p|^2 ± 2(Re(p)Re(r) + Im(p)Im(r)) + |r|^2,   where r = sqrt(p^2 + q).   (2.30)

6. To avoid overflow. This scaling factor can be replaced by the sum of the absolute values of the real and imaginary parts of the elements of B. the algorithm scales B so that its largest element is one in magnitude.5*((a/s) . 1.(d/s)) 8. wilkshift(a.2) gives an algorithm for computing the Schur form of a general matrix.4). 9 = (b/s)*(c/s) 6.1 says that once hqr starts converging to a simple eigenvalue. Reduction to Schur form We can now assemble the algorithmic fragments developed previously into an algorithm for the unitary triangularization of an upper Hessenberg matrix. • The routine hqr is stable in the usual sense (Definition 2. which is cheaper to compute. end if 12.8 computes the Schur form of an upper Hessenberg matrix by the QR algorithm. K) 2. provided an appropriate deflation strategy is used.30) is positive. d. r = -y/p2 + g 9.7: Computation of the Wilkinson shift o Consequently we want to choose the sign ± so that the middle term in (2. Algorithm 2. if (5 = 0) return fi 5.7 implements the computation described above. Algorithm 2. Unfortunately. end wilkshift Algorithm 2. p = 0. AC = AC . it will converge quadratically.s*(<jf/(p+r)) 11. • The convergence theory developed in §2. there is no theory . • The routine hqr combined with hessreduce (Algorithm 2. K=d 3. c. Here are some comments on the algorithm. THE QR ALGORITHM B a =( 5) \c dj wilkshifi returns the eigenvalue of B that is nearest d.98 Given the elements of the matrix CHAPTER 2. 5 = |o| + |6| + |c| + |d| 4. if (Re(p)* Re(r) + Im(»* Im(r) < 0) r = -r fi 10. if(g^O) 7.

19. end for z #[i2. 5t-) 15.K 16.8: Schur form of an upper Hessenberg matrix o VolII: Eigensystems . st-. ^T[z. The routine gives an error return if more than maxiter iterations are required to deflate the matrix at any eigenvalue. 21. hqr(H.i2]. 25.K) 12.i7]~K 13. z'2) 9. end for i 18. JI[i2. i+1] ./2-l]. fi 7.«+l]) rotapp(ci.z2].SEC.«]. 5" [t+1. #[z7. i+1]) ^[i.-.H" [z. fT[l:t+l. #[/2-l. s. JT[i2. il = 1. iter — iter+1 6. cj. 20. if (i2 = 1) leave fc^rfi 10.«]. Q[l:n. backsearch(H'. 23. st-. if(i2^0W/2)iter=Ofi 11. oldi2 = 12 8. t]. rotapp(ci. for i = i/toi2-l rotapp(ci.i7] = ^T[i7. ro?gen( . i+1] = H[i+l. fori = /7toi2-l 14.*] + « 22. Q[l:n. 1. i+l:n]. hqr overwrites it with a unitarily similar triangular matrix whose diagonals are the eigenvalues of H. iter . /2. THE EXPLICITLY SHIFTED QR ALGORITHM 99 Given an upper Hessenberg matrix H. 5"[i+l. H[i+l.i+l:n]) 17.0 4. i]. The similarity transformation is accumulated in Q. z7. if (iter > maxiter) error return. i2 = n 3. wi7jbfc//r(#[*2-l. Q.i] = 5r[i./2] + « end while end /i<7r Algorithm 2. Jf [!:«+!. while (true) 5. 2.i2] = ^[i2. 24. maxiter) 2.i2-l].

31) becomes f fcn3 flrot. Thus we can compute the eigenvectors of A by computing the eigenvectors of T and transforming them by We have already derived a formula for eigenvectors of a triangular matrix (see p. Still later. somewhat less to compute the second eigenvalue. THE QR ALGORITHM that says that the algorithm is globally convergent. and the upper bound (2. hqr gives an error return when in the course of computing any one eigenvalue a maximum iteration count is exceeded. so that il > 0. Most real-life implementations do not return immediately but instead try an ad hoc shift to get the algorithm back on track. if we partition Q^ . if A is defective it does not exist. the demand for eigenvectors is so great that all packages have an option for their computation. The accumulation of the transformations in Q requires the same amount of work to bring the total to 2ra(i2-i2) flrot. For more see §2.5. i'2]. then the matrix of right eigenvectors of A is QX. It should be stressed that the eigenvalue-eigenvector decomposition is not a stable decomposition—in fact. We can derive an upper bound by assuming that il = 1 and that it takes k iterations to find an eigenvalue. This reduces the operation count for a QR step in the range [i7. we may restrict the range of our transformations to the elements labeled 6 in the Wilkinson diagram (2. An overall operation count depends on the number of iterations to convergence and the behavior of the ranges [j7. In most applications. Specifically. Given il < 12. which gives further savings.il)2 flrot. hqr takes several iterations to deflate the first eigenvalue. i2] to 2(i2 . however. • The operation count is dominated by the application of plane rotations. we no longer need to accumulate the transformations. the matrix deflates at the upper end. Nonetheless. if the matrix is large enough. Moreover. neither of which is known a priori.and postmultiplication of H requires about n(i2 — il) flrot. Typically. In some applications a Schur decomposition is all we need. and so on.100 CHAPTER 2. what is wanted is not the Schur vectors but the eigenvectors of A.29). Eigenvectors We have now seen how to compute the Schur decomposition of a general matrix A. • To guard against the possibility of nonconvergence. If A = QTQU is the Schur decomposition of A and X is the matrix of right eigenvectors of T. The bound is approximately • If we are only interested in computing the eigenvalues of H. the pre. Eventually the subdiagonals of H become small enough so that quadratic convergence is seen from the outset. 3).

and T_kk is a simple eigenvalue of T, then

    x_k = ( −(T_11 − T_kk I)^{−1} t_1k )
          (              1             )
          (              0             )

is an eigenvector of T. It follows that if we write the kth eigenvector in the form

    x_k = ( x_1^(k) )
          (    1    )
          (    0    ),

then we can obtain x_1^(k) by solving the upper triangular system

    (T_11 − T_kk I) x_1^(k) = −t_1k.                                              (2.32)

To derive an algorithm for solving (2.32), let us drop subscripts and superscripts and consider a general algorithm for solving the equation (T − μI)x = b. If we partition this equation in the form

    ( T* − μI    t  ) ( x* )   ( b* )
    (    0     τ − μ) ( ξ  ) = ( β  ),                                            (2.33)

then the second row of the partition gives the equation (τ − μ)ξ = β, from which we get

    ξ = β/(τ − μ).

From the first row of (2.33) we have (T* − μI)x* + ξt* = b*, whence

    (T* − μI)x* = b* − ξt*.

This is a system of order one less than the original, which we further reduce by a recursive application of the same procedure. All this gives the following algorithm.

1. x = b
2. for j = n to 1 by −1
3.    x[j] = x[j]/(T[j,j] − μ)
4.    x[1:j−1] = x[1:j−1] − x[j]*T[1:j−1,j]
5. end for j

This is essentially the algorithm we will use to compute eigenvectors. However, it has two drawbacks.

First, in our final algorithm the shift μ will be a diagonal of our original matrix T. If T has multiple eigenvalues, then some diagonal of T_11 may be equal to T_kk, in which case our algorithm will terminate attempting to divide by zero. The cure

The strategy is a variant of one found in LAPACK.9 below we restrict it to be greater than 6M/^ unless that number is itself too small. the diagonal element is adjusted to keep the divisor d from becoming too small. There is a row-oriented version of the basic algorithm for solving triangular systems. if underflows are set to zero. the eigenvector itself must be sensitive to small perturbations in A—i.9 computes the right eigenvectors of a triangular matrix. This algorithm is to be preferred in languages like C that store their arrays by rows. which is larger than the cost of the unsealed algorithm. however. it must be ill conditioned.. • If there is no rescaling. we may scale the vector so that the offending component has modulus one. if a component of the solution is in danger of overflowing. In Algorithm 2. • The cost of rescaling can be comparatively large. Since simple eigenvectors are determined only up to a scalar factor. • As we mentioned above. where underflow is a number just above the underflow point. This orientation is to be preferred for programming languages like FORTRAN that store arrays by columns. However. when the component to be computed would be bigger than a number near the overflow point of the computer. we see that the replacement corresponds to a small perturbation in A. Algorithm 2. Each eigenvector Xi satisfies where ||jE". This means that for the procedure to introduce great inaccuracies in the computed eigenvector. the attempt to compute the eigenvectors can result in overflow.e. the cost is ^n3 flmlt. in which smallnum is taken to be ^underflow. In particular.102 CHAPTER 2. This implies that if the eigenvector is not ill conditioned then it will be accurately computed. • The algorithm is stable in the following sense. It is worth remem- . which can be adapted to compute right eigenvectors.3. &]—// to be too small. in which case we replace it with a number whose reciprocal is safely below the overflow point. In such cases. This may result in the underflow of very small components. Many people are somewhat nervous about replacing an essentially zero quantity by an arbitrary quantity of an appropriate size.27)]. • The algorithm is column oriented in the sense that the matrix T is traversed along its columns. For this reason the algorithm rescales only when there is a serious danger of of overflow — specifically. Here are some comments. by projecting the error made back to the original matrix A [see the discussion leading to (2. the results will generally be satisfactory. if scaling is done at every step in the computation of every eigenvector. the algorithm has an operation count of |n3flam. For example. For a further discussion of these matters see §1:2.||/||TJ| < *y€M for some modest constant 7. The second problem is that in some applications eigenvectors can have elements of widely varying sizes. THE QR ALGORITHM for this problem is to not allow T[k. we are free to scale the solution.

X[k+l:n.k] 6.j]-T[k. if(\X[j. d = T\j. k] \. THE EXPLICITLY SHIFTED QR ALGORITHM 103 Given an upper triangular matrix T. end righteigvec Algorithm 2.k] = -T[l:k-l. X) 2. smallnum = a small number safely above the underflow point 3.k]\ 14. forj = fc-ltolby-1 10. *]*T[l:j-l. dmin = max{ eM * | T [k. endforj 19. if (\d\ < dmin) d = dmin fi 12. end if 16. righteigvec(T. X\j.k] 11. for k — n to 1 by -1 5.9: Right eigenvectors of an upper triangular matrix o Vol II: Eigensystems . k] = X[l:j-l.SEC.k]\\2 20. X [ l : k .k] = X[l:k. The columns of X are normalized to have 2-norm one. this routine returns an upper triangular matrix X of the right eigenvectors of X. k} . X[l:k. X[l:k-l.k]\/bignum>\d\) 13. end for k 21.X ( j .k]/d 17.k] 15. s = \d\/\X{j. 1. j] 18. 2.k]/\\X[l:k. X[kjk] = l 7.k] = X\j. k ] = 3*X[l:k.k] = Q 8. smallnum} 9. X[l:j-l. bignum = a number near the overflow point 4.
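The back substitution just derived, with the safeguards of Algorithm 2.9 pared down, looks like this in NumPy. This is a simplified, hedged sketch: it guards the divisor roughly as described but omits the overflow rescaling, and it is neither the book's routine nor LAPACK's.

```python
import numpy as np

def right_eigvecs(T):
    """Columns of X are right eigenvectors of the upper triangular T, computed by
    the column-oriented back substitution sketched in Algorithm 2.9."""
    n = T.shape[0]
    eps = np.finfo(float).eps
    smallnum = np.sqrt(np.finfo(float).tiny)     # assumption: safely above underflow
    X = np.zeros((n, n), dtype=complex)
    for k in range(n - 1, -1, -1):
        x = np.zeros(n, dtype=complex)
        x[k] = 1.0
        x[:k] = -T[:k, k]                        # right-hand side of (2.32)
        for j in range(k - 1, -1, -1):
            d = T[j, j] - T[k, k]
            dmin = max(eps * abs(T[k, k]), smallnum)
            if abs(d) < dmin:
                d = dmin                         # keep the divisor away from zero
            x[j] = x[j] / d
            x[:j] -= x[j] * T[:j, j]
        X[:k+1, k] = x[:k+1] / np.linalg.norm(x[:k+1])
    return X

T = np.triu(np.random.randn(5, 5) + 1j * np.random.randn(5, 5))
X = right_eigvecs(T)
print(np.allclose(T @ X, X * np.diag(T)))        # T x_k = t_kk x_k for each column
```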

A similar algorithm can be derived to compute left eigenvectors. • Although we have chosen to compute the entire upper triangular array of eigenvectors. then is a left eigenvector of T corresponding to Tkk.34) that the residual T{ — Txi — taxi satisfies Thus the norm of the scaled residual for each computed eigenvector is near the rounding unit. the resulting matrix may not be the inverse of the matrix of right eigenvectors — we say that the right and left eigenvectors have lost biorthogonality. If we compute the left eigenvectors separately. BITS AND PIECES The algorithm sketched in the last subsection is an effective method for finding the eigenvalues (and eigenvectors. THE QR ALGORITHM bering that Ei is different for each eigenvector zt-. then directly inverting the matrix of right eigenvectors will give better results. However. the methods are not numerically equivalent. • It follows from (2. Unfortunately. If biorthogonality is required. If X is the matrix of right eigenvectors of a diagonal matrix. there is an alternative. Specifically if we partition T as usual in the form and assume that Tkk is a simple eigenvector of T. A detailed treatment of these topics would lengthen this chapter beyond reasonable . which continues to hold for the eigenvectors of the original matrix. then FH = X~l is the matrix of left eigenvectors.Thus we can also find left eigenvectors by solving triangular systems. Thus we can also obtain the left eigenvectors of a matrix by inverting the matrix of right eigenvectors.5. essentially the same algorithm could be used to compute a few selected eigenvectors.104 CHAPTER 2. This fact. there are modifications and extensions that can improve the performance of the algorithm. 2. provides an easy diagnostic for routines that purport to compute eigenvectors: compare the scaled residual against the rounding unit. But this will not work in cases where we want only a few left and right eigenvectors. if desired) of a general complex matrix. so that as a whole the eigenvalueeigenvector decomposition is not necessarily stable. However.

and balancing. It does not make much sense to try more than one or two ad hoc shifts. deflation at consecutive small subdiagonal elements. where and Vol II: Eigensystems . An alternative is to restart the iteration with a new shift. Instead. Suppose we begin the QR step at row i. For purposes of illustration we will assume that the elements in this diagram are real and that d is positive. The LAPACK routine CGEEV takes the shift to be | Re(/^. Specifically. where i is the row where the problem is supposed to deflate. i—l)-element of H. The subsection concludes with a discussion of graded matrices.SEC. we can sometimes start the QR step at the diagonal to the right of the first small element. we will give a brief survey of the following topics: the ad hoc shift. consider the Wilkinson diagram where the upper-left-hand x represents the (i—2. sorting for eigenvalues. There is little general theory to guide the choice of ad hoc shifts (which is why they are called ad hoc). THE EXPLICITLY SHIFTED QR ALGORITHM 105 bounds. If that fails one can either try again or give an error return. 2. After the reduction to triangular form. we have the following diagram.8)._i)| + | Re(/ij_i >2 _2)|. The problem usually manifests itself by the algorithm exceeding a preset limit on the number of iterations (see statement 6 in Algorithm 2. Ad hoc shifts Convergence of the Hessenberg QR algorithm is not guaranteed. In such an event the natural action is to give an error return. Aggressive deflation If two consecutive subdiagonals of H are small but neither is small enough to permit deflation. commonly referred to as an ad hoc shift. even with the Wilkinson shift.

we may be able to start the QR step even when e\ or €2 are not small enough to deflate the matrix. Then the computed value of c will be one. Now suppose that €2 is so small that ft(d2+€2) = d2. if it exists. we can ignore it. Note that it is the square of ei and the product €162 that says whether we can start a QR step. It takes at most n2 comparisons to isolate such a row. and the computed value of the element ccj will simply be €1 —unchange from the original. we can assume that the matrix is deflated at the element ej —at least for the purpose of starting a QR step. Moreover. THE QR ALGORITHM The postmultiplication by the rotations does not change elements in the second column.106 CHAPTER 2. if the product £162 is so small that it is negligible by whatever deflation criterion is in force. Consequently. this matrix can be permuted so that it has the form We can then repeat the process on the leading principal matrix of order ra-1. Thus for c\ and e2 sufficiently small. At an additional cost of O(n2} operations. Since the product of two small numbers is much smaller than either. if A has the form then the single nonzero element in the second row is an eigenvalue. and so on until it fails to isolate an eigenvalue. For example. it is possible that some eigenvalues are available almost for free. the computed value of the element sci = ei^/1' is €i €2/d. The result is a permutation of A that has the form where T is upper triangular. . Cheap eigenvalues If the original matrix A is highly structured.

we can permute A into the form where S is upper triangular. We proceed by balancing the elements of a matrix one row and column at a time. if it has. they are accidental. for example. Balancing Many applications generate matrices whose elements differ greatly in size. Because of its cheapness. by a poor choice of units of measurement. In this case balance can often be restored by the following procedure.SEC. Let us consider the case of the first row and column. caused. however. In some cases these disparities are a a consequence of the structure of the underlying problem. Properly implemented the process requires only 0(n2) operations per eigenvalue. the process is usually made an option in matrix packages. the form 107 then it can be permuted to the form Proceeding as above. 2. THE EXPLICITLY SHIFTED QR ALGORITHM Turning now to A. Let p ^ 0. Thus we have deflated the problem at both ends. and let Then (for n = 4) If we now require that p be chosen so that the first row and column are balanced in the sense that Vol II: Eigensystems . the same as for a single QR step. say. In others.

If we set then is graded downward by rows. THE QR ALGORITHM Having balanced the first row and column. we can go on to balance the second. For this reason. The balancing procedure is relatively inexpensive. and so on.while is graded downward by columns. However. For example. cycling back to the first if necessary. the third. a matrix is graded if it shows a systematic decrease or increase in its elements as one proceeds from one end of the matrix to the other.0515156210790460e-16 -1. Although the balancing of one row and column undoes the balancing of the others. Graded matrices Loosely speaking.108 then we find that CHAPTER 2. For example. Moreover. Balancing often improves the accuracy of computed eigenvalues and eigenvectors.5940000000000001e+00 . the method can be shown to converge to a matrix in which all the rows and columns are balanced. We therefore say that A is graded downward by diagonals. Each step of the iteration requires only 0(n) operations. the matrix exhibits a symmetric grading from large to small as we proceed along the diagonals. at times it can make things worse. the computed eigenvalues of A are 5.1034150572761617e-33 1. any package that incorporates balancing should include an option to turn it off. It turns out that the QR algorithm usually works well with matrices that are graded downward by diagonals. a very loose convergence criterion will be sufficient to effectively balance the matrix.

4e-01. we get the following results. This is a small error compared to the norm of B. 3.3e-16. However.2900000000000000e-24 -7.4e-16. THE EXPLICITLY SHIFTED QR ALGORITHM 109 which are almost fully accurate.OOOOOOOOOOOOOOOOe+00 This display shows that the first column of the Hessenberg form is the same as the one obtained by first setting the element 1.6859999999999994e-33 1.1e-01.Oe+16. while the computed eigenvalues of DAD~l have relative errors of 1. For example. If we compute the eigenvalues of B.1190000000000004e-17 7. 6.8e-16.290e-24 to zero. consider the first column of B and the corresponding first column of the Hessenberg form of B: 6. 3.6859999999999994e-33 6. and the largest is correct to working precision. On the other hand.Oe+00.6859999999999994e-33 1.SEC.0515156534681920e-16 -1. O. from which it is seen that the smallest eigenvalue has no accuracy at all. In particular the relative errors in the eigenvalues of the altered matrix B are 1. Vol n: Eigensystems .3e+00.5940000000000005e+00 The relative errors in these eigenvalues are 3. The two smaller eigenvalues of both matrices have no accuracy.6e+00. 1. 2. 2. reverse the rows and columns of A to get the matrix which is graded upward by diagonals. 2.Oe+00. but the smallest eigenvalue of B is quite sensitive to it. the algorithm does not do as well with other forms of grading.1189999999999992e-17 O. Row and column grading can also cause the algorithm to fail.1e-08. 2. The computed eigenvalues of D~1AD have relative errors of l.3e+00. inspection of special cases will often show that the algorithm causes a loss of important information by combining large and small elements. the second smallest is accurate to only eight decimal digits. There is no formal analysis of why the algorithm succeeds or fails on a graded matrix. O. For example.
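To make the balancing procedure described above concrete, here is a hedged NumPy sketch of an Osborne-style sweep. It scales by powers of the radix, as most libraries do to avoid rounding, and it is illustrative only: it is not the LAPACK routine, it omits the permutation phase that isolates cheap eigenvalues, and the example matrix below is a constructed stand-in for a row-graded matrix, not the matrix of the text.

```python
import numpy as np

def balance(A, radix=2.0, max_sweeps=50):
    """Osborne-style balancing: choose a diagonal D (powers of the radix) so that
    corresponding rows and columns of D^{-1} A D have comparable 1-norms.
    A simplified sketch of the iteration described in the text."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    d = np.ones(n)
    for _ in range(max_sweeps):
        done = True
        for i in range(n):
            c = np.sum(np.abs(A[:, i])) - abs(A[i, i])   # off-diagonal column norm
            r = np.sum(np.abs(A[i, :])) - abs(A[i, i])   # off-diagonal row norm
            if c == 0.0 or r == 0.0:
                continue
            s, f = c + r, 1.0
            while c < r / radix:
                c, r, f = c * radix, r / radix, f * radix
            while c >= r * radix:
                c, r, f = c / radix, r * radix, f / radix
            if c + r < 0.95 * s:                         # worthwhile improvement
                done = False
                d[i] *= f
                A[:, i] *= f                             # one step of A <- D^{-1} A D
                A[i, :] /= f
        if done:
            break
    return A, d

# Grade a well-balanced matrix by rows and then undo the grading by balancing.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
D = np.diag(10.0 ** np.arange(0, -16, -4))               # 1, 1e-4, 1e-8, 1e-12
G = D @ M @ np.linalg.inv(D)                             # graded, but similar to M
B, d = balance(G)
print(np.max(np.abs(G)), np.max(np.abs(B)))              # the huge elements shrink to O(1)
print(np.allclose(np.sort(np.abs(np.linalg.eigvals(B))),
                  np.sort(np.abs(np.linalg.eigvals(M)))))  # similarity is preserved
```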

which states that if the function defined by the power series has one simple pole r at its radius of convergence then r — lim^oo ak/ak+i. balancing DAD l and D 1AD •estores the lost accuracy.1955] then showed that if the numbers from one stage of the qdalgorithm were arranged in a tridiagonal matrix T then the numbers for the next stage could be found by computing the LU factorization of T and multiplying the factors in reverse order. Kublanovskaya[154. The result was what is known as the LR algorithm. the LU algorithm is numerically unstable. For more historical material see Parlett's survey [204] and the paper [73] by Fernando and Parlett. Francis went beyond a simple substitution and produced the algorithm that we use today. 223. complete with a preliminary reduction to Hessenberg form and the implicit double shift for real matrices. I have heard that the Q was originally O for "orthogonal" and was changed to Q to avoid confusion with zero. 2.) Rutishauser [224. since it cannot change the direction of the grading.Later Hadamard [106. OR DDIGONALS AND BALANCE THE MATRRIX. We do not claim that this strategy is foolproof.110 CHAPTER 2.1926] independently derived these expressions and gave recurrences for their computation. This suggests the following rule of thumb. Except for special cases.1958]. Later Rutishauser introduced a shift to speed convergence [225. On the other hand. like positive definite matrices.1961] and Francis [76. He generalized the process to a full matrix and showed that under certain conditions the iterates converged to an upper triangular matrix whose diagonals were the eigenvalues of the original matrix.37) for poles beyond the radius of convergence. When dealing with a graDED MATRIX ARRANGE THE GERsding downward BY TOWS COLUMNS. Aitken [1. and therefore anyone employing it should take a hard look at the results to see if they make sense. balancing B has no effect. (For a systematic treatment of this development see [122]. Thus the algorithm preceded the decomposition. The name of the QR decomposition comes from the fact that in Francis's notation the basic step was to decompose the matrix A into the product QR of an orthogonal matrix and an upper triangular matrix. Rutishauser consolidated and extended the recurrences in his qd-algorithm [222.1961] independently proposed substituting the stable QR decomposition for the LU decomposition. However. THE QR ALGORITHM Balancing can help in this case.1954]. For example.1884]. NOTES AND REFERENCES Historical The lineage of the QR algorithm can be traced back to a theorem of Konig [ 151.1892] gave determinantal expressions in terms of the coefficients of (2..6. .

The QR algorithm.SEC. QL.2. The proof given here is due to Wilkinson [301]. this result is implied by work of Ostrowski on the Rayleigh-quotient method [193]. those of the RQ and the LQ. the inverse power method. Kahan (see [206. applies to the other three variants.3. which we may represent symbolically by QR. no general analysis seems to have appeared in the literature. For more details on the QL algorithm.5 for notes and references on Householder transformations and plane rotations The treatment here differs from that in many textbooks in its emphasis on com plex transformations. H. RQ. 2. The stability analysis is due to Wilkinson. It is possible to use nonorthogonal transformations to reduce a symmetric matrix to tridiagonal form—a process that bears the same relation to Givens' and Householder's method as Gaussian elimination bears to orthogonal triangularization. Ch. 352]. however. however. The relation to the power method was known from the first. is not as compelling as the case for GausVol II: Eigensystems . which was originally established using determinants. Chapter 3. [300. His general approach to backward error analysis of orthogonal transformations is given in his Algebraic Eigenvalue Problem [300] and is the basis for most of the rounding-error results cited in this and the next chapter. 173]). Local convergence The local convergence results are adapted from [264]. as was the convergence result in Theorem 2. which is so important to the implicitly double shifted algorithm. Watkins and Eisner [292] have given a very general convergence theory that holds for both the QR and LR algorithms. §§29-30. and the power method The relation of the QR algorithm to the inverse power method was known to J. p. Householder transformations and plane rotations See §1:4. For the special case of the Rayleigh-quotient shift. Wilkinson and W. As mentioned in the text.1. Theorem 2.1. the iteration converges quadratically to a nondefective multiple eigenvalue. The theory of the QR algorithm. The shifted variants of the QR and QL algorithms exhibit fast convergence in the last and first rows. who worked out the numerical details. although the first explicit statement of the relation is in a textbook by Stewart [253. 8]. THE EXPLICITLY SHIFTED QR ALGORITHM Variants of the QR algorithm 111 There are four factorizations of a matrix into a unitary and a triangular matrix. is due to Francis [76]. mutatis mutandis. p. in the first and last columns. where R and L stand for upper and lower triangular. and LQ. see §1. The case for nonorthogonal reduction. For more on complex Householder transformations see [159]. Hessenberg form The reduction of general matrix to Hessenberg form is due to Householder [123] and Wilkinson [297].

Deflation criteria A personal anecdote may illustrate the difficulties in finding satisfactory criteria for declaring a subdiagonal element zero.28)." So it remains to this day. He ended by saying. "But if you want to argue about it. THE QR ALGORITHM sian elimination." and described the relative criterion (2. . The computational advantages of this procedure are obvious. Preprocessing The balancing algorithm (p. in the Handbook [208] and in subsequent packages. For further details. §6. and the method is not used in the major packages. The Handbook ALGOL codes [305] are well documented but show their age. 107) is due to Osborne [187] and was used. In 1965 I attended a series of lectures on numerical analysis at the University of Michigan. The observation that the QR step preserves Hessenberg form is due to Francis [76].. "When a subdiagonal element becomes sufficiently small..8—cannot be regarded as a polished implementation." After one of the lectures I asked him. . For another scaling strategy. "Here is what I do. see [50]. For details and an error analysis see [300. as do their EISPACK translations into FORTRAN [81]. and difficult to read. The LAPACK [3] codes are state of the art—at least for large matrices—but they are undocumented. In addition. His talk was punctuated by phrases like. "How small is small?" He began.. Watkins [289] has shown that the shift is accurately transmitted through the small elements.8-19]. there is little to do but go to the better packages and look at the code. sparsely commented. The Hessenberg QR algorithm As this subsection makes clear.112 CHAPTER 2. Aggressive deflation The idea of deflating when there are two consecutive small subdiagonal elements is due to Francis [76]. the implementation of the QR algorithm is a large task. along with the strategy for the initial deflation of eigenvalues. where it insures global convergence of the QR algorithm. and our final result—Algorithm 2. The Wilkinson shift The Wilkinson shift was proposed by Wilkinson [302] for symmetric tridiagonal matrices. However. many people felt that such small elements could retard the convergence of the implicitly shifted QR algorithm (to be discussed in the next section). where Jim Wilkinson presented the QR algorithm. I would rather be on your side. Its use in the general QR algorithm is traditional. which works well with sparse matrices.

3 we pull the material together into an algorithm for computing the real Schur form.2 we establish an important result that enables us to take advantage of the Hessenberg form. Since the concern is with solving the real eigenproblem: In this section A will be a real matrix of order n. in §3. 3. each such block containing a conjugate pair of eigenvalues. Contrary to (2. Unfortunately. which can also be used to solve generalized eigenproblems and compute the singular value decomposition. Since real matrices can have complex eigenvalues and eigenvectors. This matrix is then passed on to a variant of the QR algorithm—the QL algorithm—that also works from the bottom up. In this section we will show that there is a real equivalent of the Schur form and that the QR algorithm can be adapted to compute it in real arithmetic.SEC. Hence in §3. THE IMPLICITLY SHIFTED QR ALGORITHM Graded matrices 113 There is little in the literature about general graded matrices. VolII: Eigensy stems . For a general theory and references see [266]. Reinsch. it is generally impossible to avoid complex arithmetic if the actual Schur form is to be computed. The fact that the direction of the grading must be taken into account was noted for symmetric matrices by Martin. and Wilkinson [169]. 3. THE IMPLICITLY SHIFTED QR ALGORITHM In the previous section we described a variant of the QR algorithm that is suitable for computing the Schur decomposition of a general complex matrix. The adaptation involves a technique called implicit shifting. The following theorem establishes the existence of such decompositions.36). complex arithmetic is quite expensive and to be avoided if at all possible. It turns out that these inefficiencies can be removed if the matrix in question is Hessenberg. We begin the section with the real Schur form and an inefficient algorithm for computing it. REAL SCHUR FORM In this subsection we derive the real Schur form and present a preliminary algorithm for its computation.1. 3. Finally. they recommend that the matrix be graded upward. Real Schur form A real Schur form is a Schur form with 2x2 blocks on its diagonal. because their version of the reduction to Hessenberg form starts at the bottom of the matrix and works up.

or By computing the characteristic polynomial of Z. Chapter 1). Because RLR is similar to L. The blocks can be made to appear in any order. its eigenvalues are A and A. Since y and z are independent (see Theorem 1. Chapter 1. 1 . Proof. z) be a complex eigenpair with From the relations 2.vz and Az = w 4. one can verify that the eigenvalues of L are A and A. Let A be real of order n. Specifically. and It follows as in the proof of Theorem 1.or.1 (Real Schur form). Chapter 1.LIZ. The blocks of order two contain the pairs of complex conjugate eigenvalues of A. R is nonsingular. that which completes the deflation of the complex conjugate pair A and A. Then there is an orthogonal matrix U such that T = £7T AU is block upper triangular of the form The diagonal blocks ofT are of order one or two. Then (y z] = X\R. Hence X\ = (y z)R~l. The blocks of order one contain the real eigenvalues of A.3. • Three comments.114 CHAPTER 2. let (A. THE QR ALGORITHM Theorem 3. Now let be a QR decomposition of (y z).12.y = x + x and Izi — x .y . The proof is essentially the same as the proof of Theorem 1. we find that Ay = fj. except that we use two real vectors to deflate a complex eigenvalue.12.

and then computing H = QTHQ. This strategy of working with complex conjugate Wilkinson shifts is called the Francis double shift strategy.K!). The starting point is the observation that if the Wilkinson shift K is complex. 3.2) can be done in real arithmetic.2 Re(/c) A +1«|2/. By Theorem 2. Here we will start with an expensive algorithm that points in the right direction. It is an instructive exercise to work out the details.8) in real arithmetic until such time as a complex shift is generated.2Re(«)# + W 2 /. • Like the proof of Schur's theorem. Hence so are Q and H. it turns out that we can use a remarkable property of Hessenberg matrices to sidestep the formation of A2 . We can then apply the shifted QR algorithm (Algorithm 2. Such a space is called an eigenspace or invariant subspace. Thus the QR algorithm with two complex conjugate shifts preserves reality.SEC. Given the necessary eigenvectors. if is the QR decomposition of (H . one can reduce a matrix to real Schur form using Householder transformations. Unfortunately. Suppose that we apply two steps of the QR algorithm. the very first step of this algorithm—the formation of H2 -2 RQ(K)H + \K\2I—requires 0(n3) operations. However. A preliminary algorithm The QR algorithm can be adapted to compute a real Schur form of a real matrix A in real arithmetic. We begin by noting that the reduction of A to Hessenberg form (Algorithm 2. THE IMPLICITLY SHIFTED QR ALGORITHM 115 • A matrix of the form (3. We no turn to this property.1) in which the blocks are of order one or two is said to be quasi-triangular. At that point. we need a new approach to avoid the introduction of complex elements in our matrix. resulting in a real matrix H. one with shift K and the other with shift K to yield a matrix H. which makes the algorithm as a whole unsatisfactory.Kl)(H . But is real. then its complex conjugate K is also a candidate for a shift. VolII: Eigensy stems .1. • The relation implies that A maps the column space of X\ into itself. Such subspaces will be the main concern of the second half of this volume. then H = QHHQ. computing its Q-factor Q. We can even avoid complex arithmetic in the intermediate quantities by forming the matrix H2 . the proof of this theorem is almost constructive.
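The preliminary algorithm is easy to state in code, and writing it out makes plain where the prohibitive O(n^3) cost per step comes from. The sketch below is illustrative only: it takes the two shifts to be the eigenvalues of the trailing 2×2 block (their sum and product are its trace and determinant, which reduce to 2 Re(κ) and |κ|^2 when the pair is complex conjugate), and it forms the matrix product explicitly, which the implicit algorithm of the following subsections avoids.

```python
import numpy as np

def explicit_double_shift_step(H):
    """One Francis double-shift QR step, done the expensive explicit way:
    form M = (H - kappa I)(H - conj(kappa) I) = H^2 - 2Re(kappa) H + |kappa|^2 I,
    take the Q of its QR decomposition, and return Q^T H Q (all in real arithmetic)."""
    n = H.shape[0]
    B = H[n-2:, n-2:]
    s = B[0, 0] + B[1, 1]                     # sum of the two shifts (trace)
    p = B[0, 0] * B[1, 1] - B[0, 1] * B[1, 0] # product of the two shifts (determinant)
    M = H @ H - s * H + p * np.eye(n)         # the O(n^3) step that makes this impractical
    Q, R = np.linalg.qr(M)
    return Q.T @ H @ Q

H = np.triu(np.random.randn(6, 6), -1)        # real upper Hessenberg
for _ in range(5):
    H = explicit_double_shift_step(H)
print(np.abs(np.diag(H, -1)))                 # the trailing subdiagonal entries are
                                              # typically negligible by now; a practical
                                              # code would deflate and work on submatrices
```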

3.2. THE UNIQUENESS OF HESSENBERG REDUCTION

Let A be of order n and let H = Q^H A Q be a unitary reduction of A to upper Hessenberg form. This reduction is clearly not unique. For example, the QR algorithm produces a sequence of Hessenberg matrices each of which is similar to the original matrix A. Again, if Q̃ = QD, where |D| = I, then H̃ = Q̃^H A Q̃ is also a reduction to upper Hessenberg form. The two reductions are not essentially different, since Q and Q̃ differ only in the scaling of their columns by factors of modulus one. The choice of scaling will also affect the elements of H; specifically, h_ij is replaced by d̄_i h_ij d_j. In either case the rescaling makes no essential difference, and we will say that the reduction is determined up to column scaling of Q.

However, there is a limit to this nonuniqueness, as the following argument suggests. In reducing A to Hessenberg form H by a unitary similarity, we must introduce (n−1)(n−2)/2 zeros into A. Now a unitary matrix has n(n−1)/2 degrees of freedom (see the discussion preceding Theorem 1.2, Chapter 1). Since we must use (n−1)(n−2)/2 of the degrees of freedom to introduce zeros in A, we have n−1 degrees of freedom left over in Q, just enough to specify the first column of Q. It is therefore reasonable to conjecture that, given A, the matrices Q and H are determined by the first column of Q.

It turns out that this is essentially true, but with two provisos. First, we must assume that the subdiagonals of H are nonzero. Formally we make the following definition.

Definition 3.2. Let H be upper Hessenberg of order n. Then H is UNREDUCED if h_{i+1,i} ≠ 0 (i = 1, ..., n−1).

Second, the matrices are determined only up to the column scaling of Q described above. We are now in a position to give conditions under which a reduction to upper Hessenberg form is essentially unique.

Theorem 3.3 (Implicit Q theorem). Let A be of order n and let H = Q^H A Q be a unitary reduction of A to upper Hessenberg form. If H is unreduced, then up to column scaling of Q the matrices Q and H are uniquely determined by the first column of Q or the last column of Q.

Proof. Consider the relation

    A Q = Q H.                                                                (3.4)

If we partition Q = (q_1 ... q_n) by columns, then the first column of this relation is

    A q_1 = h_11 q_1 + h_21 q_2.                                              (3.5)

Multiplying (3.5) by q_1^H and remembering that q_1^H q_1 = 1 and q_1^H q_2 = 0, we have

    h_11 = q_1^H A q_1.

We then have

    h_21 q_2 = A q_1 − h_11 q_1.

Since h_21 ≠ 0, the vector A q_1 − h_11 q_1 is nonzero. Since ||q_2||_2 = 1 we may take

    h_21 = ||A q_1 − h_11 q_1||_2   and   q_2 = (A q_1 − h_11 q_1)/h_21,

which determines q_2 uniquely up to a factor of modulus one.

Now suppose that q_1, ..., q_k have been determined. Then the kth column of (3.4) gives the relation

    A q_k = h_1k q_1 + ... + h_kk q_k + h_{k+1,k} q_{k+1}.

Since q_1, ..., q_{k+1} are orthonormal, we have

    h_ik = q_i^H A q_k   (i = 1, ..., k).

Moreover, since h_{k+1,k} ≠ 0, the vector A q_k − h_1k q_1 − ... − h_kk q_k is nonzero, and we may take

    h_{k+1,k} = ||A q_k − h_1k q_1 − ... − h_kk q_k||_2   and   q_{k+1} = (A q_k − h_1k q_1 − ... − h_kk q_k)/h_{k+1,k},

which determines q_{k+1} up to a factor of modulus one. Finally, to show that Q is uniquely determined by its last column, we start with the relation Q^H A = H Q^H obtained from (3.4) and repeat the above argument on the rows of Q^H starting with the last. •

Two comments on this theorem.

• The theorem is sometimes called the implicit Q theorem, because it allows the determination of Q implicitly from its first or last column.

• Its proof is actually an orthogonalization algorithm for computing Q and H. It is called the Arnoldi method and will play an important role in the second half of this volume. The algorithm itself has numerical drawbacks. For example, cancellation in the computation of A q_k − h_1k q_1 − ... − h_kk q_k can cause the computed q_{k+1} to deviate from orthogonality with q_1, ..., q_k.
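The orthogonalization procedure in the proof is easy to run numerically. The following sketch (Python/NumPy; an illustration with none of the safeguards against the loss of orthogonality just mentioned) rebuilds Q and H from A and the first column q_1 alone.

    import numpy as np

    def implicit_q(A, q1):
        """Rebuild Q and H (H = Q^T A Q upper Hessenberg) from A and the first column q1."""
        n = A.shape[0]
        Q = np.zeros((n, n)); H = np.zeros((n, n))
        Q[:, 0] = q1 / np.linalg.norm(q1)
        for k in range(n):
            w = A @ Q[:, k]
            H[:k+1, k] = Q[:, :k+1].T @ w           # h_ik = q_i^T A q_k
            w = w - Q[:, :k+1] @ H[:k+1, k]
            if k + 1 < n:
                H[k+1, k] = np.linalg.norm(w)       # nonzero if H is unreduced
                Q[:, k+1] = w / H[k+1, k]
        return Q, H

    A = np.random.default_rng(1).standard_normal((5, 5))
    Q, H = implicit_q(A, np.eye(5)[:, 0])
    assert np.allclose(Q.T @ A @ Q, H) and np.allclose(np.tril(H, -2), 0)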

3.3. THE IMPLICIT DOUBLE SHIFT

We now return to our preliminary algorithm and show how it may be modified to avoid the expensive computation of H^2 − 2 Re(κ)H + |κ|^2 I. We will begin with a very general strategy and then fill in the details.

A general strategy

We have seen above that if we compute the Q-factor Q of the matrix H^2 − 2 Re(κ)H + |κ|^2 I, then Ĥ = Q^T H Q is the result of applying two steps of the QR algorithm with shifts κ and κ̄. The following is a general algorithm for simultaneously determining Q and Ĥ.

    1. Let κ be a complex Francis shift of H.
    2. Determine the first column c of C = H^2 − 2 Re(κ)H + |κ|^2 I.
    3. Let Q_0 be a Householder transformation such that Q_0^T c = σ e_1.
    4. Set H_1 = Q_0^T H Q_0.
    5. Use Householder transformations to reduce H_1 to upper Hessenberg form Ĥ.
       Call the accumulated transformations Q_1.
    6. Set Q = Q_0 Q_1.                                                       (3.6)

To establish that this algorithm works, we must show that Ĥ and Q are the matrices that would result from two steps of the QR algorithm with shifts κ and κ̄. We will proceed in stages.

1. By construction the matrix (Q_0 Q_1)^T H (Q_0 Q_1) is Hessenberg.

2. The first column of Q_0 and the first column of Q_0 Q_1 are, up to scaling, the same. This is because Q_1 e_1 = e_1 [see (2.21)].

3. The first column of Q_0 Q_1 is proportional to q, the first column of the reducing transformation. To see this, partition the QR factorization of C in the form

    C = (q Q_⊥) [ρ r^T; 0 R].

Then c = ρ q. Now if we partition Q_0 = (q^(0) Q_⊥^(0)), then from the relation c = σ Q_0 e_1 we have c = σ q^(0). Hence q and q^(0) are both proportional to c, and hence to each other.

4. Hence by the implicit Q theorem (Theorem 3.3), if Ĥ is unreduced, then Q = Q_0 Q_1 and Ĥ = (Q_0 Q_1)^T H (Q_0 Q_1) are, up to scaling, the matrices produced by two steps of the QR algorithm with shifts κ and κ̄.

The key computations in (3.6) are the computation of the first column of C and the reduction of Q_0^T H Q_0 to Hessenberg form. It turns out that because H is upper Hessenberg we can effect the first calculation in O(1) operations and the second in O(n^2) operations. We now turn to the details.

Getting started

The computation of the first column of C = H^2 − 2 Re(κ)H + |κ|^2 I requires that we first compute the scalars 2 Re(κ) and |κ|^2. To do this we do not need to compute κ itself. For it is easily verified that

    t ≡ 2 Re(κ) = h_{n−1,n−1} + h_{n,n}                                       (3.7)

and

    d ≡ |κ|^2 = h_{n−1,n−1} h_{n,n} − h_{n−1,n} h_{n,n−1},                     (3.8)

since κ and κ̄ are the eigenvalues of the trailing 2x2 submatrix of H. An important consequence of our substitution of t for 2 Re(κ) and d for |κ|^2 is that our algorithm works even if the Francis double shifts are real. Specifically, suppose that the trailing 2x2 submatrix has real eigenvalues λ and μ. Then t = λ + μ and d = λμ, and the formulas in Algorithm 3.1 generate a vector u that starts the iteration with shifts of λ and μ.

Because H is upper Hessenberg, only the first three components of the first column of H^2 are nonzero. They are given by

    (H^2)_11 = h_11^2 + h_12 h_21,   (H^2)_21 = h_21 (h_11 + h_22),   (H^2)_31 = h_21 h_32.

Thus the first three components of the first column of C are given by

    c_1 = h_11^2 + h_12 h_21 − t h_11 + d,   c_2 = h_21 (h_11 + h_22 − t),   c_3 = h_21 h_32.

Inserting the definition (3.7) of t and (3.8) of d, we obtain the following formula for c:

    c = h_21 ( [(h_{n,n} − h_11)(h_{n−1,n−1} − h_11) − h_{n,n−1} h_{n−1,n}]/h_21 + h_12,
               (h_22 − h_11) − (h_{n,n} − h_11) − (h_{n−1,n−1} − h_11),
               h_32 )^T.

Since two proportional vectors determine the same Householder transformation, we can ignore the multiplicative factor h_21 in the definition of c. Note that we may assume that h_21 is nonzero, for otherwise the problem would deflate. Hence the scaling factor s in statement 2 of Algorithm 3.1 is well defined.

Algorithm 3.1 calculates the vector u generating the Householder transformation to start a doubly shifted QR step. The formulas for the components of c involve products of the elements of H and are subject to overflow or underflow. The cure is to scale before calculating c. This, in effect, multiplies c by the scaling factor, which does not affect the final Householder transformation. The particular form of the formulas has been chosen to avoid certain computational failures involving small off-diagonal elements. For more on this, see the notes and references.

In practice, when the problem deflates, the QR step will not be applied to the entire matrix but only between certain rows and columns of H. For this reason we do not invoke the algorithm with the matrix H itself, but with the specific elements of H used to generate c.
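In code, the starting vector can be generated from a handful of entries of H, as in the sketch below (Python/NumPy, 0-based indices; an illustration of the formulas above rather than a transcription of Algorithm 3.1).

    import numpy as np

    def francis_start(H):
        """Normalized first column of C = H^2 - t*H + d*I, with t and d from the
        trailing 2x2 of H.  Assumes H is at least 3x3 with H[1,0] != 0; otherwise
        the problem deflates."""
        n = H.shape[0]
        h11, h12, h21, h22, h32 = H[0,0], H[0,1], H[1,0], H[1,1], H[2,1]
        hn1n1, hn1n, hnn1, hnn = H[n-2,n-2], H[n-2,n-1], H[n-1,n-2], H[n-1,n-1]
        s = 1.0 / max(abs(x) for x in (h11, h12, h21, h22, h32, hn1n1, hn1n, hnn1, hnn))
        h11, h12, h21, h22, h32 = s*h11, s*h12, s*h21, s*h22, s*h32     # scale to guard
        hn1n1, hn1n, hnn1, hnn = s*hn1n1, s*hn1n, s*hnn1, s*hnn         # against over/underflow
        p, q, r = hnn - h11, hn1n1 - h11, h22 - h11
        c = np.array([(p*q - hnn1*hn1n)/h21 + h12, r - p - q, h32])
        return c / np.linalg.norm(c)   # any nonzero multiple gives the same reflector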

This program generates a Householder transformation that reduces the first column of C = H^2 − 2 Re(κ)H + |κ|^2 I to a multiple of e_1. Here hn1n1, hn1n, and hnn1 stand for h_{n−1,n−1}, h_{n−1,n}, and h_{n,n−1}.

    1. startqr2step(h11, h12, h21, h22, h32, hn1n1, hn1n, hnn1, hnn, u)
    2.    s = 1/max{|h11|, |h12|, |h21|, |h22|, |h32|, |hn1n1|, |hn1n|, |hnn1|, |hnn|}
    3.    h11 = s*h11; h12 = s*h12; h21 = s*h21; h22 = s*h22; h32 = s*h32
    4.    hn1n1 = s*hn1n1; hn1n = s*hn1n; hnn1 = s*hnn1; hnn = s*hnn
    5.    p = hnn − h11
    6.    q = hn1n1 − h11
    7.    r = h22 − h11
    8.    c = ( (p*q − hnn1*hn1n)/h21 + h12,  r − p − q,  h32 )^T
    9.    housegen(c, u, ν)
   10. end startqr2step

Algorithm 3.1: Starting an implicit QR double shift

Reduction back to Hessenberg form

Let R_0 denote the Householder transformation corresponding to the vector u returned by Algorithm 3.1. Then premultiplication by R_0 acts only on the first three rows of H, so that R_0 H has the form of a Hessenberg matrix whose first three rows are full. Similarly, postmultiplication by R_0 operates on only the first three columns of H, so that after the complete transformation has been applied we get the matrix

[Wilkinson diagram omitted: a Hessenberg matrix with a bulge in its leading three rows and columns.]

We must now reduce this matrix to Hessenberg form. In the usual reduction to Hessenberg form we would annihilate all the elements below the first subdiagonal. But here only two of the elements are nonzero, and we can use a Householder transformation R_1 that acts only on rows and columns two through four. The result of applying this transformation is

[Wilkinson diagram omitted.]

We now continue with a transformation R_2 that annihilates the hatted elements in this matrix to give

[Wilkinson diagram omitted.]

Yet another transformation R_3 annihilates the hatted elements in the third column:

[Wilkinson diagram omitted.]

The last transformation is a special case, since there is only one element to annihilate. This can be done with a 2x2 Householder transformation or a plane rotation R_4:

[Wilkinson diagram omitted.]

Note that the transpose in the above diagram is necessary if the transformation is a rotation; it makes no difference when it is a Householder transformation, which is symmetric.

Algorithm 3.2 implements a double implicit QR shift between rows and columns i1 and i2 of H.

z'2} 7.j:n] = H[i:i+2. end qr2step Algorithm 3. H[l:iu. /2]) 18. iu = min{i-f 3.U*VT 6. rotapp(c.1 (startqrstep). Q) 2.2: Doubly shifted QR step o . *:»+2] = Q[l:n. Q[l:«. 5. this program applies utoH and then reduces H to Hessenberg form. nw) fi 12. rotgen(H[i2-l. 5) 15. Q[l:n.j:n] 5.i2-2]. v = . o+2]-v*ttT 9. w. end for 2 14. H[i2. -y = Q[l:rz.z7} 4. J7[i"2. for i = il to /2-2 3. J3"[l:/2. i:z+2]-v*wT 11. qr2step(H.ff[l:iw. rotapp(c. i7. i]. fi 13. c. The transformations are applied between rows il and 12 and are accumulated in a matrix Q. 12-2]. i:i+2] = H[l:iu.*:*+2]*u 8. /2-1]. 1.j] = H[i+2J] = 0. VT = uJ-*H[i:i+2.i'2-l:n].i2-l:n]) 16. 3. H[i:i+2.122 CHAPTER 2.H"[z'+l:z'+3. THE QR ALGORITHM Given an upper Hessenberg matrix H and the vector u from Algorithm 3.j:n] . s. i2. z:i+2]*w 10. rotapp(c. w. J5T[l:i2. if (t ^ i7) ff[i+l. #[/2-l. j = maxjz—I. i2-l]. /2]) 17. <^[l:n. if (z / /2—2) /i0ws0gew(.
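To convey the structure of the sweep in executable form, here is a hedged sketch in Python/NumPy. It follows the pattern described above, creating the bulge with a 3x3 Householder transformation and chasing it down the matrix before finishing with a transformation on the last two rows, but it is an editorial illustration rather than a transcription of Algorithm 3.2: indices are 0-based, and the roundoff-conscious details of the text's listing are omitted.

    import numpy as np

    def house(x):
        """Return v, beta with (I - beta*v*v^T) x = alpha*e1."""
        v = np.array(x, dtype=float)
        alpha = np.linalg.norm(v)
        if alpha == 0.0:
            return v, 0.0
        if v[0] > 0:
            alpha = -alpha
        v[0] -= alpha
        return v, 2.0 / (v @ v)

    def double_shift_sweep(H, Q, u, i1, i2):
        """Chase the bulge started by the 3-vector u down rows i1..i2 (inclusive)
        of the Hessenberg matrix H, accumulating the transformations in Q."""
        v, beta = house(u)
        for k in range(i1, i2 - 1):
            if k > i1:
                v, beta = house(H[k:k+3, k-1])         # annihilate the bulge in column k-1
            c0, r1 = max(k-1, 0), min(k+4, i2+1)
            H[k:k+3, c0:] -= np.outer(beta*v, v @ H[k:k+3, c0:])    # rows k..k+2
            H[:r1, k:k+3] -= np.outer(H[:r1, k:k+3] @ v, beta*v)    # columns k..k+2
            Q[:, k:k+3]   -= np.outer(Q[:, k:k+3] @ v, beta*v)
            if k > i1:
                H[k+1:k+3, k-1] = 0.0
        v, beta = house(H[i2-1:i2+1, i2-2])            # final step: rows i2-1 and i2
        H[i2-1:i2+1, i2-2:] -= np.outer(beta*v, v @ H[i2-1:i2+1, i2-2:])
        H[:i2+1, i2-1:i2+1] -= np.outer(H[:i2+1, i2-1:i2+1] @ v, beta*v)
        Q[:, i2-1:i2+1]     -= np.outer(Q[:, i2-1:i2+1] @ v, beta*v)
        H[i2, i2-2] = 0.0

The starting vector u can be taken from the francis_start sketch given earlier.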

The coding is a bit intricate, since there are special cases at both ends of the loop on i. It helps to think of i as pointing to the row containing the subdiagonal at which the current Householder transformation starts. It is easy to do an operation count on this algorithm, with the result that:

    Algorithm 3.2 requires about 6n(i2−i1) flam.

Deflation

The behavior of the doubly shifted QR algorithm once asymptotic convergence has set in depends on the eigenvalues of the trailing 2x2 matrix which determines the shift. If the eigenvalues are complex and nondefective, then h_{n−1,n−2} converges quadratically to zero. If the eigenvalues are real and nondefective, both h_{n−1,n−2} and h_{n,n−1} converge quadratically to zero. In either case, we must allow for the deflation of a 2x2 block at the bottom of the matrix. A 2x2 block may require additional processing: at the very least, if it contains real eigenvalues, it should be rotated into upper triangular form so that the eigenvalues are exhibited on the diagonal. For more on this topic, see the notes and references.

As with the singly shifted algorithm, the subdiagonal elements other than h_{n−1,n−2} and h_{n,n−1} may show a slow convergence to zero, so that we must allow for deflation in the middle of the matrix. As usual, we can start a QR step at a point where two consecutive subdiagonals are small, even when they are not small enough to cause a deflation. Algorithm 3.3 implements a backsearch for deflation. It differs from Algorithm 2.6 in recognizing deflation of both 1x1 and 2x2 blocks.

Computing the real Schur form

Algorithm 3.4 reduces a real Hessenberg matrix to real Schur form. There are several comments to be made about the algorithm.

• We have already observed that the routine qr2step requires about 6n(i2−i1) flam. If we assume that i1 = 1 throughout the reduction and that it requires k iterations to compute an eigenvalue, then the total operation count is proportional to kn^3 flam. On the other hand, the explicitly shifted QR algorithm (Algorithm 2.8) requires k'n^3 complex flrots, where k' is the number of iterations to find an eigenvalue. In terms of real arithmetic this is several times as large.

This algorithm takes a Hessenberg matrix H of order n and an index ℓ (1 ≤ ℓ ≤ n) and determines indices i1, i2 ≤ ℓ such that one of the following two conditions holds.

1. 1 ≤ i1 < i2 ≤ ℓ, in which case the matrix deflates at rows i1 and i2.

2. 1 = i1 = i2, in which case no two consecutive subdiagonal elements of H in rows 1 through ℓ are nonzero.

Algorithm 3.3: Finding deflation rows in a real upper Hessenberg matrix
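A sketch of such a backsearch in Python follows (an editorial illustration with 0-based indices; the negligibility test shown, which compares a subdiagonal against its diagonal neighbors times the rounding unit, is one common choice and not necessarily the exact criterion of the text).

    import numpy as np

    def backsearch(H, ell, eps=np.finfo(float).eps):
        """Return rows (i1, i2) bounding the bottommost block of order >= 3 that still
        needs QR steps; (0, 0) means rows 0..ell have deflated into 1x1 and 2x2 blocks.
        Trailing 1x1 and 2x2 blocks are deflated along the way."""
        i2 = ell
        while i2 > 0:
            i1 = i2
            while i1 > 0 and abs(H[i1, i1-1]) > eps*(abs(H[i1-1, i1-1]) + abs(H[i1, i1])):
                i1 -= 1
            if i1 > 0:
                H[i1, i1-1] = 0.0        # declare this subdiagonal negligible
            if i2 - i1 >= 2:
                return i1, i2            # block of order >= 3: do a double shift step here
            if i1 == 0:
                return 0, 0
            i2 = i1 - 1                  # a 1x1 or 2x2 block has converged; move up
        return 0, 0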

Given a real upper Hessenberg matrix H, hqr overwrites it with an orthogonally similar quasi-triangular matrix. The similarity transformation is accumulated in Q. The routine gives an error return if more than maxiter iterations are required to deflate the matrix at any eigenvalue.

Algorithm 3.4: Real Schur form of a real upper Hessenberg matrix
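Putting the pieces together, a driver in the style of hqr alternates the backsearch with double shift sweeps. The compressed sketch below (Python, built on the illustrative routines given earlier rather than on the text's own listings) conveys the control flow only; the post-processing of 2x2 blocks and the exceptional-shift safeguards of a production code are omitted.

    def real_schur_iterate(H, Q, maxiter=30):
        """Drive double shift sweeps until H is quasi-triangular (sketch only)."""
        n = H.shape[0]
        count, last = 0, -1
        while True:
            i1, i2 = backsearch(H, n - 1)
            if i1 == 0 and i2 == 0:
                return H, Q                       # H is now quasi-triangular
            count = count + 1 if i2 == last else 1
            if count > maxiter:
                raise RuntimeError("no convergence at row %d" % i2)
            last = i2
            u = francis_start(H[i1:i2+1, i1:i2+1])
            double_shift_sweep(H, Q, u, i1, i2)

Here H is upper Hessenberg on entry and Q holds the orthogonal matrix from the Hessenberg reduction (or the identity); the error return mirrors the maxiter safeguard of Algorithm 3.4.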

Even if k = k', the superiority of the double shift algorithm is clear. In fact, since each iteration of the double shift algorithm is equivalent to two steps of the single shift algorithm, one would expect that k < k'.

• Like the single shift algorithm, the double shift algorithm is numerically stable in the usual sense.

• It should not be thought that because Algorithms 2.8 and 3.4 are both backward stable they produce essentially the same matrices. For example, since the real Schur form is real, the doubly shifted algorithm produces a matrix whose complex eigenvalues occur in conjugate pairs. The singly shifted algorithm, on the other hand, works in the complex field, and the imaginary parts of a pair of complex eigenvalues can drift away from conjugacy, very far away if the eigenvalue is ill conditioned. This is yet another argument for computing the real Schur form.

Eigenvectors

The eigenvectors of the original matrix A can be found by computing the eigenvectors of the Schur form T produced by Algorithm 3.4 and transforming them back using the orthogonal transformation Q. Thus the problem of finding the eigenvectors of the original matrix A is reduced to computing the eigenvectors of a quasi-triangular matrix T. The method is a generalization of the back-substitution process of Algorithm 2.9. Rather than present a complete program, which would necessarily be lengthy, we will sketch the important features of the algorithm.

It will be sufficient to illustrate the procedure on a quasi-triangular matrix T consisting of three blocks. We will assume that the eigenvalues of a diagonal block are distinct from those of the other diagonal blocks.

We begin with the case of a real eigenvalue. Suppose T has the form

    T = [T_11 t_12 t_13; 0 τ_22 τ_23; 0 0 τ_33],                              (3.9)

where T_11 is of order two. If we seek the eigenvector corresponding to λ = τ_33 in the form (x_1^T ξ_2 1)^T, then we have the equation

    T (x_1^T ξ_2 1)^T = (x_1^T ξ_2 1)^T λ.

(The reason for putting λ to the right of the eigenvector will become clear a little later.) From the second row of (3.9) we have

    τ_22 ξ_2 + τ_23 = ξ_2 λ,

whence ξ_2 = −τ_23/(τ_22 − λ). This is just the usual back-substitution algorithm.

10) we have Vol II: Eigensystems . The system is nonsingular because by hypothesis A is not an eigenvalue of T\\. On the other hand. If the row in question has a scalar on the diagonal. This system is nonsingular because T22 is not an eigenvalue of L. 3.10). we have from which it follows that Since L is similar to T^ it contains the complex conjugate pair of eigenvalues of T^. then two components of the eigenvector are obtained by solving a system of order 2. then the process reduces to the usual back-substitution. We already have two examples of such block eigenvectors in (3.9) we have 127 Hence we may obtain x\ as the solution of the linear system (T\\ — \I}xi = —tis — £2*1 • Here we have deviated from the usual back-substitution algorithm: the two components of the eigenvector in x\ are obtained by solving a 2x2 linear system. THE IMPLICITLY SHIFTED QR ALGORITHM From the first row of (3.2) and (3. We begin by letting X$ be nonsingular but otherwise arbitrary. From the third row of (3.SEC. if the diagonal block is 2x2. For a general quasi-triangular matrix T the back-substitution proceeds row by row as above.3).10) we have Hence we may compute zj asthesolutionofthe2x2systemorj(r22-i) = -t^A'ss. From the second row of (3. The case of a complex conjugate pair of eigenvalues is sufficiently well illustrated by the matrix We will look for an "eigenvector" (later we will call it an eigenbasis) in the form for some matrix L of order 2. From the third row of (3.

± iv are the eigenvalues of Tys. For larger matrices the process continues. we have where fj..4. where x3 = y3 + ^3 is the right eigenvector of TSS. which we will use to compute eigenvalues of a symmetric tridiagonal matrix. It is easy to cast them into columnoriented form in the spirit of Algorithm 2. Moreover. from the real Schur form to the implicit double shift. which is implemented in the LAPACK routine xHSEQR. we have presented only row-oriented versions of the algorithms. However. which we can solve for AI because 1 n and L have no eigenvalues in common (see Theorem 1. are due to Francis [76]. which is simply ^33. We now turn to the choice of X3.128 CHAPTER 2.9. . The major packages adopt a different approach that is equivalent to our second choice of T^. Arguing as in the proof of Theorem 3. how to solve the linear systems and how to scale to prevent overflow. Specifically.1 [see (3. it is better to take X$ = (y^ 23). there is a single shift variant. Thus our choice generates the real and imaginary parts of the desired eigenvector. There is also a multishift variant (see below).If we then compute X as described above and partition This relation implies that y ± iz are the right eigenvectors of T corresponding to the eigenvalues // ± ip. each row of the matrix yielding either a 2 x 2 system or a Sylvester equation for the corresponding part of X. The idea of an implicit shift has consequences extending beyond the computation of a real Schur form.16. A natural choice is X$ — I—the generalization of the choice £3 = 1 for an ordinary eigenvector. Chapter 1). NOTES AND REFERENCES Implicit shifting The ideas in this section. THE QR ALGORITHM This is a Sylvester equation. In the next section we will show how to use implicit shifting to solve the generalized eigenvalue problem and in the next chapter will apply it to the singular value decomposition.2)]. for a complex eigenvalue A they write out the relation Ax = Ax and backsolve as in the real case but compute the real and imaginary parts explicitly in real arithmetic. This sketch leaves open many implementation details—in particular. 3. if actual eigenvectors are desired. Although we have focused on the double shift version. The principal advantage of this choice is that it does not require any computation to form L.

Vol II: Eigensystems . the result is a sequence of small. THE GENERALIZED EIGENVALUE PROBLEM The generalized eigenvalue problem is that of finding nontrivial solutions of the equation where A and B are of order n.. and Mathias [29]. For example. The problem reduces to the ordinary eigenvalue problem when B = I—which is why it is called a generalized eigenvalue problem. adjacent bulges being chased down the matrix.SEC. Watkins [290] has shown that if the number of shifts is great they are not transmitted accurately through the bulge chasing process. In spite of the differences between the two problems.2. An alternative is to note that if. This algorithm and a new deflation procedure is treated by Braman. If this process is continued. when we treat the symmetric eigenvalue problem. . We then determine an (ra+l)x(ra+l) Householder transformation QQ that reduces c to ei and then apply it to A. can take advantage of level-3 BLAS—the larger m the greater the advantage. THE GENERALIZED EIGENVALUE PROBLEM Processing 2x2 blocks 129 When a 2 x 2 block has converged. We compute the first column c of (A . Specifically.. suppose that we have chosen m shifts KI. it should be split if it has real eigenvalues. the bulge from a 2 x 2 shift is chased two steps down the diagonal then another double shift can be started after it. The shifts can be computed as the eigenvalues of the trailing mxm submatrix. which is chased down out of the matrix in a generalization of Algorithm 3. say.Kml) . The resulting algorithm is called the QZ algorithm. The matrix QoAQo has an (m+l)x(m+l) bulge below its diagonal. Although the generalization results from the trivial replacement of an identity matrix by an arbitrary matrix B. a generalized eigenvalue problem can have infinite eigenvalues. Unfortunately.• • (A «i/). Km. which will have m + 1 leading nonzero components. For example. The paper also contains an extensive bibliography on the subject. This algorithm. we will use single shifts. It also transforms a block with complex eigenvalues into one in which the diagonal elements are equal. The LAPACK routine XLANV2 does this. the problem has many features not shared with the ordinary eigenvalue problem. Byers.. 4. 4. Moreover. and the purpose of this section is to describe it. The multishifted QR algorithm There is no need to confine ourselves to two shifts in the implicitly shifted QR algorithm. the QR algorithm can be adapted to compute the latter. We can also use more than two shifts. the generalized eigenvalue problem has the equivalents of a Hessenberg form and a Schur form. proposed by Bai and Demmel [9] and implemented in the LAPACK routine xHSEQR.

. Like the QR algorithm. We then turn to the details of the QZ algorithm. THE THEORETICAL BACKGROUND In this subsection we will develop the theory underlying the generalized eigenvalue problem.1. we only sketch the outlines. if (A. The reader is referred to the notes and references for more extensive treatments.1. B) ify ^ 0 and yHA = R Xy B.130 CHAPTER 2. the direction of x remains invariant under multiplication by A. B) (also written A .3. eigenpairs of the generalized problem are assumed to be right eigenpairs unless otherwise qualified. Ax = \Bx. the QZ algorithm begins by reducing A and B to a simplified form. called Hessenberg-triangular form. then A maps x into a multiple of itself—i. It is important to keep in mind that the geometry of generalized eigenvectors differs from that of ordinary eigenvectors. if . This difference has important consequences. x) is an EIGENPAIR or RIGHT EIGENPAIR of the PENCIL (A. 4. The pair (A. This elementary fact will be useful in understanding the material to follow. the direction of x is not necessarily preserved under multiplication by A or B. Definitions We begin with the definition of generalized eigenvalues and eigenvectors. 2. For reasons of space.e. If (A. Let A and B be of order n. y) is a LEFT EIGENPAIR of the pencil (A. For example. Definition 4. it is possible for any number A to be an eigenvalue of the pencil A .\B) if 1.\B. The pair (A. x ^ 0. the same. THE QR ALGORITHM We begin in the next subsection with the theoretical background. and then iterates to make both pairs triangular. As with the ordinary eigenvalue problem. On the other hand. We will treat the reduction to Hessenberg-triangular form in §4. the direction of both Ax and Bx are.2 and the QZ iteration in §4. The scalar A is called an EIGENVALUE of the pencil and the vector x is its EIGENVECTOR. In the first place if B is singular. Regular pencils and infinite eigenvalues An important difference between the generalized eigenvalue problem and the ordinary eigenvalue problem is that the matrix B in the former can be singular whereas the matrix / in the latter cannot. B). provided we agree that the direction of the zero vector matches that of any nonzero vector. or) is an eigenpair of the pencil (A. Instead. or) is an eigenpair of A. including Schur forms and perturbation theory.

It should be stressed that infinite eigenvalues are just as legitimate as their finite brethren. Now if A / 0 is an eigenvalue of a pencil (A. However. Now p(X) = det(A . whose proof is left as an exercise. if A and B have a common null vector x. B). If the pencil is regular.5).XB) = 0. However. Vol II: Eigensystems . then (A. B) by an EQUIVALENCE TRANSFORMATION. J?) for any A. In all these examples the determinant of A — XB is identically zero. then p(X) is not identically zero and hence has m zeros. the pencil must have an infinite eigenvalue corresponding to the vector x. A). among others. More generally. this can only occur if B is singular. we will appeal to a generalization of the Schur form (Theorem 4. To see this. We also say that the pair (A. They preserve the eigenvalues of a matrix and change the eigenvectors in a systematic manner. A matrix pencil (A. Ae2 = XBe%. Equivalence transformations We have already observed that the natural transformations for simplifying an ordinary eigenvalue problem are similarity transformations. B).XB) is not identically zero. Definition 4. x) is an eigenpair of (A. An even more general example is the pencil in which A and B have no null vectors. then Bx = 0 for some x ^ 0. in which case it appears that the pencil has no eigenvalues. infinite eigenvalues do present mathematical and notational problems. which are eigenvalues of (A. B) is regular.2. Definition 43. then /x = A"1 is an eigenvalue of (5. we say that(UH AV.SEC. B). Then the pencil (UHAV. For example. If B is singular. This suggests the following definition.XB) is a polynomial of degree m < n. what is the multiplicity of an infinite eigenvalue? To answer this question. It is possible for p(X) to be constant. THE GENERALIZED EIGENVALUE PROBLEM 131 then Ae2 = Be^ — 0. Let (A. right or left. B) is REGULAR if det( A . The possible existence of infinite eigenvalues distinguishes the generalized eigenvalue problem from the ordinary one. It is therefore natural to regard oo = 1/0 as an eigenvalue of (A. Alternatively. Hence for any X. A regular matrix pencil can have only a finite number of eigenvalues. and zero is an eigenvalue of (J5. Thus if p(X) is constant. The corresponding transformations for the generalized eigenvalue problem are called equivalence transformations. URBV) is obtained from (A. in common. The effect of a equivalence transformation on the eigenvalues and eigenvectors of a matrix pencil is described in the following theorem. But first we must describe the kind of transformations we will use to arrive at this generalized Schur form. note that Ax = XBx (x ^ 0) if and only if det(A . A) and vice versa. B). UEBV) is said to be EQUIVALENT to(A. B) be a matrix pencil and let U and V be nonsingular. 4.

and let (v V±) be unitary. Let u — Av/||Av||2 and let (u U_L) be unitary. Then The (2. Moreover. Let (A. l)-block of the matrix on the right is zero because by construction Av is orthogonal to the column space of £/j_. U~ly) are eigenpairs of (UEAV. THE QR ALGORITHM Theorem 4. like the Jordan form. In particular. • We now turn to four consequences of this theorem. at least one of the vectors Av and Bv must be nonzero — say Av. Theorem 4.XB) essentially unaltered. In fact. Since U and V are nonsingular. Specifically. Chapter 1). Fortunately. the equivalence transformation multiplies the p( A) by a nonzero constant.IfU and V are nonsingular. we can reduce the matrices of a pencil to triangular form. V~lx) and (A. B) normalized so that || v\\2 = 1. then (A. B) is regular. B) to triangular form. is not numerically well determined. . Equivalence transformations also leave the polynomial p(X) . we have the following theorem. the roots of p( A) and their multiplicity are preserved by equivalence transformations. Similarly.UHBV).6. Let v be an eigenvector of (A. Unfortunately. so can we simplify a pencil by equivalence transformations. and the transformations that produce it can be ill conditioned (see §1. Let (A.det(A . For example. Proof. x) and (A. Since the (A. Then there are unitary matrices U andV such thatS — UHAV andT = UHBV are upper triangular. which are well conditioned. The proof is now completed by an inductive reduction of (A. Generalized Schur form Just as we can reduce a matrix to simpler forms by similarity transformations. if we restrict ourselves to unitary equivalences. if Bv ^ 0 then it must be proportional to Av. there is an analogue of the Jordan form for regular pencils called the Weierstrass form.B). this form.4.132 CHAPTER 2. B) be a regular pencil.5 (Generalized Schur form). y) be left and right eigenpairs of the regular pencil (A.

4. it must be an eigenvalue of (A. An eigenvalue of algebraic multiplicity one is called a SIMPLE EIGENVALUE.SEC. In either case we can find an eigenvector corresponding to the second eigenvalue.5. I f m is the degree of the characteristic polynomial and m < n. If the second eigenvalue in the sequence is finite. Another consequence of the above argument is that we can choose the eigenvalues of (A. then we say that (A.6.2) and (4. the eigenvalues of (A. Corollary 4. if the second eigenvalue is infinite. Specifically. while the latter correspond to the infinite eigenvalues. Hence exactly m of the ba are nonzero and n—m are zero. This argument justifies the following definitions. B). It also suggests how to put finite and infinite eigenvalues on the same notational footing. we have from (4. The former correspond to the finite eigenvalues of (A. Then is the CHARACTERISTIC POLYNOMIAL of (A. Then where \u>\ = 1 (a. the degree of p^A^ would be too large. B) be a regular pencil and let (A. We say that the ALGEBRAIC MULTIPLICITY of a finite eigenvalue of (A. B) can be made to appear in any order on the diagonals of(S. i. Definition 4. Thus we have established the following corollary. B must be singular. On the other hand.A B) = 0. B) is its multiplicity as a zero of the characteristic equation. we can choose any eigenvalue.3) that where \u\ — 1. to be in the first position.. B).e. B) be a generalized Schur form of (A.7. After the first step of the reduction. T). Projective representation of eigenvalues The Schur form has enabled us to solve the mathematical problem of defining the multiplicity of generalized eigenvalues—finite or infinite.1)]. B). B) to appear in any order in a generalized Schur form. VolII: Eigensystems . Let (A. is the product of the determinants of the unitary transformations that produce the generalized Schur form). finite or infinite. for otherwise. Thus any generalized Schur form will exhibit the same number of finite and infinite eigenvalues. Now the degree m of p is invariant under equivalence transformations [see (4. B) has an INFINITE EIGENVALUE OF ALGEBRAIC MULTIPLICITY n-m. In Theorem 4. THE GENERALIZED EIGENVALUE PROBLEM Multiplicity of eigenvalues 133 Let (A. then it must satisfy det(A . B).B)bea regular pencil of order n.

Otherwise the eigenvalue is infinite. This is equivalent to writing the generalized eigenvalue problem in the form Unfortunately. fin).8. .. fin) represents the same eigenvalue. we can reduce the shifted problem to the ordinary eigenvalue problem (B + wA)~lAx = \LX. Generalized shifting Let (-4. whose eigenpairs are (A/(l + wA). B) are determined by the pairs (a. Theorem 4. A = eta/Pa is a finite eigenvalue of A. B) be a regular pencil reduced to Schur form. An ordinary eigenvalue A can be written in the form (A. B) if and only if is an eigenpair of (C. D). as the following theorem shows. We can shift A and B simultaneously. Let Then ((a. Thus when B is nonsingular. Let (A. B) be a pencil with B nonsingular. this representation is not unique.-.0) is an infinite eigenvalue.1} is a zero eigenvalue and (1. /%. We solve this problem by introducing the set We will call this set the projective representation of the eigenvalue. x). The reduction is not possible when B is singular.is nonzero. We can erase the distinction between the two kinds eigenvalues by representing them by the pairs (o^-.134 CHAPTER 2. since any nonzero multiple of the pair (an. In this nomenclature (0.B}bea regular pencil and let be nonsingular.. and it has computational difficulties when B is ill conditioned. THE QR ALGORITHM Let (A. Then the eigenvalues of (A. Then the equation Ax = \Bx can be written in the form B~lAx — \x. If we can determine w so that B + wA is well conditioned. x) is an eigenpairof(A.-). these eigenpairs have the same multiplicity. we can reduce the generalized eigenvalue problem to an ordinary eigenvalue problem—a computationally appealing procedure. Moreover. 1). To remedy this difficulty we can work with the shifted pencil Ax = jj. /?).(B + wA)x. If /?t-. We are not restricted to shifting B alone.

6) = (a. B) is in generalized Schur form with the pair (a.9.<$).ei) is an eigenpair of (C. Consequently.0). B) then by the nonsingularity of W the transformed eigenvalue appears exactly m times on the diagonal of (C. THE GENERALIZED EIGENVALUE PROBLEM 135 Proof. as the following theorem shows. Likewise. Proof. B) has the form Then x is a nonzero multiple of ei.5) we may pass to the Schur form and assume that the pencil (A.SEC. To establish (4. 4. Chapter 1]. The converse is shown by using the nonsingularity of W to pass from (C. Now the first component of y is nonzero. The left and right eigenvectors of a simple eigenvalue of a pencil can be orthogonal. Then (C. B). D} is also in Schur form with (7. For otherwise the vector y consisting of the last n-\ components of y would be a left eigenvector of the pencil (A. For example. Let (a. D). ft} be a simple eigenvalue of the regular pencil (A.8) that Vol II: Eigensystems .Nonetheless. Since W is nonsingular (7. (3)W in the initial position. ft}. Assume without loss of generality that (A. D).D) back to (A. Ifx and y are right and left eigenvectors corresponding to (a. • Left and right eigenvectors of a simple eigenvalue We have seen that if A is a simple eigenvalue of a matrix A then its left and right eigenvectors cannot be orthogonal [see (3. Hence t/H£ / 0. and it follows from (4. ((7. This allows us to calculate the eigenvalue in the form of a Rayleigh quotient yHAx/yHx. there is an generalized form of the Rayleigh quotient. B). The statement about multiplicities follows from the fact that if an eigenvalue appears m times on the diagonal of (A. then Consequently. the right and left eigenvectors of the pencil corresponding to the eigenvalue 1 are ei and 62. so that the vector ei is the corresponding eigenvector. ft) in the initial position. Theorem 4.12). B). ft). contradicting the simplicity of (a.6) ^ (0.

We also wish to know how close the perturbed eigenpair is to the original as a function of e. B) that converges to the eigenpair ((a. y^Bx ^ 0. ft). Again. x) as e —» 0.136 CHAPTER 2. since yH A is proportional to y^B.5) and the regularity of the pencil. /?}. ft). F) on the eigenvector x.x) of (A. The implication (4. we can assume that B is nonsingular. THE QR ALGORITHM To establish (4. Chapter 1.13. B). we drop the Schur form and suppose that yH Ax = 0. Since Ax is proportional to Bx. Then by (4. • Since a — yHAx and ft = y^Bx.7) is established similarly. Let us agree to measure the size of this perturbation by We wish to know if there is an eigenpair ({a. Perturbation theory We now turn to perturbation theory for the generalized eigenvalue problem. we can write the ordinary form of the eigenvalue as This expression is a natural generalization of the Rayleigh quotient. Let be the perturbed pencil. By shifting the pencil.6).x) of a regular pencil (A. and specifically to the perturbation of a simple eigenpair ((a. we must have ?/H A — 0. we have for the left eigenvector y From the perturbation theory of inverses and Theorem 3. we know that there exist perturbed eigenvectors x and y satisfying . we must have Ax = 0. since B plays the role of the identity in the Rayleigh quotient for the ordinary eigenvalue problem. This theorem will also prove important a little later when we consider the perturbation of eigenvalues of matrix pencils. Hence the problem can be reduced to the ordinary eigenvalue problem Similarly. We begin with a qualitative assessment of the effects of the perturbation (E.

Theorem 4.B). Then by (4. THE GENERALIZED EIGENVALUE PROBLEM If we normalize x.SEC.10)]. 4. (3) + O(e) is an abbreviation for the more proper (a + 0(e). The key to the proof lies in writing the perturbed eigenvectors x and y in appropriate forms. we must have 7 —>• 1.6) we have u^x ^ 0. Hence we can write x = -yx + Uc for some 7 and c. Now since x —>• x [see (4. not both yH A and y^B can be zero. Then Proof. Let x and y be simple eigenvectors of the regular pencil (A. we may write Similarly we may write Now Vol II: Eigensystems . /? -f O(e)). This implies that if U is an orthonormal basis for the orthogonal complement of u then the matrix (x U)is nonsingular. y so that 137 then (4.9) implies that For the eigenvalue we have from (4. We will use the notation established above. First-order expansions: Eigenvalues We now turn to first-order expansions of a simple eigenvalue.] Thus the projective representation of the eigenvalues converges at least linearly in e.10. Hence renormalizing x by dividing by 7 and setting e = f/c/7.5) that [Here the notation (a. Since the pencil is regular. x. y. (3} = (y^Ax. For defmiteness assume that UH = t/ H A / 0.y^Bx} be the corresponding eigenvalue. and let (a.

Chapter 1. a + c] and \J3 — e. It is therefore natural to measure the distance between two eigenvalues (a. then . CHAPTER 2.11) can be written in the form Hence if A is finite.4) we see that the projective representation (a. /3) is the one-dimensional subspace of C 2 spanned by the vector (a ft) . • Since ||a. • If/? and S are nonzero and we set A = a/ft and p = 7/8. We can convert these bounds into bounds on the ordinary form A of the eigenvalue. This generalizes the Rayleigh-quotient representation of A in (3. ft) and (7. • The function x is a metric. Two observations on this theorem. Definition 4. we have by the Cauchy inequality rp Hence. The chordal metric From (4. it is definite and satisfies the triangle inequality. However.11) says that up to second-order terms a and ft lie in the intervals [a . S) is the number There are several comments to be made about this definition and its consequences. We therefore turn to a different way of measuring the accuracy of generalized eigenvalues.||2 = \\y\\2 = 1.e. S) by the sine of the angle between them.21). ft) and (7. Denoting this angle by 0. the process is tedious and does not give much insight into what makes an eigenvalue ill conditioned.138 Similarly. ft + €\. that is. after some manipulation This justifies the following definition. THE QR ALGORITHM The expression (4. The CHORDAL DISTANCE between (a.11. equation (4.

Vol II: Eigensystems . the chordal metric is not invariant under such shifts. /u). and the chordal distance between them remains unchanged. the angle between two projective points is preserved. the eigenvalue (a. By a limiting argument. we can write Thus the chordal distance includes the point at infinity. Specifically. we have from which we get But the numerator of this expression is Hence We summarize our results in the following theorem. Hence for eigenvalues that are not too large. 4. x(0.SEC.8. In particular. But if W is a nonzero multiple of a unitary matrix. THE GENERALIZED EIGENVALUE PROBLEM 139 Thus x defines a distance between numbers in the complex plane. ft)W). The condition of an eigenvalue Turning now to our first-order expansion. By abuse of notation we will denote this distance by x( A. oo) = 1. In general. in which we recapitulate a bit. • In Theorem 4. • The name chordal distance comes from the fact that x( A. the chordal metric behaves like the ordinary distance between two points in the complex plane. we saw how to shift a pencil and its eigenvalues by a nonsingular matrix W of order two. //) is half the length of the chord between the projections of A and ^ onto the Riemann sphere. ft) was shifted into {(a.

12.A + E and B = B + F. THE QR ALGORITHM Theorem 4. • The bound does not reduce to the ordinary first-order bound when B . y). we may have j3 = 0. Hence. although A moves infinitely far in the complex plane. it remains in a small neighborhood of infinity on the Riemann sphere. then v is not large and the eigenvalue is well conditioned. if one of a or j3 is large. and set Then fore sufficiently small.B) and let x and y be its right and left eigenvectors.I and F = 0. In this case. Let the projective representation of A be (a. . B) satisfying where There are five comments to be made on this theorem. Let Xbea simple eigenvalue (possibly infinite) of the pencil (A. This agrees with common-sense reasoning from the first-order perturbation expansion If both a and {3 are small. However. J3) is ill determined. then |A| < 1 and ||ar||2|b||2 = sec Z(x. which means that the subspace (a. if we normalize A so that \\A\\2 = 1 and normalize x and y so that yHx — 1. By (4.15) it is independent of the scaling of x and y. where Let A .140 CHAPTER 2. • The number v mediates the effects of the error £ on the perturbation of the eigenvalue and is therefore a condition number for the problem. there is an eigenvalue A the pencil (A. they are sensitive to the perturbation E and F. • If ||ar||2 = \\y\\2 = 1» the condition number v is large precisely when a and ft are both small./?). In particular. • On the other hand. so that the eigenvalue A is infinite. Such are the regularizing properties of the chordal metric.

141 Hence. as the following example shows. Perturbation expansions: Eigenvectors To derive a perturbation expansion for a simple eigenvector x of regular eigenpair (A. ill-conditioned eigenvalues can restrict the range of perturbations for which the first-order expansion of a well-conditioned eigenvalue remains valid.2)].16). B)(x X) must be of the form (4.SEC.13. let (x X) and (u U) be unitary matrices such that The fact that we can choose U so UHAx = UHBx = 0 is a consequence of the fact that x is an eigenvector. • Unfortunately. It is obvious from the example that only a very special perturbation can undo a well-conditioned eigenvalue. bumping into other eigenvalues and destroying their accuracy. then (u U) (A. if such a U exists. this problem should not be taken too seriously. 4. for the case \\A\\2 = 1 the results of ordinary perturbation theory and our generalized theory agree up to constants near one. by (4. such as rounding or measurement error. Conversely. Consider the pencil where e is small. 10 . In particular. Such perturbations are unlikely to arise from essentially random sources. If we perturb it to get the pencil then Thus the eigenvalues of the perturbed pencil are 1 ± ^/e. The example shows that perturbations can cause an ill-conditioned eigenvalue to wander around. However. Specifically. IT Vol II: Eigensystems . Example 4. we appeal to the first step of the reduction to Schur form [see (4.14). THE GENERALIZED EIGENVALUE PROBLEM Moreover. B). and x must be an eigenvector. ife — 10 the "well-conditioned"eigenvalue 1 has changed by 10~5.

We seek the perturbed eigenvector in the form x — (x -f Xp). the factor (/+<?<?H)~ 2. contradicting the simplicity of the eigenvalue (a. Hence on taking norms we get: In the above notation suppose that and set Then . B) = (A+E. |ft\} — 1. Moreover. We also seek the corresponding matrix U in the form U = (U + uq^)(I + <jr# H )~ 2. sinceX and U are orthonormal. Similarly. up to second-order terms sin Z(z. For x to be an eigenvector of (A. B -f F). THE QR ALGORITHM Now let (A.18) to a first order-bound.142 CHAPTER 2. For otherwise (a. where q is also to be determined. differs from the identity by second-order terms in g. ||XH#||2 < 1. But the last equality following from (4. In particular. x) = ||J^Hx||2. we must have /7H Ax = U^Bx — 0. B}./3) would be an eigenvalue of (A. /?). and subtracting to get Now the matrix /3A . note that we can multiply (4. J7H B x = UHFx+(3q+Bp. B). by Lemma 3. Since we know that x converges to x as e = \j \\E\\p + \\F\\p —*• 0. Hence p and q must satisfy the equations We may eliminate the auxiliary vector q from these equations by multiplying the first by /3 and the second by a. Finally. where p is to be determined.aB is nonsingular. Thus where The condition of an eigenvector To convert (4.12. and we will drop it in what follows. which makes f/ orthonormal. we may assume that p and q are small.17) by a constant to make max{ | a |.16). Chapter 1.

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM

143

The number <$[(a, /?), (A, B}] is analogous to the number sep(A, M) for the ordinary eigenvalue problem [see (3.16), Chapter 1]. It is nonzero if {a, /?) is simple, and it approaches zero as (a, /3} approaches an eigenvalue of (A, B). Most important, as (4.19) shows, its reciprocal is a condition number for the eigenvector corresponding to(a,/3).

4.2. REAL SCHUR AND HESSENBERG-TRIANGULAR FORMS
We now turn to the computational aspects of the generalized eigenvalue problem. As indicated in the introduction to this section, the object of our computation is a Schur form, which we will approach via an intermediate Hessenberg-triangular form. The purpose of this subsection is to describe the real generalized Schur form that is our computational goal and to present the reduction to Hessenberg-triangular form. Generalized real Schur form Let (A, B} be a real, regular pencil. Then (A, B) can be reduced to the equivalent of the real Schur form (Theorem 3.1); i.e., there is an orthogonal equivalence transformation that reduces the pencil to a quasi-triangular form in which the 2x2 blocks contain complex conjugate eigenvalues. Specifically, we have the following theorem. Theorem 4.14. Let (A, B) be a real regular pencil Then there are orthogonal matrices U and V such that

and

The diagonal blocks ofS and T are of order either one or two. The pencils corresponding to the blocks of order one contain the real eigenvalues of(A,B). The pencils corresponding to the blocks of order two contain a pair of complex conjugate eigenvalues of(A,B). The blocks can be made to appear in any order. The proof of this theorem is along the lines of the reduction to generalized Schur form, the chief difference being that we use two-dimensional subspaces to deflate the complex eigenvalues. Rather than present a detailed description of this deflation, we sketch the procedure. Vol II: Eigensystems

144

CHAPTER 2. THE QR ALGORITHM
1. Using Definition 4.6 and Theorem 4.8, argue that complex eigenvalues of a real pencil occur in conjugate pairs and that the real and imaginary parts of their eigenvectors are independent. 2. Let x = y + iz be an eigenvector corresponding to a complex eigenvalue and let X = (y z). Show that for some matrix L, we have AX — BXL. 3. Let(F V±) be orthogonal with K(V} = n(X). Let (U UL} be orthogonal withft(i7) = K(AV) U U(BV).
4. show that

where S and T are 2x2. By repeated applications of this deflation of complex eigenvalues along with the ordinary deflation of real eigenvalues, we can prove Theorem 4.14. Hessenberg-triangular form The first step in the solution of the ordinary eigenvalue problem by the QR algorithm is a reduction of the matrix in question to Hessenberg form. In this section, we describe a reduction of a pencil (A, B) to a form in which A is upper Hessenberg and B is triangular. Note that if A is upper Hessenberg and B is lower triangular, then AB~l is upper Hessenberg. Thus the reduction to Hessenberg-triangular form corresponds to the the reduction to Hessenberg form of the matrix AB~l. The process begins by determining an orthogonal matrix Q such that Q^B is upper triangular. The matrix Q can be determined as a product of Householder transformations (Algorithm 1:4.1.2). The transformation QT is also applied to A. We will now use plane rotations to reduce A to Hessenberg form while preserving the upper triangularity of B. The reduction proceeds by columns. Figure 4.1 illustrates the reduction of the first column. Zeros are introduced in A beginning at the bottom of the first column. The elimination of the (5, l)-element of A by premultiplication by the rotation $45 in the (4,5)-plane introduces a nonzero into the (5,4)-element of B. This nonzero element is then annihilated by postmultiplication by a rotation Z$± in the (5,4)-plane. The annihilation of the (4,1)- and (3, l)-elements of A proceeds similarly. If this process is repeated on columns 2 through n-2 of the pencil, the result is Algorithm 4.1. Here are some comments. • We have implemented the algorithm for real matrices. However, with minor modifications it works for complex matrices. • The computation of Q is special. Since Q is to contain the transpose of the accumulated transformations, we must postmultiply by the transformations, even though they premultiply A and B.

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM

145

Figure 4.1: Reduction to Hessenberg-triangularform

VolII: Eigensystems

146

CHAPTER 2. THE QR ALGORITHM

This algorithm takes a matrix pencil of order n and reduces it to Hessenberg-triangular form by orthogonal equivalences. The left transformations are accumulated in the matrix Q, the right transformations in the matrix Z.

Algorithm 4.1: Reduction to Hessenberg-triangularform
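The reduction fits in a short program. The following sketch (Python/NumPy; an editorial illustration of the procedure just described, not a transcription of Algorithm 4.1) triangularizes B with a QR factorization and then alternates row rotations that zero A below its subdiagonal with column rotations that restore the triangularity of B.

    import numpy as np

    def givens(a, b):
        r = np.hypot(a, b)
        return (1.0, 0.0) if r == 0.0 else (a/r, b/r)

    def rot_rows(M, i, c, s):      # premultiply by a rotation in the (i-1, i) plane
        M[[i-1, i], :] = np.array([[c, s], [-s, c]]) @ M[[i-1, i], :]

    def rot_cols(M, i, c, s):      # postmultiply by a rotation in the (i-1, i) plane
        M[:, [i-1, i]] = M[:, [i-1, i]] @ np.array([[c, s], [-s, c]])

    def hess_triangular(A, B):
        """Overwrite (A, B) with Q^T A Z upper Hessenberg and Q^T B Z upper triangular."""
        n = A.shape[0]
        Q, R = np.linalg.qr(B)
        A[:], B[:] = Q.T @ A, np.triu(R)
        Z = np.eye(n)
        for j in range(n - 2):
            for i in range(n - 1, j + 1, -1):
                c, s = givens(A[i-1, j], A[i, j])    # zero A[i, j] from the left ...
                rot_rows(A, i, c, s); rot_rows(B, i, c, s)
                Q[:, [i-1, i]] = Q[:, [i-1, i]] @ np.array([[c, -s], [s, c]])
                A[i, j] = 0.0
                c, s = givens(B[i, i], B[i, i-1])    # ... then restore B's triangularity
                rot_cols(A, i, c, s); rot_cols(B, i, c, s); rot_cols(Z, i, c, s)
                B[i, i-1] = 0.0
        return Q, Z

    rng = np.random.default_rng(2)
    A, B = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
    A0, B0 = A.copy(), B.copy()
    Q, Z = hess_triangular(A, B)
    assert np.allclose(Q.T @ A0 @ Z, A) and np.allclose(Q.T @ B0 @ Z, B)
    assert np.allclose(np.tril(A, -2), 0) and np.allclose(np.tril(B, -1), 0)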

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM

147

• The operation count for the reduction of B is 2 |n3 flam. The reduction of A takes irot. Thus for real matrices, the algorithm requires

• If only eigenvalues of the pencil are desired, we need not accumulate the transformations. • The algorithm is backward stable in the usual sense.

4.3. THE DOUBLY SHIFTED QZ ALGORITHM
Like the QR algorithm, the doubly shifted QZ algorithm is an iterative reduction of a real Hessenberg-triangular pencil to real generalized Schur form. Since many of the details — such as back searching—are similar to those in the QR algorithm, we will not attempt to assemble the pieces into a single algorithm. Instead we begin with the general strategy of the QZ algorithm and then treat the details. Overview Let (A, B) be in Hessenberg-triangular form and assume that B is nonsingular. Then the matrix C — AB~l is Hessenberg and is a candidate for an implicit double QR step. The idea underlying the QZ algorithm is to mimic this double QR step by manipulations involving,only A and B. In outline the algorithm goes as follows. [Before proceeding, the reader may want to review the outline (3.6) of the doubly shifted QR algorithm.] 1. Let H be the Householder transformation that starts a double shift QR step on the matrix AB~l. 2. Determine orthogonal matrices Q and Z with Qei — GI such that (A, B} = Q^(HA, HB)Z is in Hessenberg-triangular form. To see why this algorithm can be expected to work, let C = AB~l. Then

Moreover, since Qei = e i»the first column of HQ is the same as that of H. It follows that C is the result of performing an implicit double QR step on C. In consequence, we can expect at least one of the subdiagonal elements c njn _! and c n _ 1?n _ 2 to converge to zero. But because (A, B) is in Hessenberg-triangular form,

Hence if frn-i,n-i and 6 n _2, n -2 do not approach zero, at least one of the subdiagonal elements a n>n _i and a n -i,n-2 must approach zero. On the other hand, if either 6 n _i ?n _i or 6n_2,n-2 approaches zero, the process converges to an infinite eigenvalue, which can be deflated. Vol II: Eigensystems
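The end product of the iteration outlined here is the generalized real Schur form, and a library routine is a convenient check on any implementation. A minimal example (assuming SciPy) is the following; it also extracts the eigenvalues in the projective form <α, β>, in which β near zero signals an infinite eigenvalue.

    import numpy as np
    from scipy.linalg import qz, eig

    rng = np.random.default_rng(3)
    A, B = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))

    S, T, Q, Z = qz(A, B, output='real')            # A = Q S Z^T,  B = Q T Z^T
    assert np.allclose(Q @ S @ Z.T, A) and np.allclose(Q @ T @ Z.T, B)
    assert np.allclose(np.tril(S, -2), 0.0)          # S is quasi-triangular
    assert np.allclose(np.tril(T, -1), 0.0)          # T is upper triangular

    ab = eig(A, B, homogeneous_eigvals=True)[0]      # eigenvalue pairs <alpha, beta>
    print(ab[0] / ab[1])                             # ordinary form; inf where beta ~ 0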

148

CHAPTER 2. THE QR ALGORITHM

If repeated QZ steps force a n)n _i to zero, then the problem deflates with a real eigenvalue. If, on the other hand, a n _ 1>n _ 2 goes to zero, a 2x2 block, which may contain real or complex eigenvalues, is isolated. In either case, the iteration can be continued with a smaller matrix. Getting started We now turn to the details of the QZ algorithm. The first item is to compute the Householder transformation to start the QZ step. In principle, this is just a matter of computing the appropriate elements of C and using Algorithm 3.1. In practice, the formulas can be simplified. In particular, if we set m = n—1, the three components of the vector generating the Householder transformation are given by

In EISPACK and LAPACK these formulas are further simplified to reduce the number of divisions. The formulas are not well defined if any of the diagonal elements 611, 622* bmrn, or bmm is zero. For the case of zero &n, the problem can be deflated (see p. 153). If the others diagonals are too small, they are replaced by eM||.Z?||, where B is a suitable norm. The above formulas do not use the Francis double shift computed from the matrix C. Instead they shift by the eigenvalues of the the trailing 2x2 pencil of (A, B): namely,

This approach is simpler and is just as effective. The QZ step Having now determined a 3x3 Householder transformation to start the QZ step, we apply it to the pencil (.4, B) and reduce the result back to Hessenberg-triangularform. As usual we will illustrate the process using Wilkinson diagrams. Our original pencil

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM
has the form

149

If we premultiply this pencil by the Householder transformation computed above, we get a pencil of the form

The next step reduces B back to triangular form. First postmultiply by a Householder transformation to annihilate the elements with superscripts 1. We follow with a plane rotation that annihilates the element with superscript 2. This produces a pencil of the form

We now premultiply by a Householder transformation to annihilate the elements of A with superscripts 1. This results in a pencil of the form

Once again we reduce B back to triangular form using a Householder transformation and a plane rotation:

Vol II: Eigensystems

150

CHAPTER 2. THE QR ALGORITHM

When we repeat the procedure this time, we need only use a plane rotation to zero out the (5,3)-element of A. The result is a matrix that is almost in Hessenberg-triangular form:

A postmultiplication by a plane rotation annihilates the (5,5)-element of B without affecting the Hessenberg form of A. Since the QZ algorithm, like the QR algorithm, works with increasingly smaller matrices as the problem deflates, the sweep will be applied only to the working matrix. Algorithm 4.2 implements a sweep between rows il and i2. In studying this algorithm, keep in mind that k—1 points to the column of A from which the current Householder transformation is obtained (except for the first, which is provided by the input to the algorithm). Also note that the Householder transformation that postmultiplies (A, B) reduces the three elements of B to a multiple of ej —not ei as housegen assumes. We therefore use the cross matrix

in statements 10 and 12 to reverse the order of the components. There are four comments to be made about this algorithm. • An operation count shows that the sweep requires

Thus a QZ sweep is a little over twice as expensive as a doubly shifted QR sweep. • The extra rotations in (4.21) are due to the necessity of eliminating 62 in (4.20). An alternative is to premultiply by a Householder transformation to eliminate the elements 61 in the pencil

SEC. 4. THE GENERALIZED EIGENVALUE PROBLEM

151

Given a real pencil (A, B) in Hessenberg-triangularform, and a vector u that generates a starting Householder transformation, qz2step implements a QZ sweep between rows il and /2. 1. <7z2ste/?(A, £, tt, i7, i2, Q, Z) 2. for jfc = il to /2-2 3. if(k^il) 4. fa?Msegew(egen A[fc:fc+2, fc—1], w, z/) 5. A[Ar,fe-lJ=i/;A[Jb+l:fc+2,Jt-l] = 0 6. end if
7.

8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30.

VT = uT*B[k:k+2,k:n]; B[k:k+2,k:n] = B[k:k+2,k:n] - UVT v = Q[:, Ar:fc+2]*w; Q[:, k:k+2] = Q[:, k:k+2] - VUT housegen(B[k+2,k:k+2]*f, tiT, i/) ,B[Ar+2,A;+2] = i/;5[Jfe+2,A?:fe+l] = 0 WT = wT*f v = jB[l:A?-fl, k:k+2]*u; B[l:k+l, k:k+2] = B[l:k+l, k:k+2] - vu1 Jbk = min{A?+3,/2} t; = A[l:Jbk,fe:A;+2]*w;A[likk, k:k+2] = A[l:kk,fc:fc+2]- W1 v = Z[:, A;:A;+2]*w; Z[:,fc:A;+2]= Z[:, A;:fc+2]-vttT ro/^/i(5[A:+l,A;+l],k+1], 5[fc+l,Ar],c,5) rotapp(c, s, B[l:k, k+I], B[l:k, k}) rotapp(c, s, A[l:kk, jfc+l], A[l:Jbfe, fc]) rotapp(c,s,Z[:,k+l],Z[:,k]) end for A; rotgen(A[i2-l,i2-2], A[i2,i2-2], c, s) rotapp(c, s, A[/2-l, i2-l:z2], (A[i2,i2-l:i2]) rotapp(c, s, 5[i2-l, /2-l:i2], (J?[i2, i2-l:i2]) w/flpp(c» 5' Qt,*2-l], Qt,*2]) rotgen(B[i2,i2], B[i2,i2-l]1c,s) rotapp(c, s, 5[l:i2-l,i2], 5[l:/2-l,i2-l]) rotapp(c, s, A[l:i2, i'2], A[l:/2, i2-l]) rotapp(c, s, £[:, i2], Z[:, i2-l]) end qz2step Algorithm 4.2: The doubly shifted QZ step o

VT = u T *A[fc:fc+2, Jk:n]; A[fc:fc+2, fc:n] = A[fc:fc+2, fc:n] - «VT

Vol II: Eigensystems

cn so that all the elements riCj\aij\ and TiCj\bij\ of DRADC and DRBDC are nearly equal. • If only eigenvalues are desired. If we take that value to be one.B[il : i2. rn and c\. and eigenvectors.and GJ. Thus we should like to find diagonal matrices DR and Dc with positive diagonals so that the nonzero elements of of DRADC and DRBDC are as nearly equal in magnitude as possible.. infinite eigenvalues.il : i2]. rn and c\. THE QR ALGORITHM This gives an operation count of which is exactly twice the count for a double QR sweep. back searching. the sweep can be performed only on the subpencil (A[il : i2.. For more see the notes and references...1S2 CHAPTER 2. The purpose of this subsection is to fill the gaps—at least to the extent they can be filled in a book of this scope.il : z'2j) and the transformations need not be accumulated. We will treat the following topics: balancing. then our problem is to choose positive numbers 7*1. 4..22) is nonlinear in the unknowns rt. we end up with the problem . . . Balancing The natural class of transformations for balancing the pencil (A. . If we write then we wish to choose positive numbers ri. since multiplying the pencil (A. . This problem is not well determined.. if we set then the problem is equivalent to finding quantities p\ and 7^ so that If we solve this problem in the least squares sense. B) are the diagonal equivalences. • The algorithm is stable in the usual sense. 2x2 problems..4... IMPORTANT MISCELLANEA Anyone attempting to code a QZ-style algorithm from the description in the last two sections will quickly find how sketchy it is. B) by a scalar does not change the balance.. cn so that The problem (4.. However.. But we can make it well determined by specifying a value around which the scaled elements must cluster.

4. it can be solved by an iterative technique called the method of conjugate gradients.6. il] and A[i2—1. Back searching The QZ step implemented in Algorithm 4.12] have been deemed negligible. The procedure sketched above is not invariant under change of scale of A and B — i. The integers il and 12 are determined by a back-searching algorithm analogous to Algorithm 2. balanced norm. in the course of the QZ iteration. The following is a simple and efficient deflation procedure. Infinite eigenvalues If. There are many ways of treating infinite eigenvalues. the pencil has an infinite eigenvalue. The usual procedure is to assume that the problem is reasonably balanced and declare an element aij negligible if in some convenient norm.23) are very simple.2 is performed between rows il and 12 of A and B. rB) as a and r vary. Although the system is too large to solve directly. where the elements A[/7-l.. it is recommended that A and B be normalized so that their norms are one in some easily calculated. Consequently.SEC. the normal equations for (4. We can postmultiply this pencil by a rotation in the (4. and the method converges so swiftly that the determination of the scaling factors is also an 0(n) process (remember that we do not have to determine the scaling factors very accurately). The result is the pencil VolII: Eigensy stems .e. Consider the Wilkinson diagram in which the fourth diagonal element of B is zero. A similar criterion is used for the elements of B. it will produce different results for the pair (ovl. a negligible element occurs on the diagonal of B. Each iteration requires 0(n) operations. This scaling also helps in controling underflow and overflow in the QZ step.3)-plane to annihilate the hatted element in B. The chief problem is to decide when an element is negligible. THE GENERALIZED EIGENVALUE PROBLEM 153 Without going into details.

.and premultiplication gives Finally premultiplication by a rotation in the (1.4)-plane to get One more step of post.154 CHAPTER 2. THE QR ALGORITHM We may now premultiply a rotation in the (4. the infinite eigenvalue has been deflated from the problem.2)-plane gives As the lines in the above diagram show.2)-plane to get and then premultiplying by a rotation in the (3.5)-plane to return A to Hessenberg form: We can now repeat the process by postmultiplying by a rotation in the (3.

However. this algorithm contains no significant numerical pitfalls.1868]. 4. who established the equivalent of the Jordan form for regular matrix pencils.4)-element as described above. 1890] extended the result to rectangular pencils. In that case. Specifically. Unlike the generalized eigenproblem of order two.269. it turns out. it may have real eigenvalues. The 2x2 problem When the QZ iteration deflates a 2x2 pencil. This. The details are too involved to give here.4)-plane to annihilate ci.1874] later gave a new proof for singular pencils. Eigenvectors Eigenvectors can be computed by a back substitution algorithm analogous to the ones for the ordinary eigenvalue problem. consider the matrix If we apply a plane rotation in the (5. is not an easy problem when the effects of rounding error are taken into account. and once again we refer the reader to the EISPACK or LAPACK code.141.4)-element to zero and deflate the zero (4. so that the real eigenvalues appear explicitly on the diagonal of the Schur form.4)-element becomes Consequently. NOTES AND REFERENCES The algebraic background The generalized eigenvalue problem (regarded as a pair of bilinear forms) goes back at least to Weierstrass [293. The generalized Schur form is due to Stewart [251]. This amounts to computing the Schur form of a pencil of order two. its derivation is lengthy. THE GENERALIZED EIGENVALUE PROBLEM 155 We can also deflate an infinite eigenvalue if there are two small consecutive diagonal elements. if ei€2/c is negligible. 4. 281. Kronecker [152.304].5. The use of the chordal metric in generalized Vol II: Eigensystems . and we only warn the reader not to go it alone.SEC. we can set the (5. Copy the EISPACK or LAPACK code. Perturbation theory For a general treatment of the perturbation theory of the generalized eigenvalue problem along with a bibliography see [269]. then the (5. For modem treatments see [80. Jordan [137. we will naturally want to deflate it further.

However. The QZ algorithm The QZ algorithm. both the reduction to Hessenberg-triangular form and the steps of the QZ algorithm move zero elements on the diagonal of B toward the top. . of course.156 CHAPTER 2. The double shift strategy. THE QR ALGORITHM eigenvalue problems is due to Stewart [255]. This option has been incorporated into the EISPACK and LAPACK codes. Ward [286] and Watkins [291] consider the problem of infinite eigenvalues. Moreover. is due to Moler and Stewart [178]. who builds on earlier work of Curtis and Reid [50]. Watkins shows that the the presence of small or zero elements on the diagonal of B does not interfere with the transmission of the shift. works when the eigenvalues of the trailing pencil are real. who is happy to acknowledge a broad hint from W. Ward [286] has observed that the algorithm performs better if real shifts are treated by a single shift strategy. Kahan. The balancing algorithm is due to Ward [287]. including the preliminary reduction to Hessenberg-triangular form. Thus an alternative to deflating infinite eigenvalues from the middle of the pencil is to wait until they reach the top. In particular. For a chordless perturbation theory and backward error bounds see [112]. where they can be deflated by a single rotation. For the treatment of 2x2 blocks see the EISPACK code qzval or the LAPACK codeSLAG2.

In the first section we will show how to adapt the QR algorithm to solve the symmetric eigenvalue problem. the QR algorithm for symmetric matrices has an analogue for the singular value problem. In the ordinary eigenvalue problem. where A is symmetric and B is positive definite. we treat the symmetric positive definite generalized eigenvalue problem Ax = XBx. The procedure begins as usual with a reduction to a simpler form—in this case tridiagonal form—followed by a QR iteration on the reduced form.3 THE SYMMETRIC EIGENVALUE PROBLEM We have seen that symmetric (or Hermitian) matrices have special mathematical properties. the Cholesky algorithm for triangularizing a symmetric matrix requires half the operations that are required by its nonsymmetric counterpart.1 and the tridiagonal QR iteration itself in §1. In practice.2. in a brief coda.2. This problem has appealing analogies to the ordinary symmetric eigenproblem. Thus. and it is better to solve the singular value problem on its own terms.1). Fortunately. their eigenvalues are real and their eigenvectors can be chosen to form an orthonormal basis for Rn. a divide-and-conquer algorithm. It is included in this chapter because the squares of the singular values of a real matrix X are the eigenvalues of the symmetric matrix A = X^X and the eigenvectors of A are the left singular vectors of X. real and complex matrices are treated in es157 . They also have special computational properties. The first part of this chapter is devoted to efficient algorithms for the symmetric eigenvalue problem. but it is mathematically and numerically less tractable. we can frequently use the symmetry of the problem to reduce the operation count in certain algorithms. the algorithms of the first section of this chapter can be used to solve the singular value problem. We next turn to the singular value decomposition. In particular. Gaussian elimination (see Algorithm 1:2. in principle. We will treat this reduction in §1. Because of their symmetry they can be stored in about half the memory required for a general matrix. this approach has numerical difficulties (see §3.2). For example. In §2 we will consider some alternative methods for the symmetric eigenvalue problem: a method for updating the symmetric eigendecomposition. and methods for finding eigenvalues and eigenvectors of band matrices. Moreover. Finally.

the complex case differs from the real case only in the fine details. But first we must consider how symmetric matrices may be economically represented on a computer. Because orthogonal similarities preserve symmetry. Second. REDUCTION TO TRIDIAGONAL FORM If a symmetric matrix A is reduced to Hessenberg form by orthogonal similarity transformations. Thus the algorithm computes the spectral decomposition where V is an orthogonal matrix whose columns are eigenvectors of A and A is a diagonal matrix whose diagonal elements are the corresponding eigenvalues. the columns of the accumulated transformations are the eigenvectors of A. For the symmetric eigenvalue problem. Finally.158 CHAPTER 3. the Hessenberg form becomes a tridiagonal form [see (1. THE SYMMETRIC EIGENVALUE PROBLEM sentially different ways. For this reason we will assume in this chapter that: A and B are real symnetric matrices of order n. Hermitian. Hence: Unless otherwise stated a tridiagonal matrix is assumed to be symmetric or. and the definite generalized eigenvalue problem. The general outline of the procedure is the same as for the nonsymmetric eigenvalue problem—reduce the matrix to a compact form and perform QR steps. 1. THE QR ALGORITHM In this section we will show how the QR algorithm can be adapted to find the eigenvalues and eigenvectors of a symmetric matrix. all the elements below its first subdiagonal are zero and hence by symmetry so are all the elements above the first superdiagonal. the singular value problem. . if complex. The purpose of this subsection is to describe the details of the reduction of a symmetric matrix to tridiagonal form. 1. the final Schur form becomes diagonal.1)]. First. Thus the matrix assumes a tridiagonal form illustrated by the following Wilkinson diagram: In this volume almost all tridiagonal matrices will turn out to be symmetric or Hermitian.1. the process simplifies considerably. accumulating the transformations if eigenvectors are desired.

The following diagram illustrates a 4x4 symmetric matrix stored in an array a in column order. the upper half of a two-dimensional array of real floating-point numbers and code one's algorithms so that only those elements are altered. Because our algorithms are easier to understand in the natural representation. First. j] while leaving the rest of the array free to hold other quantities. First. The disadvantage is that the rest of the array is seldom used in practice because its n(n-l)/2 elements are insufficient to hold another symmetric or triangular matrix of order n. unlike the natural representation. 1. There are two ways of taking advantage of this fact to save storage. We partition A in the form and determine a Householder transformation H such that Ha^i = 76!. one can put the diagonal and superdiagonal elements of a symmetric matrix into a one-dimensional array — a device known as packed storage. This approach has the advantage that it permits the use of array references like A[z. a symmetric matrix is completely represented by the n(n+l)/2 elements on and above its diagonal. on nonoptimizing compilers it may run more efficiently. The two usual ways of doing this are to store them in column order and in row order. one can store the matrix in. The price to be paid for these advantages is that the resulting code is hard to read (although practice helps).SEC. Packed storage offers two advantages. Then Vol II: Eigensystems . we will assume in this volume that the symmetric matrix in question is contained in the upper part of a two-dimensional array. Second. THE QR ALGORITHM The storage of symmetric matrices 159 Since the elements a^ and ajz of a symmetric matrix are equal. since the code references only a one-dimensional array. say. Second. it uses all the storage allocated to the matrix. Reduction to tridiagonal form The algorithm for reduction to tridiagonal form is basically the same as the algorithm for reduction to Hessenberg form described beginning on page 83.

The computation of EA^E can be simplified by taking advantage of symmetry. THE SYMMETRIC EIGENVALUE PROBLEM This matrix has the Wilkinson diagram (n = 6) from which it is seen that the transformation has introduced zeros into the appropriate parts of A. we must decide how to represent the final tridiagonal matrix. since in our storage scheme traversing a row . The usual convention is to store the diagonal of the matrix in a onedimensional array.160 CHAPTER 3. say d. then This formula can be simplified by setting so that Before we can code the algorithm. • The computation of the product of A[k+1. Thus the final tridiagonal matrix has the form Algorithm 1. The reduction continues by recursively reducing EA^E to tridiagonal form.1 implements the tridiagonalization described above. Here are some comments. If we denote the vector defining E by r. say e. and the off-diagonal elements in another one-dimensional array. k+1] and the Householder generating vector must be done at the scalar level.

1.(w/2)*r[k+l:n] 18. 4. in the array e. The diagonal elements of the tridiagonal matrix are stored in the array d\ the offdiagonal elements. k+l:n] 31. e[n-l] = A[n-I. The transformations are accumulated in the array V.SEC.n] 26.fc+l:«]. s[k+l:n] = s[k+l:n] .k] = r[k+l:n] w =0 for i = fc+1 ton s[i] = 0 9. TriReduce reduces A to tridiagonal form. d. e. 12. V[fc+l:n. end for fc. r[k+l:n]. -f 5[z]*r[z] 16. 10. V r [:. F[l:n-2. end for j 23. tor j = k+l to i-l s[i] = s[j] + A [7. d(k) = A[k. 14. 11. end for i 17.r*vT 32. V[n-l:n. for A. f o r i = 1 to j 20. A [ i . end for k 34. end for i 22.s[i]*r[j] 21.n-l:n] = I 28. 7. end TriReduce Algorithm 1. 1. 13. A?+l:w] = V[A:+l:w. 15. 5. r = V[k+l:n1k] 30. rf[n-l] -1] = A[n-l. 8. d[n] = A[n. 2. n] 27.r[i]*s[j] .n-l:n] = 0.j]*r[. d*r[j] end for j for j = i to n sM = *[*] + 4[i. j ] = A[iJ] . V) for k ~ 1 to n-2 3.A ? ] = c Jb 33.1: Reduction to tridiagonal form o Vol II: Eigensystems . THE QR ALGORITHM 161 Given a real symmetric matrix A of order n. 24. VT = rTF[A. e[k+l]) V[k+l:n. 6.k+l:n].n-l] 25.+l:n. f o r j = Ar+1 to 7^ 19.k] housegen(A[k.] end for j iu = i/. = n-2 to 1 by-1 29. TriReduce(A.

Specifically: The operation count for Algorithm 1. This computation is neither column nor row oriented.162 CHAPTER 3. Fortunately.2.n].2) is done in statements 8-17.1 is flam for the reduction of A flam for the accumulation of V This should be compared with an operation count of |n3 flam for the reduction of A in Algorithm 2. In this form the array d will necessarily contain real numbers. This particular implementation uses the array containing the original matrix A. we can make the elements of e real by a sequence of diagonal similarity transformations. • As in Algorithm 2. and it is an interesting exercise to recast it so that it has uniform orientation. which is expensive (see the comment on page 91). • The computation of s.4. • The algorithm is cheaper than the reduction of a general matrix to Hessenberg form. • The algorithm is backward stable in the usual sense (see Definition 2. To keep the indexing consistent we store the Houselder vector in r[fc+l. destroying A in the process. The process is sufficiently well illustrated for the case n — 3. The computation of A^r requires two loops because only the upper part of the array A is referenced. • The first column of the matrix V is GI . the array e. Chapter 2.2. can contain complex numbers. This fact will prove important in deriving the implicit shift QR algorithm for symmetric tridiagonal matrices. THE SYMMETRIC EIGENVALUE PROBLEM of A must be done by proceeding down the corresponding column to the diagonal and then across the row. This means that subsequent manipulations of the tridiagonal form will use complex rotations. Making Hermitian tridiagonal matrices real Algorithm 1. The tridiagonal matrix T has the form . and r 1 A^r in (1.1 can easily be converted to compute the tridiagonal form of a complex Hermitian matrix. Chapter 2). • In spite of the fact that the final tridiagonal form requires only two one-dimensional arrays to store. Chapter 2. we economize by saving the generating vectors for the Householder transformations and then accumulating them in a separate loop. however. its computation requires the upper half of a two-dimensional array.

In this case we may take 1/1 = 1. THE SYMMETRIC TRIDIAGONAL QR ALGORITHM Since a tridiagonal matrix T is necessarily upper Hessenberg and the basic QR step preserves upper Hessenberg form. Algorithm 1. It is incomplete in an important respect. we will assume that the tridiagonal matrix T in question is represented by an array d containing the elements of the diagonal of T and an array e containing the elements of the superdiagonal [see (1. When this fact is taken into account the result is a sleek. As above. The array e is necessarily an array of type complex. Thus the QR algorithm preserves symmetric tridiagonal form. The details are left as an exercise. if T is symmetric and tridiagonal. Otherwise we take v\ — e-t l\e\ . 1. Let D\ = diag(l. and we do not need to do anything.SEC. where then Thus the original matrix has been reduced to real tridiagonal form. whereas at the end we require an array of real floating-point numbers. This problem can be overcome by adjusting the e^ (and the columns of V) as they are generated in the course of the reduction to tridiagonal form.2. v<£). Since the QR step also preserves symmetry.2 sketches this scheme. z/i. 1). 1. Then 163 If e\ — 0. Vol II: Eigensystems . it is real.3)]. THE QR ALGORITHM The first step is to make e\ real. speedy algorithm for computing the eigenvalues and optionally the eigenvectors of a symmetric tridiagonal matrix. 1. the result of performing a QR step on T is symmetric and tridiagonal. the result of performing a QR step on T is upper Hessenberg. so that Similarly if we define D-2 — diag(l.

1.2)-plane. Determine an orthogonal matrix Q whose first column is a multiple of ei such that f = QT-Fj2 APi2Q is tridiagonal. In broad outline the tridiagonal QR step goes as follows. the matrix P^APu has the form . THE SYMMETRIC EIGENVALUE PROBLEM This algorithm takes a Hermitian tridiagonal matrix represented by the arrays d and e and transforms it to a real tridiagonal matrix.6). if (& / n-1) 7. An implementation of this sketch goes as follows. end if 11. &+1] 10.164 CHAPTER 3.1)-element ofP-^T-K^iszero. shows that T is the matrix that would result from an explicit QR step with shift AC. 3. 5. The transformations are accumulated in a matrix V. end if 9. 2. the reasoning following (3. end for fc Algorithm 1. 2. Chapter 2. As with the implicit double shift. Ar+1] = */*F[l:n. for k = 1 ton-1 a = \ek\ if(a/0) v — ek/a ek = a 6. 1. Given a shift AC determine a plane rotation P12 such that the (2. 4. efc+i = £*efc+i 8. F[l:n. of which the latter is the most commonly implemented. Since PI 2 is a rotation in the (1.2: Hermitian tridiagonal to real tridiagonal form The implicitly shifted QR step The symmetric tridiagonal QR algorithm has explicitly and implicitly shifted forms.

SEC. Algorithm 1.3 implements a QR step between rows il and 12 of T. 1. take into account the fact that the problem may have deflated. As one enters the loop on i. The application of the rotations is tricky. as usual. THE QR ALGORITHM 165 Figure 1. The element g is an intermediate value of di after the rotation Pi-ij has been applied.1 illustrates the entire procedure. In the implementation of this algorithm we must.1: Implicit tridiagonal QR step We now choose a rotation P23 in the (2. the matrix has the form where a hat indicates a final value. we obtain a matrix of the form We see that / has been chased one row and column down the matrix.3)-plane to annihilate /. Consequently. This process can be repeated until the element / drops off the end. The first two statements of the loop complete the Vol II: Eigensystems . On applying the similarity transformation. Figure 1.

AC. c[t2-l] = ^ 18. 1. 5) 8. The transformations are accumulated in V.fi 9. b = c*e[i] 1. g = c*v—b 14. t]. v = (d[z+l] — w)*5 + 2*c*6 11. i'2. /o/flpp(c. 5. rotgen(g. triqr(d.K 3. /7.166 CHAPTER 3. d[i2] = d[i2] .p 17. p = 5*v d[i] = w+p 12. c.p = 0. F[:. V) g = d[i7] . 5 = 1. 13. w = d[i] — p 10. e. end for i 16. this algorithm performs one QR step between rows il and i"2. V[:. end triqr Algorithm 13: The symmetric tridiagonal QR step o .«+!]) 15. fori = i7toi2-l 5. 4. / = s*e[i] 6. if(tV/7)e[2-l] = ^. THE SYMMETRIC EIGENVALUE PROBLEM Given a symmetric tridiagonal matrix whose diagonal is contained in the array d and whose superdiagonal is contained in e. /. c = l. 2.

In fact. say. The first is the Rayleigh quotient shift. which is stable in the usual sense. this is not true of Algorithm 1. Such "simplifications" based on exact mathematical identities can turn a stable algorithm into an unstable one.2. It is easy to see that if m = 12 — il.2.The second is the Wilkinson shift. It is worth noting that the derivation of this sequence requires the application of the identity c2 -f s2 = 1.3.. provided the Rayleigh quotient shift is used. dn is converging to a simple eigenvalue. then the transformation of T requires 6m fladd and 5m flmlt plus m calls to rotgen. statement 14 can be eliminated with great saving in computations.14). THE QR ALGORITHM application of P{-\^ to bring the matrix in the form 167 Finally the subsequent statements bring the matrix into the form which advances the loop by one step. if.+I.3.The Wilkinson shift is globally convergent and is generally the one that is used. then (2. If only eigenvalues are required. Fortunately. 1. implies that the values of e n _i converge cubically to zero.SEC. which is the eigenvalue of the matrix that is nearest c/. Specifically. because the multiple eigenvalues of a defective matrix VolII: Eigensystems . Choice of shift Two shifts are commonly used with Algorithm 1. Local convergence The local convergence analysis beginning on page 73 applies to the symmetric problem. Thus the computation of eigenvectors is the dominant part of the calculation. so that in the presence of rounding error the statements do not produce the same results as the direct application of the plane rotations PI. The accumulation of the transformations in V requires mn flrot. which is simply the diagonal element c/. Chapter 2.

The characteristic equation for T is where g = e2. the matrix has eigenvalues d and pd. Since A = pd. THE SYMMETRIC EIGENVALUE PROBLEM are nondefective the convergence is cubic even for multiple eigenvalues. Differentiating this equation implicitly with respect to g gives where \g is the derivative of A with respect to g. Thus we must consider what criterion we shall use to decide when a subdiagonal element of a tridiagonal matrix is effectively zero.168 CHAPTER 3. Chapter 2. We will now see how e perturbs the eigenvalue pd when e is small. We have already treated this problem in §2. Hence when e is small. The Wilkinson shift usually gives cubic convergence. where we proposed two possible solutions. This possibility is so remote that it is not considered a reason to prefer the Rayleigh quotient shift over the Wilkinson shift. although in theory the convergence can be quadratic. one ending with di and the other beginning with d{+i. When e is zero. the perturbation in the eigenvalue pd is approximately —e2/a. Deflation If ei is zero. the tridiagonal matrix T deflates into two tridiagonal matrices. Consider the matrix where p < 1 represents a grading ratio. we can set it to zero to deflate the problem. we must have or . we have the latter approximation holding when p is small.4. If et is negligible. one for balanced matrices and one for graded matrices. We will proceed informally by deriving a criterion for deflating a graded matrix of order two. Here we will propose yet another solution adapted to tridiagonal matrices. If we wish the relative error in pd to be less than the rounding unit.

which tends to preserve the smaller eigenvalues. On the other hand. which is especially valuable when only the eigenvalues of a symmetric tridiagonal matrix are needed. but it is too liberal for ordinary matrices. 1. called the QL algorithm. Graded matrices It is important to remember that when one is using the tridiagonal QR algorithm to solve graded problems. then the criterion is stricter than (1. If the grading is upward we can flip the matrix about its cross diagonal to reverse the grading. people have proposed many versions of the basic tridiagonal algorithm. See the notes and references. Of these variants perhaps the most important is the square root free PWK method.3 to realize that the formulas can be manipulated in many different ways. Not surprisingly.SEC. Wilkinson's Algebraic Eigenvalue Problem [300] contains much valuable material. the Hoffman-Wielandt theorem (Theorem 3. which performs well on matrices that are graded upward. Chapter 2].3. if the matrix is graded as in (1. as does the Handbook series of algorithms [305].10. Chapter 1) insures that the criterion (1. For more see the notes and references. Variations This brings to an end our discussion of the QR algorithm for symmetric matrices. there is a variant of the QR algorithm. which is required reading for anyone interested in the subject. THE QR ALGORITHM 169 Note that the quantity y |a||pa| is the geometric mean of the two diagonal elements of T. 1. each with their advantages—real or imagined. Vol II: Eigensystems . and the criterion reduces comparing et. We therefore replace it with Note that if di and di+i are of a size.4). then their geometric mean is about the same size. Alternatively. the grading should be downward [see (2.to the rounding unit scaled by the size of the local diagonals.6) always preserves the absolute accuracy of the eigenvalues. This suggests that we regard a subdiagonal of a graded tridiagonal matrix to be negligible if This criterion is effective for graded matrices. NOTES AND REFERENCES General references The title of this chapter is the same as that of Beresford Parlett's masterful survey of the field [206]. But it by no means exhausts the subject. Because ^f\d^di^\\ < ||T||F. One has only to write down a naive version of the tridiagonal QR step and compare it with Algorithm 1.36).5).

However. In 1968 Wilkinson [302] showed that the QR algorithm with Wilkinson shift (not his name for it) is globally convergent. He went on to find the eigenvalues of the tridiagonal matrix by Sturm sequences and the inverse power method (see §2.3). and it is the first off-diagonal element that converges rapidly to zero. Ch. He used plane rotations. which is based on the sequence defined by The QL step moves northwest from the bottom of the matrix to the top. 4] gives a proof of the convergence of the QR algorithm with Rayleigh quotient shift that he attributes to Kahan. For a detailed treatment of the algorithm see [206. An alternative is the QL algorithm. For a brief summary of the various implementations of the symmetric tridiagonal . and the method is called Givens' method. which in consequence are often called Givens rotations. Theorem 2. the QR algorithm will generally fail to compute the smaller eigenvalues accurately. We have derived the criterion (1. The latter automatically chooses to use either the QR or QL algorithm based on the relative sizes of the diagonals at the beginning and ends of the current blocks. Householder [123] observed that a general matrix could be reduced to Hessenberg form by Householder transformations and by implication that a symmetric matrix could be reducted to tridiagonal form.35]. Jiang and Zhang [135] have proposed a way of alternating between the Rayleigh quotient and Wilkinson shifts in such a way that the algorithm is globally and cubically convergent. THE SYMMETRIC EIGENVALUE PROBLEM Reduction to tridiagonal form Givens [83. although the method performs well in practice. The result is not quite global convergence. The actual algorithm given here is due to Wilkinson [297]. Ch. For an error analysis of Givens' method see [300. the criterion can be rigorously justified [59.1954] was the first to use orthogonal transformations to reduce a symmetric matrix to tridiagonal form. and this has been the preferred shift ever since. Perhaps the most complete set of options is given by the LAPACK codes xSYTRD and xSTEQR.170 CHAPTER 3.6. §5. 8]. For positive definite matrices. The tridiagonal QR algorithm Although Francis proposed his implicitly shifted QR algorithm for Hessenberg matrices. rather than top down as in Algorithm 1. Chapter 2. §5. If the tridiagonal matrix was obtained by Householder tridiagonalization. As mentioned in §2. For an error analysis of Householder's method see [300. that algorithm must also proceed from the bottom up.6) for setting an element to zero informally by considering a 2x2 matrix. the extension to symmetric tridiagonal matrices is obvious.1. The former reduces a symmetric matrix to tridiagonal form and can be coaxed into performing the reduction either top down or bottom up. Parlett [206.3].26]. If a symmetric tridiagonal matrix is graded upward.

which we already know. A CLUTCH OF ALGORITHMS 171 QR algorithm see [206.3 we will show how to compute selected eigenvalues and eigenvectors of symmetric band matrices.1 outlines the basic steps of the updating algorithm.14-15]. which also contains a complete exposition of the PWK (Pal-Walker-Kahan) algorithm.SEC. to simplify the computation—a process called updating. One way of solving this problem is to form A and use the QR algorithm to compute its eigensystem.1). and the central part of the computation—the diagonal updating problem—is useful in its own right. since a — ±1.2. it might be hoped that we can use the spectral decomposition (2. At each stage we underline the currently reduced matrix First. Chapter 1]. the Cholesky decomposition—updating can result in a striking reduction in computation. Let where a — ±1. In this section we will consider some important alternative algorithms. typically from 0(n3) to O(n2). However. This algorithm can in turn be used to implement a divide-and-conquer algorithm for the symmetric tridiagonal problem.. We will treat these steps each in turn. A CLUTCH OF ALGORITHMS The QR algorithm is not the only way to compute the eigensystem of a symmetric matrix. Nonetheless. For an implementation see the LAPACK routine xSTERF. Algorithm 2. RANK-ONE UPDATING In this subsection we will be concerned with the following problem. Unfortunately. Reduction to standard form We first turn to the problem of reducing A = A-fcrxx1 to the standard form D + zz1. 2.1. updating the spectral decomposition requires 0(n3) operations.g. §§8. in §2. we may write the updating problem in the form Vol II: Eigensystems . which is treated in §2. With some decompositions—e. Finally. 2.10). 2. the updating algorithm is less expensive than computing the decomposition from scratch. We will begin this section with the problem of computing the eigensystem of a diagonal matrix plus a rank-one modification. Let the symmetric matrix A have the spectral decomposition [see (1. We proceed in stages. Find the spectral decomposition A — VAV"T of A.

Deflate the problem to get rid of all very small components of z and to make the diagonal elements of D reasonably distinct. 5. 2. Reduce the problem to that of updating the eigensystem of D + zz . such that Sy is nonnegative. This algorithm sketches how to compute the spectral decomposition of A = A + axx^. 3. where <r = ±l.172 CHAPTER 3. THE SYMMETRIC EIGENVALUE PROBLEM Let A — FAFT be the spectral decomposition of the symmetric matrix A. which is diagonal.1: Rank-one update of the spectral decomposition This makes the sign of xx1 positive. where D is diagonal with nondecreasing diagonal elements. let 5 be a diagonal matrix whose elements are ±1. Fourth. Third.1). let P be a permutation such that the diagonals of cr_PT AP appear in nondecreasing order. Use the eigenvalues to compute the eigenvectors of D -f zz^. 4. Second. Then If we now set then Thus if we can compute the spectral decomposition . 1. This is done by solving a rational equation whose roots are the required eigenvalues. Algorithm 2. from the spectral decomposition (2. Transform the eigensystem of D + ZZT back to the original eigensystem of A. we have rp The matrix to be updated is now <jA. and z is a vector with nonnegative components. Then This makes the elements of the vector defining the update nonnegative. Find all the remaining eigenvalues of D + zz1'.

Vol II: Eigensystems . But still it is faster. The computation of ( V S ) P is a permutation of the columns of VS. • To complete the updating. When we speak of such permutations.A and then change the sign back at the end. 2. The computation of VS is simply a column scaling.SEC. A CLUTCH OF ALGORITHMS then with A = ah the decomposition 173 is the spectral decomposition of A. Any permutation of the diagonals must be reflected in a corresponding permutation of the columns of VS and the components of z. the diagonals of D. • We will have occasions to exchange diagonals of D later in the algorithm. Algorithmically. we will tacitly assume that VS and z have also been adjusted. Deflation: Small components of z If the ith component of z is zero. so that z has the form then D -f ZZL can be partitioned in the form In this case the element di is an eigenvalue of the updated matrix. By a permutation we can move this eigenvalue to the end of the matrix and bring it back at the end of the update. care should be taken to choose an efficient algorithm to sort the diagonals of D in order to avoid a lengthy series of column interchanges in the matrix VS. • The way a = ±1 keeps popping in and out of the formulas may be confusing. • In an actual implementation it may be better not to permute the columns of VS. • Unless the order n of the problem is small. Here are some comments on the reduction. we must compute VSPQ (this is essentially step five in Algorithm 2. and the components of z until the very end. about which more below. This has the consequence that our updating algorithm does not gain in order over the naive scheme of computing the decomposition of A from scratch.1). what the equations say is that if a — -1. update . The computation of (VSP)Q is a matrix-matrix multiplication and takes about n3 flam. Instead one might construct a data structure that keeps track of where these quantities should be and reference them through that structure.

. Thus a reasonable criterion is to require Z{ to be so small that where eM is the rounding unit. we can use a plane rotation to zero out one of the corresponding components of z.by zero is to replace D + zz1 by D + ZZT . Now by computing the Frobenius norm of E. setting Z{ to zero will move each eigenvalue by no more that \\D-\. Instead we will find that some component zt. By Theorem 3. and we must decide when we can regard it as zero.Unless the original matrix has special structure. we will seldom meet with components of z that are exactly zero. THE SYMMETRIC EIGENVALUE PROBLEM In practice. this is the kind of perturbation we should expect from rounding the elements of A.zz^ I^M. if this criterion is satisfied.is small. we get the bound Moreover we can estimate Hence our criterion becomes Here we have absorbed the constant \/2 into 7.8. The process is sufficiently well illustrated by the 2x2 problem Let c and s be the cosine and sine of a plane rotation that annihilates the component Z2 of z: . Chapter 1. thereby deflating the problem. Deflation: Nearly equal d{ It turns out that if two diagonals of D are sufficiently near one another.174 CHAPTER 3. Now in this case we can write Thus the effect of replacing <?.E. and 7 is some modest constant (more about that later).

the accuracy of our final results will deteriorate. Of course. As above. A strict rounding-error analysis suggests that 7 ought to grow with n. The conventional wisdom is that 7 should be a modest constant. Since 7 controls the stringency of the criterion. once a diagonal has been deflated. When to deflate There is an important tradeoff in our criteria for deflation. Saving accuracy at one stage is a poor bargain if it is lost at another. we deem this quantity sufficiently small if In the general case. this is too liberal because the rounding-error analysis is necessarily conservative. 2. The calculations of the eigenvalues and eigenvectors to be described later require that the Z{ be nonzero and that the d{ be strictly increasing.1) satisfy Vol II: Eigensystems . if (d^ — d\)cs is sufficiently small we can ignore it. If it is too stringent. because the diagonal elements of d are nondecreasing it is only necessary to test adjacent diagonal elements. Then 175 Consequently. A CLUTCH OF ALGORITHMS where z\ = y1' z\ + z\. and the problem deflates. we will seldom deflate and miss the substantial benefits of deflation. Theorem 2. the solutions AI < • • • < An of (2. if the criterion is too liberal. However. The secular equation We turn now to the computation of the eigenvalues of D + ZZH . Thus the deflation procedure can be seen as a preprocessing step to insure that the computation as a whole runs smoothly. the eigenvalues ofD + zz1 satisfy the SECULAR EQUATION Moreover. We begin with a theorem that shows where the eigenvalues lie and gives an equation they must satisfy. the above discussion suggests that it should have a reasonable large value. say 10. the diagonal that remains must be tested again against its new neighbor.SEC. If d\ < d% < • • • < dn and the components of z are nonzero. There is yet another tradeoff. On the other hand.1. This favors a liberal criterion over a conservative one.

excluding overflows and underflows. This can be done by .176 CHAPTER 3. Hence / has a zero AI in (d\.. /(A) becomes negative. • Solving the secular equation Theorem 2. while as A —»• oo.. As A approaches d\ from above. oo) for An]. as A approach dn from above.. The result now follows from the fact that the determinant of a matrix is the product of the eigenvalues of the matrix. Moreover. As A approaches di from below. THE SYMMETRIC EIGENVALUE PROBLEM Proof. But first a word on computer arithmetic.In the same way we can show that / has zeros \i £ (di. and here we will merely sketch the important features.1).8) shows that each root A z is simple and located in (d. The first step is to determine whether A^ is nearer <4 or dk+i.e^).7). The proof is completed by observing that the n zeros A. to determine the eigenvectors the eigenvalues must be calculated to high accuracy. the interlacing inequality (2. Let us consider the zeros of the latter. Hence / has a zero A n satisfying An > dn. This assumption is critical for the algorithms that follow. /(A) -*• 1. Thus our root-finding algorithm must be specifically designed to work with the secular equation. n . /(A) must become positive. The characteristic polynomial of D + zz^ is the last equality following from (2.. we assume that. Throughout these volumes we assume that floating-point computations are done in standard floating-point arithmetic.+i) [or in (dn. For definiteness we will consider the problem of computing A^.. d. where k < n.1 shows that we can find the updated eigenvalues A z of A (up to the sign a) by solving the secular equation (2. Finally. For more see the notes and references.9). To derive the secular equation we will employ the useful fact that This can be proved by observing that any vector orthogonal to v is an eigenvector of / — UVT corresponding to one. /(A) must become negative. while the vector v is an eigenvector corresponding to 1 — v*-u. Specifically. However. Thus in principle we could use an off-the-shelf root finder to compute the A. The detailed description of such an algorithm is beyond the scope of this book. Thus the characteristic polynomial is the product of Hi(^ ~ ^) anc* /(A) defined by (2. of/ are also zeros of the characteristic polynomial p and hence are the eigenvalues of D + zz^..7).di+i) (i — 1. each floating-point operation returns the correctly rounded value of the exact operation.

let and define g(r) = f(dk -f T). the most effective schemes are based on approximating g ( r ) by a rational function g ( r ) and finding the root of of g as the next approximation. If the value is negative. Because of the simplicity of the function g(r). A CLUTCH OF ALGORITHMS 111 evaluating f[(dk -f dk+i )/2]. The second step is to shift the secular equation so that the origin is d^. Afc is nearer di+i. A& is nearer di\ if it is positive. However. the functions r(r) and s(r) can be evaluated to high relative accuracy whenever r is between 0 and (cfor+i — cfo)/2. It can be shown that this criterion returns eigenvalues that are sufficiently accurate to be used in the computation of the eigenvectors. For definiteness we will suppose that \k is nearer d^. and hence the normalized eigenvector is given by VolII: Eigensystems . Finally. In principle any good zero-finding technique applied to g will suffice to find the correction r^. see the notes and references.D) lz. to which we now turn. 2. one must decide when the iteration for T> has converged. Then the corresponding eigenvector q must satisfy the equation (D + zz^}qj = \jq3 or equivalently Thus qj must lie along the the direction of (\jl .SEC. because g ( r ) has poles at zero and dk+i — dk. as it is calculated one can also calculate a bound on the error in its computed value. Specifically. Our stopping criterion is: Stop when the computed value of g(T) at the current iterate ios s than its error biound. For more. Computing eigenvectors Let A j be an eigenvalue of D+zz1. Moreover. Then This shift has the advantage that we are now solving for the correction T^ to dk that gives Afc.

Now the ith component of qj is given by where rjj is the normalizing constant. be far from orthogonal to one another. One cure for this problem is to compute the eigenvalues Xj to high precision (it can be shown that double precision is sufficient). we do not know the eigenvalues of D + zz1'.14) under the assumption that the required Zi exist.2. THE SYMMETRIC EIGENVALUE PROBLEM Unfortunately. as it will be when d{ and di+i are very near each other. then the eigenvalues of D — zz1 are AI. Setting A = c/j. It is based on the following theorem. However. Hence the z^ are well defined and given by (2. If we set z = (zi • • • zn) .178 CHAPTER 3. Theorem 2.10)]. Instead we have approximations Xj obtained by solving the secular equation. replacing Xj by Xj in (2. .12) will make a large perturbation in qij. In particular. there is an alternative that does not require extended precision. eigenvectors computed in this way may be inaccurate and. worse yet. and solving for zf.. . r-r\ Proof.. and then show that these values have the required property. Let the Quantities A. A n . . satisfy the interlacing conditions Then the quantities are well defined. we find that The interlacing inequalities imply that the right-hand side of this equation is positive. unless Xj is a very accurate approximation to Xj. We compute the characteristic polynomial of D + ZZT in two ways: [see (2. If Xj is very near di. We will first derive the formula (2.14).

e. Now in standard floating-point arithmetic. the matrix Q of these eigenvectors will be orthogonal to working accuracy. they must be identical. they must be within a small multiple of the rounding unit.SEC. As we shall see in a moment. after all. We summarize the computation of the eigenvectors in Algorithm 2. we discover that they are zeros of the second expression—i. with the appropriate signs. the vectors corresponding to poorly separated eigenvalues will be inaccurate. The inequality (2.. the Aj satisfy the secular equation for D + zz?'. • The computation of the Z{ requires approximately 2n2 flam and n2 fldiv. There remains the question of the stability of this algorithm. this does not imply that the computed eigenvectors are accurate.11) in finding the approximate eigenvalues A. and hence insignificant. For if Z{ and Z{ have opposite signs. • Equation (2. the Z{ can be computed to high relative accuracy. and. it can be shown that if we use the stopping criterion (2.12).16) also justifies our replacing sign(^) by sign(z. In particular if we evaluate both expressions at the AJ. Since both expressions are monic polynomials of degree n. Chapter 2]. Moreover. note that by the way the Z{ were defined the two expressions for the characteristic polynomial of D+z ZT in (2. • We have already noted that the algorithm is backward stable. 2. However. and then use them to compute the components of the eigenvectors from (2. Thus the computation VolII: Eigensy stems . we have not found the eigenvectors of D + zz1 but those of D -f ZZT. The computation of the vectors requires approximately n2 fladd and n2 fldiv for the computation of the components of Q and n2 flam and n2 fldiv for the normalization (the divisions in the normalization can be replaced by multiplications). In particular.13. since the right-hand side contains the factor sign(^).35). They will have small residuals [see (2. as we have pointed out.14) does not quite specify 5t-. However. Here are two comments on this program. Chapter 1. then where fi is a slowly growing function of n.).2. Thus the computation of the eigenvectors of D + ZZT is backward stable.. they will be orthogonal almost to working accuracy. suppose that we compute the zt. A CLUTCH OF ALGORITHMS 179 To show that the eigenvalues of D -f ZZT are the Aj. But by Theorem 3.from (2. in the expression the Aj are the exact eigenvalues of D + ZZT.15) are equal at the n distinct points dj. Hence the components QJ of the computed eigenvectors qj will be to high relative accuracy the components of the exact eigenvectors of D + ZZT . in our application it is sufficient to replace sign(ii) with sign(^). To apply the theorem.14).

flM:.di) end for j forj r j — i ton-1 7.di) end for j Zi — sqrt(zi)*sign(zi) end Compute the eigenvectors for j — 1 ton for i .di)/(dj . Zi = Zi*(Xj .2: Eigenvectors of a rank-one update o .j] = «M/IW[:. Compute the modified update vector 1. 6. Si = Zi*(Xj . Q[iJ] = 2i/&-di) end for i 15. for i = 1 ton 2.1 to n 13. 3.180 CHAPTER 3. 9. Z{ = A n — u{ for j = 1 to i—1 4. this algorithm returns approximations to the eigenvectors of D + ZZT in the array Q.di)/(dj+i .J]||2 16. 14. 5. 12.11)] to the eigenvalues of D + zz1 that satisfy the interlacing inequalities (2. THE SYMMETRIC EIGENVALUE PROBLEM Given sufficiently accurate approximations Xj [see (2. 11. 10. 8.13). end for j Algorithm 2.

Thus for n large enough the updating algorithm must be faster. 2. there is a second aspect. We will return to this point in the next subsection. at least for sufficiently large matrices. the deflation criteria—e. we can repeat this dividing procedure on PI and P2. We can estimate the cost of a divide-and-conquer algorithm by counting the costs at each level of the recurrence. Generalities The notion of a divide-and-conquer algorithm can be described abstractly as follows. it must be remembered that the diagonal updating is only a part of the larger Algorithm 2. which allows small but meaningful elements in a graded matrix to be declared negligible.SEC.6) — are couched in terms of the norms of D and z. The deflation also saves time. At the top level of the recurrence we solve one problem at a cost of C(n). Now if PI and P2 are problems of the same type as P. Extra storage is required for the eigenvectors of the diagonalized problem D + zz1'. particularly the expensive accumulation of the plane rotations generated while solving the tridiagonal system. In particular. Once the cost of computing VSPQ in (2. assume that n — 2m. We have seen that the O(n3} overhead in the QR algorithm is the accumulation of the transformation. The updating algorithm has only one 0(n3) overhead: a single matrix-matrix multiplication to update the eigenvectors V. For convenience. This recursion will stop after the size of the subproblems becomes so small that their solutions are trivial—after no more than about Iog2 n steps. At the third Vol II: Eigensystems . A DlVIDE-AND-CONQUER ALGORITHM In this subsection we will present a divide-and-conquer algorithm for solving the symmetric tridiagonal eigenvalue problem. Concluding comments Our sketch of the algorithm for spectral updating is far from a working program. and the routine to solve the secular equation is intricate. (2. However. 2. Let P(n) denote a computational problem of size n. A CLUTCH OF ALGORITHMS 181 of the eigenvectors is an O(n 2 ) process. we solve two problems each at a cost of C(|) for a total cost of 2(7(|). Thus the code for any implementation will be long and complicated compared with the code for forming A directly and applying the QR algorithm. and in some applications there are many deflations.. At the second level. The data structures to handle permutations and deflation are nontrivial. and suppose that we can divide P(n) into two problems Pi(f) and ^(f) °f size f in such a way that the solution of P can be recovered from those of PI and PI at a cost of C(n). We begin with some generalities about the divide-and-conquer approach. then on the subproblems of PI and PI. the count rises to O(n3). The algorithm does not work well for graded matrices.g. The algorithm is not without its drawbacks.2. However.3) is taken into account.1. and so on. Is it worth it? The answer is yes.

we get for the total cost of the divide-and-conquer algorithm Of course we must know the function C(n) to evaluate this expression. The idea behind our divide-and-conquer algorithm is to compute the eigensystems of TI and T<2 (which can be done recursively) and then combine them to give the eigensystem of T. let and .) We can also write this partition in the form Now let and so that TI is TI with the element ds replaced by ds — es and T-z is T% with the element ds+i replaced by ds+i — es. Summing up these costs. we solve four problems at a cost of C( J) for a total cost of 4C( J)—and so on. To show how to combine the eigensystems. Derivation of the algorithm Let T be a symmetric tridiaeonal matrix of order n. after we have derived our algorithm. and let T be partitioned in the form (Here s stands for splitting index. We will return to it later. Then we can write Thus we have expressed T as the sum of a block diagonal matrix whose blocks are tridiagonal and a matrix of rank one.182 CHAPTER 3. THE SYMMETRIC EIGENVALUE PROBLEM level.

Here are some comments. A CLUTCH OF ALGORITHMS be spectral decompositions of TI and T^. We are now in a position to apply the divide-and-conquer strategy to compute the eigensystem of a symmetric tridiagonal matrix. a direct. In practice. as we shall see. we abandon the divide-and-conquer approach and compute the decompositions directly via the QR algorithm. Algorithm 2. so that for small matrices the QR approach is more competitive. specified here by the parameter mindac.g. • We have already observed that the method for updating a spectral decomposition has a great deal of overhead. The code is not pretty. Consequently.3 inherits this drawback. Algorithm 2. 2. • The recursive approach taken here is natural for this algorithm. when the submatrices in the recursion fall below a certain size. bottom-up approach is possible. However. for large matrices the ill consequences of using the wrong value are not very great. Then one can work upward combining them in pairs.Then 183 The matrix D + zz1 is a rank-one modification of a diagonal matrix. as in (2.SEC. Vol II: Eigensystems . • We have noted that the updating algorithm diagupdate may not work well with graded matrices. This reduces the work from n3 flam to |n3 flam. we would represent T in some compact form — e. one might initially partition the matrix into blocks of size just less than mindac. • For ease of exposition we cast the algorithm in terms of full matrices.18). • Because the algorithm diagupdate is backward stable.3 implements this scheme.. Ideally. In LAPACK it is set at 25. It follows that is the spectral decomposition of T. • In statements 16 and 17 we have taken advantage of the fact that V is block diagonal in forming VQ. For example. the value of mindac should be the break-even point where the two algorithms require about the same time to execute. Actually. it can be shown that cuppen is also stable. whose spectral decomposition can be computed by the algorithm of the last section.

THE SYMMETRIC EIGENVALUE PROBLEM Given a symmetric tridiagonal matrix T. 1.Ai.Vi) 10.V2*Q[s+l:n. Cuppen(T. o.3+l] 12.:] = Vi*9[l:5. V[a+l:n. y[l:«. A. if (n < mindac) 4. T[s. s+1]. z. f2 2= T[s+l:n.:] 17. D = diag(Ai. diagupdate(D. end cuppen Algorithm 23: Divide-and-conquer eigensystem of a symmetric tridiagonal matrix o . n = the order of T 3.%>) 13. If the size of the matrix is less than the parameter mindac (min divide-and-conquer) the QR algorithm is used to compute its eigensystem. A. &!. A.s]-T[s.l] = r 2 [l. s= \n/1\ 7. z. T1[s.l:s] 8.A2) 14.s+l:n] 11.184 CHAPTER 3.l]--r[3. Q) 16. Cuppen(Ti.s+I] C 9.:] F2[1. Compute A and V using the QR algorithm 5. else 6. f 22 [l. ^ = (Vi[5. Q) that returns the spectral decomposition D + zz1 = QAQ^. fi = T[l:s.:])T 15. end if 19.:} 18. Cuppen computes its spectral decomposition VAVT. V) 2. Cuppen(f2.:] . It uses the routine diagupdate(D.s] = f[s.

the choice of the point at which one shifts to the QR algorithm (parameterized by mindaq} is not very critical.17)] for the total cost of Algorithm 2. the remaining work is insignificant compared to the work done by the higher stages. The second stage doe | of this work. In general after the &th stage only -^ o the total work remains to be done.. so that the divide-and-conquer approach becomes dramatically faster than the QR approach. Thus. But if it has not conquered. Of the total of f n3flamwork | (i.SEC. while the multiplication requires |n3 flam. A CLUTCH OF ALGORITHMS Complexity To apply the formula 185 [see (2. the algorithm benefits from deflations in diagupdate.3 requires about |n3 flam. and we may take It is now an easy matter to get a sharp upper bound on Ctotai. ^n3flam)is done in the veryfirstst the recursion. This has the consequence that for large matrices. It is still O(n 3 ). The order constant | is quite respectable — compara Gaussian elimination and far better than the QR algorithm. We see that our divide-and-conquer algorithm has not really conquered in the sense that it has reduced the order of the work. leaving \ of the work for the remaining stages. We have already noted that the updating requires only O(n2) operations. which effectively make part of Q diagonal. it has certainly prospered. Vol II: Eigensystems . the computation is dominated by the matrix multiplication. leaving only ^ of the total. So little work is being done at the lower stages that it does not matter if we use an inefficient algorithm there. we must first determine the function C(n). 2. Thus after a two or three stages. An examination of the algorithm shows two potentially significant sources of work: the call to diagupdate in statement 15 and the matrix multiplication in statements 16 and 17. With some classes of matrices the number of deflations grows with n.Specifically. Thus: Algorithm 2. This is essentially the cost of Cuppen exclusive of its cost of the recursive calls to itself.e.3. In addition to the low-order constant.

EIGENVALUES AND EIGENVECTORS OF BAND MATRICES

We begin with a definition.

Definition 2.3. Let A be symmetric of order n. Then A is a BAND MATRIX of BAND WIDTH 2p+1 if a_ij = 0 whenever |i − j| > p.

Thus if p = 1, then A is a tridiagonal matrix. If p = 2, then A is pentadiagonal, with its nonzero elements confined to the diagonal and the first two superdiagonals and subdiagonals.

Band matrices are economical to store. For example, a symmetric pentadiagonal matrix can be represented by 3n−3 floating-point numbers, which is of order n. This means that it is possible to store band matrices that are so large that the matrix of their eigenvectors cannot be stored. Fortunately, in many applications involving band matrices we are interested in only a few eigenpairs, often a few of the largest or smallest. The computation of a selected set of eigenpairs of a large symmetric band matrix is the subject of this subsection.

In outline our algorithm goes as follows.

1. Reduce A to tridiagonal form T
2. Find the selected eigenvalues of T
3. Find the selected eigenvectors by the inverse power method applied to A

Note that we compute the eigenvalues from T but the eigenvectors from A. Another possibility is to store the transformations that reduce A to T, find the eigenvectors of T, and transform back. Unfortunately, if n is large, the transformations will require too much storage. Thus we are forced to find the eigenvectors of A directly from A. A special case of our algorithm is when the matrix A is already tridiagonal.

For the most part our algorithms will be cast in terms of absolute tolerances relative to the norm of A. This means that small eigenvalues may be determined inaccurately. In this case it turns out that there are better algorithms than the ones proposed here. However, their complexity puts them beyond the scope of this book, and the reader is referred to the notes and references. How to compute small but well-determined eigenpairs of a general band matrix is at present an unsolved problem.

The storage of band matrices

Although we will present our algorithms for band matrices as if they were stored in full arrays, in practice we must come to grips with the fact that such storage is not economical. Specifically, a symmetric band matrix of band width 2p+1 has approximately (p+1)n nonzero elements on and above its diagonal. If p is much smaller than n, it would be a waste of memory to store the matrix naturally in an array of order n. Instead such a matrix is usually stored in an array just large enough to accommodate its nonzero elements, zeros and all.

There are many ways of doing this. The current standard is the LAPACK scheme (actually two schemes). The first scheme stores the band matrix column by column in a (p+1)×n array B, with a few elements of B at the corners left unused. The other scheme stores the elements according to a similar formula. Thus one scheme stores the upper part of A, whereas the other stores the lower part. When A is complex and Hermitian, this is not just a rearrangement of the same numbers, since the corresponding upper and lower elements are not equal but are conjugates of one another.
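The following sketch shows this packing in the upper variant documented for LAPACK's symmetric band routines (UPLO = 'U'); the lower variant differs only in the indexing formula. The helper name pack_upper_band is mine.

import numpy as np

def pack_upper_band(A, p):
    """Pack the upper triangle of a symmetric band matrix A (band width 2p+1)
    into a (p+1) x n array, following the LAPACK convention for UPLO='U':
    AB[p + i - j, j] = A[i, j] for max(0, j-p) <= i <= j (0-based indices)."""
    n = A.shape[0]
    AB = np.zeros((p + 1, n))
    for j in range(n):
        for i in range(max(0, j - p), j + 1):
            AB[p + i - j, j] = A[i, j]
    return AB

# small demonstration with a pentadiagonal matrix (p = 2)
n, p = 6, 2
A = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - p), min(n, i + p + 1)):
        A[i, j] = 10 * (i + 1) + (j + 1)
A = (A + A.T) / 2          # make it symmetric
print(pack_upper_band(A, p))   # unused corner entries remain zero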

Tridiagonalization of a symmetric band matrix

We will now show how to reduce a symmetric band matrix to tridiagonal form. One possibility is simply to apply Algorithm 1.1 for tridiagonalizing a dense matrix to the full band matrix. However, it is easy to see that even if one takes into account the zero elements in the band matrix, the band width grows as the reduction proceeds, so that eventually one is working with a full dense matrix. Here we consider another alternative that uses plane rotations and preserves the band size. We will illustrate it for a matrix of band width seven (p = 3).

We begin by choosing a rotation in the (3,4)-plane to annihilate the (4,1)-element (and by symmetry the (1,4)-element) of A. The application of this rotation introduces new nonzeros outside the band in the (7,3)- and (3,7)-elements. We next choose a rotation in the (6,7)-plane to annihilate the elements outside the band. This places new nonzeros in the (10,6)- and (6,10)-elements; however, when we eliminate these, no new elements are generated. Thus we have annihilated the (4,1)- and (1,4)-elements while preserving the band structure.

We are now ready to work on the (3,1)- and (1,3)-elements. A rotation in the (2,3)-plane annihilates this element, but introduces nonzeros in the (6,2)- and (2,6)-elements. These elements can be chased out of the matrix in the same manner.

as opposed to Householder transformations. since the reduction generates 0(n 2 ) rotations we cannot afford to save or accumulate the transformations when n is very large. in this case E does not share the banded structure of A. and so on. This can be inferred from our example and easily confirmed in general. The basic tool is a method for counting how many eigenvalues lie Vol II: Eigensystems . Hence when the first row and column are being processed.1. we will begin by counting plane rotations. and so on. tips the scales against the band algorithm. Hence the total number of rotations for this algorithm is about Remembering that we only operate on the upper or lower part of the matrix. where ||J?||2/||j4||2 is of the order of £M.8. Similarly it requires (n-l)(p-l)/p rotations to process the second row and column. we see that each rotation is applied to 2(p + 1) pairs of elements (here we count the adjustment of the 2 x 2 block on the diagonal as two applications). (n—2)(p—l)/p to process the third. The reduction now continues by reducing the second row and column. Thus the total number of annlications of rotations to nairs of elements is We can convert this to multiplications and additions as usual. An important implication of these counts is that if we can afford n2 units of storage to accumulate transformations we are better off using the standard Algorithm 1. the total number of rotations is roughly n(p-l)/p. and hence the eigenvalues of A are moved by no more than quantities of order ||A||2£M (Theorem3. The key observation is that when an element is being chased out of the matrix it moves down the matrix in steps of length p. However. and we will not give it here. To get an operation count for this algorithm.SEC. Since we must zero out p— 1 elements in the first row and column. each zero introduced requires roughly n/p rotations to chase the element outside the band to the end of the matrix. The actual code is tedious and not very informative. then the third row and column. On the other hand. Inertia We now turn to the problem of computing some of the eigenvalues of a symmetric tridiagonal matrix. and the extra work in accumulating plane rotations. It is true that the reduction itself requires O(n3) operations as opposed to O(n 2 ) for the banded reduction. Chapter 1). A CLUTCH OF ALGORITHMS 189 Thus we have reduced the first row and column to tridiagonal form. The computed tridiagonal matrix is orthogonally similar to A + E. But for both algorithms the accumulation of the transformations requires O(n3) operations. But it is symmetric. 2. The algorithm is stable in the usual sense. Thus the reduction is truly an 0(n2) process.

Similarly. we begin with a constructive theorem for determining inertia. This suggests the following definition. By hypothesis. Then It follows that X H Z — {0}.e.190 CHAPTER 3. it is much easier to compute than the eigenvalues themselves.5 (Sylvester. The number of eigenvalues lying above the point A on the real line is the same as the number of eigenvalues of A . THE SYMMETRIC EIGENVALUE PROBLEM to the left of a point on the real line. and let Z = Uy. transformations of the form U^-AU. The proof is by contradiction. Let X be the space spanned by the eigenvectors corresponding to the positive eigenvalues of A. Definition 2. Thus to count the location of the eigenvalues of a symmetric matrix with respect to an arbitrary point. But the dimension of X is the number of positive eigenvalues of A and the dimension of Z is the number of nonpositive eigenvalues of UTAU. We begin with the mathematical foundations of this method. it is sufficient to be able to count their locations with respect to zero. The reason is that inertia is invariant under congruence transformations. Hence they have a nonzero vector in common—a contradiction. TT). i. Then Proof. Then inertia (A) is the triplet (f. which are more general than similarity transformations.XI lying above zero. Let A be symmetric. Then Now let y be the space spanned by the vectors corresponding to the nonpositive eigenvalues of U^AU. Jacobi). where v is the number of negative eigenvalues of A. Let A be symmetric. Let A be symmetric and U be nonsingular. we have the following theorem. • Inertia and the LU decomposition Turning now to the computation of inertia. Specifically. Suppose that A has more positive eigenvalues than U^AU.. Theorem 2. and TT is the number of positive eigenvalues of A. (. C is the number of zero eigenvalues of A. the sum of these dimensions is greater than the order of A. Although inertia is defined by the eigenvalues of a matrix. .4. we can show that the number of negative eigenvalues of A is the same as the number of negative eigenvalues of (7T AU and therefore that the number of zero eigenvalues is the same for both matrices.

Unfortunately.1. are just the pivot elements in the elimination). But if A is tridiagonal. Proof. then we can read off the inertia of A by looking at the diagonal elements of U. Let us see how. in fact. • It follows that if we have an LU decomposition of A.3). • • .(A) = inert!a(DU). A CLUTCH OF ALGORITHMS 191 Theorem 2. Gaussian elimination requires pivoting for stability. we will work with the matrix VolII: Eigensystems . in principle.w nn ). ^22. so that A — LDuL^.. Then there is a unique unit lower triangular matrix L and a unique upper triangular matrix U such that Moreover if DU = diag(wn.SEC. Now the LU decomposition of a matrix can be computed by Gaussian elimination (see Chapter 1:3). Since A is nonsingular. Hence A and DU are congruent and have the same inertia. then'meTtia. and the usual pivoting procedures do not result in a U-factor that has anything to do with inertia. Hence. Let A be symmetric and suppose that the leading principle submatrices of A are nonsingular. we can compute the diagonals of U in a stable manner. The inertia of a tridiagonal matrix Consider a symmetric tridiagonal matrix T. D^U — £T. written in the usual form: Since we will later be concerned with the inertia of T — XI as a function of A. and hence by the uniqueness of LU decompositions. 2. It is well known that if the leading principal submatrices of A are nonsingular then A has a unique factorization A = LU into a unit lower triangular matrix L and an upper triangular matrix U (Theorem 1:3. By symmetry (D^1 U)T(LDU) is an LU decomposition of A.6. the diagonal elements of U are nonzero. we can compute the inertia of A by performing Gaussian elimination and looking at the diagonal elements of U (which. Hence we can write A = LDu(DylU).

Given a tridiagonal matrix T in the form (2.20) and a constant λ, the function LeftCount returns a count of the eigenvalues less than λ. The parameter umin > 0 is a minimum value on the absolute value of the current uᵢ(λ) defined by (2.21).

     1. LeftCount(d, e, λ)
     2.    LeftCount = 0
     3.    u = d[1] − λ
     4.    if (u < 0) LeftCount = LeftCount + 1 fi
     5.    for i = 2 to n
     6.       if (|u| < umin)
     7.          if (u > 0)
     8.             u = umin
     9.          else
    10.            u = −umin
    11.         end if
    12.      end if
    13.      u = d[i] − λ − e[i−1]²/u
    14.      if (u < 0) LeftCount = LeftCount + 1 fi
    15.   end for i
    16. end LeftCount

Algorithm 2.4: Counting left eigenvalues

If we perform Gaussian elimination on this matrix we get the following recursion for the diagonals of the U-factor:

    u₁(λ) = d₁ − λ,
    uᵢ(λ) = (dᵢ − λ) − e²ᵢ₋₁/uᵢ₋₁(λ),   i = 2, …, n.           (2.21)

Algorithm 2.4 counts the eigenvalues to the left of λ. Here are some comments.

• The algorithm requires 2n fladd, n flmlt, and n fldiv.

• In applications where the count must be repeated for many different values of λ, it will pay to modify LeftCount to accept a precomputed array of the numbers e²ᵢ.

• The generation of uᵢ(λ) requires a division by uᵢ₋₁(λ), and we must specify what happens when uᵢ₋₁(λ) is zero. It turns out that we can replace uᵢ₋₁(λ) by a suitably small number. The choice of such a number is a matter of some delicacy and depends on how the matrix is scaled. We will return to this point later.

• The method is stable. In particular, it can be shown that the computed sequence uᵢ(λ) is the exact sequence of a perturbed matrix T(λ) whose elements satisfy

These results do not imply that unpivoted Gaussian elimination applied to a tridiagonal matrix is stable. The stability concerns only the w z . First. so that at the ith step we take This choice assumes that the matrix has been scaled so that ei/umin will not overflow. Consequently. We can do so by combining LeftCount with a technique called interval bisection. where ao and bo denote the initial values. after k iterations. A m G [a. Vol II: Eigensystems . Second.«o I jeps) iterations. we have isolated A m in an interval that is half the width of the original. b] halves at each step. if k < m. not the LU decomposition as a whole. Set c — (a + 6)/2 and k = LeftCount(c).Q\. Here are some comments. the difference between the current a and 6 is bounded by \b . 6].5 uses this procedure to find A m . This procedure is called interval bisection. • Except perhaps at the end. If we rename this interval [a.~k\bo — O. then Am > c. We begin by using Gerschgorin's theorem to calculate a lower bound a on \i and an upper bound b of A n . Consequently. then A m < c. 6]. • In replacing u with ±umin. In either case.-. If we demand that this bound be less that eps. 2. c]. A CLUTCH OF ALGORITHMS and 193 This result implies that the count will be correct unless A is extremely close to an eigenvalue of T — so close that in the presence of rounding error the two can scarcely be distinguished. A m £ [c. the size of the interval [a. Algorithm 2. we can repeat the procedure until the interval width is less than a prescribed tolerance. One possibility is to make the tolerance relative to e. There are two cases. if k > ra. Consequently.SEC.a\ < 2. then the method must converge in about Iog2( |&o . • The choice of umin depends on the scaling of T and how accurately we wish to determine the count. we choose the sign so that it is consistent with our previous adjustment of LeftCount Calculation of a selected eigenvalue Suppose that the eigenvalues of T are ordered so that and we wish to compute the eigenvalue A m . so that LeftCount(a) = 0 and LeftCount(b) — n.

12. 6. e. 9. Find upper and lower bounds on \m a = d(l)|e(l)|. 11. if(fc<m) a—c else b= c end if end while return (a+6)/2 end FindLambda Algorithm 2.5: Finding a specified eigenvalue o . 5. d(i) + \e(i-l)\ + \e(i)\] end for i a = min{a. c) 13. e.d(n) + |e(n—1))} Find \m by interval bisection while (\b-a\ > eps) c = (a+b)/2 if (c = a or c = 6) return c fi k = LeftCount(d.4). The function FindLambda finds an approximation to \m.194 CHAPTER 3.d(i] — e(i—l)\ — \e(i)\} b = max{6. THE SYMMETRIC EIGENVALUE PROBLEM Let the tridiagonal matrix T be represented in the form (2. 10. 18. 3. m) 2.20) and let the eigenvalues of T satisfy AI < A 2 < • • • < A n . The quantity eps is a generic convergence criterion. 7.e/(n) — \e(n—1))} 6 = max{6. 1. 14. 4. 17. It uses the function LeftCount (Algorithm 2.6 = d(l)+|c(l)| for i — 2 to n—1 a = imn{a. 19. 16. 15. FindLambda(d. 20. 8.

but because we are going to use inverse iteration to compute their corresponding eigenvectors. It can also be used to find all the eigenvalues in an interval. If eps is too small. The function FindLambda can obviously be used to find a sequence of several consecutive eigenvalues.. and in this case a choice like will result in low relative error for all nonzero eigenvalues. the product of the w's it generates is the value of the characteristic polynomial at c. For definiteness we will assume that these eigenvalues are and that they are approximated by the quantities We will make no assumptions about how the fjtk were obtained.g. Selected eigenvectors We now turn to the problem of computing the eigenvectors corresponding to a set of eigenvalues of a symmetric matrix A. one generates bounds on the other eigenvalues. the secant method. First the computation of the Gerschgorin bounds is repeated with each call to FindLambda.28). when more than one eigenvalue is to be found. Chapter 1]. c must be equal to one of a and 6. special code adapted to the task at hand should be used. The first thing we must decide is what we can actually compute. But this is inefficient for two reasons. Therefore. Second. Instead we will attempt to find an approximation Xk such that the normalized residual Vol II: Eigensystems . we will assume that they are accurate to almost full working precision. • If the eigenvalue being sought is simple. Although LeftCount does not produce a function whose zero is the desired eigenvalue. In that case. and we will not be able to calculate it accurately [see (3. bounds that can be used to speed up the bisection process.SEC. A CLUTCH OF ALGORITHMS 195 • Statement 11 takes care of an interesting anomaly. The computation of eigenvectors requires very accurate eigenvalues. it is possible to get endpoints a and b that differ only in a single bit. and without statement 11 the algorithm would cycle indefinitely. in the process of finding one eigenvalue. 2. If the eigenvalue Afc is poorly separated from the other eigenvalues of A. • The exact choice of eps depends on the accuracy required by the user. then the eigenvector corresponding to \k will be sensitive to perturbations in A. then eventually it will be isolated in the interval [a. At this point it may pay to shift over to an interation that converges more quickly—e. 6].

say near the rounding unit. then we can accept x^ — v/ \\v\\2 as a normalized eigenvector. Chapter 2]. then Thus if the quantity ||w||2/||A||2||v||2is small enough. THE SYMMETRIC EIGENVALUE PROBLEM is small.kI)v — u. if we take u normalized so that \\u\\2 = 1 as a starting vector for the inverse power step and let v be the solution of the equation (A — }J. There remains the question of whether the inverse power method is capable of producing a vector with a sufficiently small residual. and consequently we can estimate the bound (2. note that (2.kl)~lu. Once again. which says that xk is the exact eigenvector of A+E. where ||£||2/||A||2 is equal to the scaled residual norm.196 CHAPTER 3.23) that Thus the effectiveness of the inverse power method depends on the size of the component y^uofu along yk.3. Chapter 2.22) that Now suppose that ^ is a good relative approximation to A^. Then since |A^| < ||A||2.9). The justification for this criterion may be found in Theorem 1. we have It follows from (2.24) by Thus if n = 10. let us start with a vector u normalized so that 11 u \ \ 2 = 1. Ifu is chosen at random. However. see the discussion following the theorem. so that for some modest constant 7.000. A nice feature of this criterion is that it is easy to test. Let yk be the normalized eigenvector of A corresponding to \k.25) is likely to be an overestimate since it ignores the contribution of the components of u along the n—1 remaining eigenvectors of A. we can expect this component to be of the order l/^/n [see (1. Specifically. Then if v — (A — jj. For the implications of this result. . the estimate is only 100 times larger that 7€M. we have from (2.
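The following dense-matrix sketch illustrates the inverse power step with the scaled residual test just described. It is not the band algorithm of the text: A − μI is solved with a dense solver, the infinity norm stands in for the 2-norm of A, and the acceptance threshold (a modest multiple of n times the rounding unit) is an arbitrary illustrative choice.

import numpy as np

def inverse_power_eigvec(A, mu, maxit=5, rng=None):
    """Approximate eigenvector of the symmetric matrix A for an accurate
    eigenvalue approximation mu, by the inverse power method with the
    scaled residual test described in the text (dense-matrix sketch)."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    norm_A = np.linalg.norm(A, np.inf)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    for _ in range(maxit):
        v = np.linalg.solve(A - mu * np.eye(n), u)
        # accept when ||u|| / (||A|| ||v||) is of the order of the rounding
        # unit: then x = v/||v|| is an exact eigenvector of A + E with
        # ||E||/||A|| small
        if np.linalg.norm(u) / (norm_A * np.linalg.norm(v)) <= 100 * n * np.finfo(float).eps:
            return v / np.linalg.norm(v)
        u = v / np.linalg.norm(v)          # otherwise try again with the improved vector
    return v / np.linalg.norm(v)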

these perturbations will move the computed eigenvectors by about 10~8. The cure for this problem is to force x2 to be orthogonal to x\. if ll^i \\2 = 1. so that x\ and x2 we orthogonal to working accuracy. The residual norms for the approximations xi and x2 are about 1-10~15 and 2-10~15. 2. Since the errors in the eigenvectors are independent of one another. it is easy to lose orthogonality when two of the \k are close. in our example. A CLUTCH OF ALGORITHMS 197 If it turns out that the scaled residual is not satisfactorily small. Since this u has had its components associated with the eigenvalues near ^ enhanced. where Thus A has two very close eigenvalues near one and an isolated eigenvalue at two. The eigenvalues AI and A 2 are very close—10~8 apart. the residual norm for x2 is 2-10~15. the vector is easily seen to have 2-norm one and to be orthogonal to x2. By (3. we would like our computed eigenvectors to be orthogonal. Since the residual norms are of order 10~15. For safety one might allow additional iterations.SEC. we can always try again with u = v/\\v\\2. Specifically.7. In addition to having small scaled residual norms.28). as the following example shows. there is a corresponding deterioration in the orthogonality between the two vectors. which is extremely good. we perform this orthogonalization. Let us approximate the first two eigenvalues by and perform inverse iteration with each Hk to approximate the corresponding eigenvectors. we can expect this second iteration to do the job. If. Chapter 1. A matrix A was constructed by generating a random orthogonal matrix Q and setting A = <2AQT. Example 2. A little perturbation theory will help us understand what is going on here. we find that x^x2 = 7-10~18. Unfortunately. so that VolII: Eigensystems . we find that The vectors x\ and x2 have lost half the orthogonality we expect from eigenvectors of a symmetric matrix. However. when we compute the inner product z T #2. the computed vectors are exact vectors of A perturbed by matrices of order 10~~15. Moreover.

198 CHAPTER 3. suppose that we have a group ^k-g+i •>• • • . with Xk to be determined.. it satisfies where E is of the order of the rounding unit. As we pass through a cluster we use the Gram-Schmidt algorithm to orthogonalize each vector against the previously computed vectors in the cluster.1. Xk-i. Now suppose that we have computed v by the inverse power method with shift //& applied to normalized vector u. Since the details of the orthogonalization are tricky (for the full story see §1:4. Now the approximate eigenvector Xk nroduced bv esreorthne satisfies Hence It follows that Hence . . Specifically. The orthogonalization process can be adapted to handle a sequence of close eigenvalues. the orthogonalization complicates our convergence criterion. the routine returns a normalized vector v that is orthogonal to the columns of X and satisfies Unfortunately. we leave the process to a routine that orthogonalizes a vector u against the columns of X returning the result in v (which can be identified with u).4). THE SYMMETRIC EIGENVALUE PROBLEM the quality of x2 as an eigenvector has not been compromised by the orthogonalization process. We will also assume that we have bounds on the residual norms of the or z . If a stable method is used to calculate v.. To see why.. Mfc of <? approximate eigenvalues and have determined normalized approximate eigenvectors x^-g+i. We divide the //& into clusters according to some tolerance.

+ lA*i-A«)bl)*l r i-Jb+ J | 22. w. 1. 15. 9. 7. for k = 1 to ra 5.1 fi end if it = 1 u — a vector of random normal deviates if (g > 1) gsr£0r?/i0g(J*C[:. 11.SEC. 2. v.k—g+l. rik=:'nk + (*7. 16. This algorithm returns in the columns of the array X approximations to the eigenvectors corresponding to the nk. 8.fc-(jr-f !. 2. 17. end while 26. 10. A CLUTCH OF ALGORITHMS 199 Let Pi < V2 < ' ' ' < Vm be a consecutive set of eigenvalues of a symmetric matrix A with relative accuracy approximating the rounding unit eM. end for k Algorithm 2.&-!]. 14. f = k-g+ltok-l k-1 20. u . groupsep= ||A||oo*10~3 tol = 100*5gr/(n)*||A||00*€M maxit — 5. if(Ar>l) 6. 3. w. 21. r. 18. end for z 23. 12. w. p) else u — u/\\u\\2 end if it = 1 while (frae) Solve the system (A—/*fc/)v = w if (p > 1) gsreorthog(X[:. // = it + 1 24. r.v/p end if rfk = \\u\\2 + ||v|| 2 *>/(n)*ll^ll<x>*€M 4.26)]. 19.6: Computation of selected eigenvectors o Vol II: Eigensystems .k—l].The algorithm uses the routine gsreorthog to orthogonalize a vector against a set of vectors [see (2. if (it > maxit) error fi 25. 13. />) else p = ||v||2. g = 1 if (fjtk <= /**-! + groupsep) g = g+1 else g .

• An absolute criterion is used to group eigenvalues for orthogonalization (see statements 1 and 6).. For more see §1:3. See the notes and references. and some form of pivoting is required for stability. Moreover.6 can be expected to work only because we have aimed low. Unfortunately. the convergence criterion. . with L of bandwidth p+l and U of bandwidth 2p+l. But the L and U factors it computes are banded. It will not work on graded matrices. • The starting vector for the inverse power method is a random normal vector. the choice of a starting vector.) • The while loop can be exited with a new vector or with an error return when the number of iterations exceeds maxit. It is intended for use on balanced matrices eigenvalues and uses absolute criteria to determine when quantities can be ignored (e. There is a certain common sense to our choices. • We have not specified how to solve the system in statement 15. the distance between successive groups can be small. (The routine gsreorthog insures that the vectors within a group are orthogonal to working accuracy. The matrix A — ^1 will in general be indefinite. THE SYMMETRIC EIGENVALUE PROBLEM is a bound on the residual norm for XkAlgorithm 2.F°r example. It is orthogonalized against the other members of the group to purge it of components that will contribute to loss of orthogonality. and it is not clear that it can be modified to do so. destroys band structure. if the criterion for belonging to a group is too small. but the same can be said of other choices. diagonal pivoting. resulting in loss of orthogonality between groups.g. Limited experiments show that the bound tracks the actual residual norm with reasonable fidelity. It should be stressed that Algorithm 2. There are several comments to be made about it. If we use too liberal a grouping criterion.g.2. when to orthogonalize.6 pulls all these threads together. • The algorithm uses the residual bound (2.. Even as it stands Algorithm 2.4.27) to test for convergence. Thus the storage requirements for the algorithm are four times the amount required to represent the original symmetric matrix. This method does not respect symmetry. On the other hand. the clustering criterion. Thus the natural choice is Gaussian elimination with partial pivoting. we will perform a lot of unnecessary orthogonalizations. which preserves symmetry.6 involves a number of ad hoc decisions—e. the algorithm is not a reliable way of computing a complete set of eigenvectors of a small perturbation of the identity matrix.200 CHAPTER 3. see statements 1 and 2). it can and does fail on groups of eigenvalues that are separated by up to 103cM.
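The orthogonalization step that Algorithm 2.6 delegates to gsreorthog can be sketched as classical Gram-Schmidt with one reorthogonalization pass. The version below is a simplified stand-in of my own; the routine referred to in the text also guards against the case in which the new vector is nearly dependent on the columns already in the cluster.

import numpy as np

def gs_reorthog(Xc, u, passes=2):
    """Orthonormalize u against the orthonormal columns of Xc by classical
    Gram-Schmidt with one reorthogonalization pass (a simplified stand-in
    for gsreorthog).  Returns (v, r, rho): v orthonormal to Xc, r the
    coefficients, rho the norm of the orthogonalized vector."""
    r = np.zeros(Xc.shape[1])
    v = u.astype(float).copy()
    for _ in range(passes):          # the second pass mops up cancellation
        c = Xc.T @ v
        v -= Xc @ c
        r += c
    rho = np.linalg.norm(v)          # a full implementation must guard rho ~ 0
    return v / rho, r, rho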

posthumous]. Gu and Eisenstat [101] propose a different divide-and-conquer algorithm in which the critical step is computing the spectral decomposition of an arrowhead matrix — that is.4. An implementation of this method can be found in the LAPACK code xLAEDl. Gu and Eisenstat [100. 2. 1954]. beginning with a survey of eigenvalue updating problems by Golub [87.SEC.1994] noted that by recomputing the Z{ so that the A z are exact eigenvalues of the corresponding matrix the orthogonality could be maintained in the eigenvectors without resorting to higher precision arithmetic. Instead he used closely VolII: Eigensystems .1853] and independently by Jacobi [129. in unpublished communications. Sorensen and Tang [247. and Sorensen [30. who was the first to find eigenvalues of tridiagonal matrices by bisection.4 to count eigenvalues. For an error analysis see [13]. Kalian. Finally. Inertia and bisection The fact that the inertia of a matrix is preserved by congruence was discovered by Sylvester [272. NOTES AND REFERENCES 201 Updating the spectral decomposition This updating algorithm has a long history. The divide-and-conquer algorithm The divide-and-conquer algorithm is due to Cuppen [49].1978] formulated the algorithm treated here and noted that there were problems with maintaining orthogonality among the eigenvectors. 1973]. A CLUTCH OF ALGORITHMS 2. Givens [83. It was later modified by Dongarra and Sorensen [66]. Bunch. suggested that the algorithm can be made to work provided extended precision is used at certain critical points and noted that for IEEE standard arithmetic the extended precision arithmetic could be simulated. For an analysis of methods for solving the secular equation see [164].1857. did not use Algorithm 2. a matrix of the form Reduction to tridiagonal form The technique of reducing a band matrix to tridiagonal form is due to Schwarz [235]. The algorithm has been implemented in the LAPACK routine SLAEDO.1991] showed that more was required and gave an algorithm that worked with simulated extended precision on all IEEE compliant machines. Nielsen. The stopping criterion in the secular equation solver insures that the procedure is backward stable.

Eigenvectors by the inverse power method Algorithm 2. jk)-element of Ak+i = R^A^Rk is zero. Unfortunately. they are prone to overflow and underflow. THE SYMMETRIC EIGENVALUE PROBLEM related Sturm sequences. then the product pk( A) = wii • • • Ukk is the determinant of the leading principal minor of T — A/. Thus. The thesis also contains a review and critique of previous attempts to use the inverse power method to compute eigenvectors and an extensive bibliography. §2. §3. j'^-plane is determined so that the (»*. the p/t(A) change sign when the u^ do. The polynomials^(A) are called a Sturm sequence (see [122.4] for definitions and properties). If we let r| be the sum of squares of the diagonal elements of Ak and <r| be the sum of squares of the off-diagonal elements.3. For working code see the Handbook routine tristurm [305. In other words. If T — XI has the LU decomposition LU. Jacobi's method Jacobi's method for the solution of the symmetric eigenvalue problem uses rotations to annihilate off-diagonal elements of the matrix in question. Clearly. then it is easy to show that r| -» \\A\\p. after which the eigenvectors could be found by an iteration based on a perturbation expansion.6 is included to illustrate the problems of implementing a power method. a rotation Rk in the (ik.D. not as a complete algorithm in itself.1]. after which it could be readily solved by what is now called the point Jacobi iteration [285. Specifically. There are bisection codes in EISPACK and LAPACK. This program also saves information from previous counts to speed up the convergence of the bisection procedure.202 CHAPTER 3. given a current iterate A^ and a pivot element a\ '• . Jacobi [127.jk) so that o?^fc is maximal. n/18] or the LAPACK routine SSTEIN. Specifically. then and Hence. The Handbook program bisect [15] appears to be the first to use the diagonal elements of U to count eigenvalues. . In 1846 [128]. if we choose (ik. thesis Dhillon [65] has described an O(n2) to compute all the eigenvalues and eigenvectors of a tridiagonal matrix.1845]fr"8*proposed this method as a technique for reducing the normal equations from least squares problems to diagonally dominant form. In his Ph. Jacobi regarded the algorithm now named for him as a preprocessing step. although the authors justified their use by an appeal to Sturm sequences. he applied the same idea to reduce the symmetric eigenproblem to a diagonally dominant one. Ak approaches a diagonal matrix.

In §3. though it may be the algorithm of choice in special cases. This cyclic Jacobi method considerably complicates the convergence proofs.298].3 we will begin our development of the QR algorithm by showing how to reduce a general matrix to bidiagonal form. THE SINGULAR VALUE DECOMPOSITION 203 According to Goldstine and Horowitz [84].3 of this series. In §3.) But since then it has not fared well. THE SINGULAR VALUE DECOMPOSITION In this section we will treat methods for computing the singular values and vectors of a general nxp matrix X. has limitations that render it unsuitable as a general-purpose routine. if the matrix has no multiple eigenvalues. The subsequent literature was dominated by two aspects of the convergence of the method. For definiteness we will assume that n > p and that X is real. well-determined eigenvalues—at least when the matrix is positive definite. Jacobi's method works well with graded matrices and matrices with small. it was proposed to sweep through the elements of the matrix in a systematic order. 3.1. Second. For more on these topics see [75. the method was rediscovered by Goldstine. and at one time it and its variants were considered candidates for systolic arrays — a class of computers that never realized their promise. Vol II: Eigensystems . Like other algorithms that proceed by rotations. including the perturbation of singular values and vectors. a search for a maximal element is too expensive on a digital computer. (For an elegant implementation by Rutishauser— always a name to be reckoned with — see the Handbook code [226]. Jacobi's method was the algorithm of choice for dense symmetric matrix. The main purpose of this section is to describe this algorithm. there is an analogue of the QR algorithm to solve the singular value problem.SEC. after which the process is repeated with a smaller threshold. Until it was superseded by the QR algorithm. The singular value decomposition is closely allied to the spectral decomposition.107. Consequently. This method. The situation with multiple eigenvalues and clusters of eigenvalues is considerably more complicated. But no one can say for sure what the future holds for this bewitching algorithm. and many results will be presented without proof. its asymptotic convergence is at least quadratic. The QR algorithm itself is treated in §3.110.4. We will begin with the mathematical background. An interesting alternative to the cyclic method is the threshold method [212]. 3. In particular. BACKGROUND In this subsection we will present the background necessary for this chapter.2 we will show how the singular value decomposition can be reduced to the solution of a symmetric eigenvalue problem. in which all off-diagonal elements greater than a given threshold are annihilated until there are none left above the threshold. Much of this material has been treated in detail in §1:1.172]. First. It has great natural parallelism.171.4. For more on this topic see [59. 3. as we shall see. and Murray in 1949 (also see [85]). von Neumann.

Then there are orthogonal matrices U and V such that where with Here are some comments on this theorem.204 CHAPTER 3. We will use the notation &i(X) to denote the ith singular value of X. in which case maintaining the nxn matrix U can be burdensome. \\X\\2 = \\Z\\z. • It often happens that n > p. • The columns of are called left and right singular vectors ofX. where n > p. THE SYMMETRIC EIGENVALUE PROBLEM Existence and nomenclature The following theorem gives the particulars of the singular value decomposition.From this it is easily established that: . As an alternative we can set and write This form of the decomposition is sometimes called the singular value factorization. • The numbers <jj are called the singular values of X. They satisfy and Moreover.2) is conventional and will be assumed unless otherwise stated. Theorem 3. LetX£Rn*p. The nondecreasing order in (3. • Since the 2-norm is unitarily invariant.1.

it follows that This shows that: The squares of the singular values X are the eigenvalues of the CROSSPRODUCT MATRIX XTX. The right singular vectors corresponding to a multiple singular vector are not unique.SEC. 3.17)]. The right singular vectors are the eigenvectors of X^X. Relation to the spectral decomposition From (3. VolII: Eigensystems . the left singular vectors corresponding to nonzero singular values are uniquely determined by the relation Xv{ = OiU{. We will denote it by • Since the Frobenius norm is unitarily invariant. it follows that In other words: The square of the Frobenius norm of a matrix is the sum of squares of its singular values. An analogous result holds for XX1-. Sometimes they cannot be ignored [see (3. THE SINGULAR VALUE DECOMPOSITION The 2-norm of a matrix is its largest singular value. The singular values themselves are unique. Once the right singular vectors have been specified. but the space they span is. However. when n > p. the matrix XXT has n—p additional zero eigenvalues. and we will call them honorary singular values. The right singular vectors corresponding to a simple singular value (one distinct from the others) are unique up to a factor of absolute value one. Uniqueness The singular value decomposition is essentially unique.1) and the fact that U is orthogonal. From the definition of the 2-norm it follows that 205 • Later it will be convenient to have a notation for the smallest singular value of X.

However. Chapter 1.2. Let <TI ^ 0 be a simple singular value of the nxp matrix X with normalized left and right singular vectors u\ and v\.) Assuming that n > p. Let the singular values ofX be ordered in descending order: Then and An immediate consequence of this characterization is the following perturbation theorem for singular values.9. Theorem 33. like the eigenvalues of a symmetric matrix. (Here we have discarded the conventional downward ordering of the singular values. Chapter 1) has an analogue for singular values. let us partition the matrices of left and right singular vectors in the forms . this does not mean that perturbations of small singular values have high relative accuracy. THE SYMMETRIC EIGENVALUE PROBLEM Characterization and perturbation of singular values Given the relation of the singular value and spectral decompositions it is not surprising that the min-max characterization for eigenvalues (Theorem 3. See Example 3. Theorem 3. First-order perturbation expansions The perturbation of singular vectors is best approached via first-order perturbation theory.5. Let the singular values ofX be arranged in descending order: Let the singular values ofX = X + E also be ordered in descending order Then Thus singular values. are perfectly conditioned.206 CHAPTER 3.

where 0i is assumed to go to zero along with E.7) gives us VolII: Eigensystems . The singular vectors corresponding to a\ in the matrix U^XV are both ei (of appropriate dimensions). . THE SINGULAR VALUE DECOMPOSITION where U^ and Vi have p—l columns. Now let X — X + E.. their product must eventually become negligible. we have Since both f^2 and /i22 go to zero with E.SEC. and we can write From the third row of (3. Now consider the relation From the first row of this relation.gs.. and hi are also assumed to go to zero along with E. we get The second row of (3.7) we have Ignoring higher-order terms.. etc. eJ2 = u^EV2.where the g-2. Then 207 where £2 = diag(<j2. crp) contains the singular values of X other than a\. 3. as above. where E is presumed small. We will seek the singular value of UTXV in the form a\ + 0i. We will seek the corresponding left and right singular vectors of t/T^V in the forms (1 g^ p3 > ) T and(l /iJ)T. Then we can write where (p\ = u^Eui.

THE SYMMETRIC EIGENVALUE PROBLEM or This expression is not enough to determine gi and hi..U?XV2 = diag(<7 2 . Let a\.10) can be readily solved for gi and hi to give the approximations and A simple singular value and its normalized singular vectors are differentiable functions of the elements of its matrix. . and the products we have ignored in deriving our approximations are all 0(||^||2). and h^ are all 0(||E||).. Let the matrices of left and right singular vectors be partitioned in the form where Ui and V<2 have p— 1 columns and let S2 .. Let a\ > 0 be a simple singular value of X £ Rnxp (n > p) with normalized left and right singular vectors u\ and v\.But from the second row of the relation we get The approximations (3.4. If we transform these approximations back to the original system. g-z. Theorem 3. and vi be the corresponding quantities for the perturbed matrix X = X + E.9) and (3. Hence the quantities Q\.ui. Then and . <rp).208 CHAPTER 3. </3. we have the following theorem.

Chapter 1. and /2i. Because £2 is diagonal we may write it out component by component: Here 77.SEC. <£i2. • The hypothesis that 0\ be nonzero is necessary.It follows that Vol II: Eigensystems . • The expression (3. For more on the behavior of zero singular values.4 to derive a condition number for a singular vector.13)—or equivalently between v\ and v\ — we must obtain a bound on ||/&2 \\2Consider the approximation (3. we may work with the transformed matrices in (3.12) can be written in the form 209 In this form it is seen to be an analogue of the Rayleigh quotient for an eigenvalue (Definition 1. Chapter 2).6).5) and (3. and its absolute value is bounded by max{|</>2i|. see the notes and references. /i2. Now the normalized perturbed right singular vector is By Lemma 3.5. the singular value of a scalar is its absolute value. the fraction in this expression is a weighted average of <j>i2 and -<(>2i • Hence it must lie between fa and -<foi. The condition of singular vectors We can use the results of Theorem 3. 3.12.We can rewrite this equation in the form Since a\ and a<2 are positive. For example. THE SINGULAR VALUE DECOMPOSITION Here are two comments on this theorem. |0i2|}. We will begin with the right singular vector. and 02 is the corresponding diagonal element of £2. Since we will be concerned with angles between vectors and orthogonal transformations do not change angles.11) for h^. The problem is that singular values are not differentiable at zero. the sine of the angle between this vector and ei is Thus to get a bound on the angle between ei and the vector (3. and <foi represent corresponding components of /*2.

e. The number 8 is the separation of o\ from its companions. and it plays a role analogous to sep in the perturbation of eigenvectors of Hermitian matrices [see (3. if we know V we . Moreover.15). Now up to terms of order 11E \ \ 2. Specifically: Let where 6 is defined by (3. Hence: Let 6 be defined by (3.15).2. 3. we must take into account g3 defined by (3..We can bound \\g-2\\2 as above. THE SYMMETRIC EIGENVALUE PROBLEM where Since U<2 and v\ are orthonormal.8). we see that the sine of the angle between HI and u\ is essentially x/H^lli + ll^slli.14).28). if n > p.4) we have seen that the eigenvalues of the cross-product matrix A = XTX are the squares of the singular values of X and that columns of the matrix V of the eigenvectors of A are the right singular vectors of X. ||/12|| < \\E\\* Hence from (3. the additional zero eigenvalues of XX^-. We can do this by adjusting 6 to take into account the honorary zero singular values — i. the quantity 11 h2 \ \ 2 is the sine of the angle between Vi and vi. however. Similarly. Then Thus 6 1 (or S0 1 ) is a condition number for the singular vectors corresponding to the singular value a\. Then Turning now to the left singular vectors. THE CROSS-PRODUCT ALGORITHM In (3. Chapter 1].21C CHAPTER 3. which we seek in the form (1 g£ g% ) .

can recover the first p columns of U via the formula XV = U_pΣ. All this suggests Algorithm 3.1, which for definiteness we will call the cross-product algorithm.

Let X ∈ R^(n×p) (n ≥ p). This algorithm computes the singular values, the right singular vectors, and the first p left singular vectors of X.

    1. Form A = Xᵀ*X
    2. Compute the spectral decomposition A = VΛVᵀ
    3. Σ = Λ^(1/2)
    4. U_p = X*V
    5. Normalize the columns of U_p

Algorithm 3.1: The cross-product algorithm

This algorithm has some attractive features. The formation of A is cheaper than the reduction to bidiagonal form that we will propose later, much cheaper if X is sparse. We already have algorithms to calculate the spectral decomposition. The computation of U_p involves only matrix multiplication, which is cheaper than accumulating transformations. Nonetheless, the algorithm has severe limitations, and it is important to understand what they are.
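A direct numpy transcription of Algorithm 3.1 is easy to write down; the guard against a negative trailing eigenvalue anticipates a difficulty discussed below. The function name is mine.

import numpy as np

def cross_product_svd(X):
    """Cross-product algorithm (Algorithm 3.1): singular values, right singular
    vectors, and the first p left singular vectors of X (n >= p).
    Accurate only for well-conditioned X; see the discussion in the text."""
    A = X.T @ X                          # form the cross-product matrix
    lam, V = np.linalg.eigh(A)           # spectral decomposition A = V diag(lam) V^T
    lam, V = lam[::-1], V[:, ::-1]       # reorder to descending singular values
    sigma = np.sqrt(np.maximum(lam, 0))  # guard: rounding can make lam[p-1] < 0
    Up = X @ V
    norms = np.linalg.norm(Up, axis=0)
    Up = Up / np.where(norms > 0, norms, 1)   # normalize the columns of Up
    return Up, sigma, V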

Inaccurate singular values

The first problem is that we lose information about the singular values in passing from X to A. Let us first consider what happens to the singular values when we round X. This amounts to replacing X by X̃ = X + E, where ||E||₂ is of the order of εM||X||₂. Here εM is the rounding unit for the machine in question [see (2.3)]. By Theorem 3.3, this perturbation can change the singular values of X by as much as σ₁εM. Specifically, if σ̃ᵢ is the ith singular value of X̃, then the relative error in σᵢ satisfies

    |σ̃ᵢ − σᵢ|/σᵢ ≤ κᵢεM,   where κᵢ = σ₁/σᵢ.                    (3.18)

As σᵢ gets smaller, κᵢ becomes larger and σ̃ᵢ has fewer and fewer accurate digits. By the time κᵢ is of the order εM⁻¹, the perturbed singular value σ̃ᵢ will be totally inaccurate. Thus to expect any accuracy in σ̃ᵢ we must have κᵢ < εM⁻¹.

Turning now to the cross-product method, let us now assume that we compute A = XᵀX exactly and then round it to get Ã (we could hardly do better than that in statement 1 of Algorithm 3.1). We can show as above that the eigenvalues of Ã and A satisfy a bound of the same form, namely |λ̃ᵢ − λᵢ|/λᵢ ≤ (λ₁/λᵢ)εM. This bound has the same interpretation as (3.18). In particular, we must have λ₁/λᵢ < εM⁻¹ to expect any accuracy in λ̃ᵢ. We now note that λᵢ = σᵢ². Consequently the condition λ₁/λᵢ < εM⁻¹ is equivalent to

    κᵢ < εM^(-1/2).

This says that the range of singular values that the cross-product algorithm can compute with any accuracy has been severely restricted. For example, if εM = 10⁻¹⁶ then any singular value satisfying κᵢ > 10⁸ will have no significant figures when represented as an eigenvalue of A, even though it may have as many as eight significant figures as a singular value of X. Thus we should not expect the cross-product algorithm to work well if the ratio of the largest to the smallest singular value of X is too large.

An example will illustrate the above considerations.

Example 3.5. A matrix X was generated in the form X = UDVᵀ, where U and V are random orthogonal matrices and

    D = diag(1, 10⁻², 10⁻⁴, 10⁻⁶, 10⁻⁸).

The singular values σ of X and the square roots √λ of the eigenvalues of A = XᵀX were computed with the following results.

            σ                        √λ
    1.000000000000000e+00    1.000000000000000e+00
    1.000000000000001e-02    1.000000000000072e-02
    1.000000000000487e-04    1.000000001992989e-04
    1.000000000014129e-06    1.000009260909809e-06
    9.999999970046386e-09    1.177076334615305e-08

Since εM = 10⁻¹⁶, it is seen that the errors increase as predicted by the analysis above.

Although it is not illustrated by this example, it is possible for λ_p to be negative, so that the cross-product algorithm produces an imaginary singular value.
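The experiment of Example 3.5 is easy to repeat. The sketch below will not reproduce the digits of the table (the random orthogonal matrices and the arithmetic differ), but the pattern, deteriorating accuracy in √λ long before σ is affected, should be clearly visible.

import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # QR of a random Gaussian matrix gives an orthogonal Q
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

D = np.diag([1.0, 1e-2, 1e-4, 1e-6, 1e-8])
U, V = random_orthogonal(5), random_orthogonal(5)
X = U @ D @ V.T

sigma = np.linalg.svd(X, compute_uv=False)       # singular values of X
lam = np.linalg.eigvalsh(X.T @ X)[::-1]          # eigenvalues of A = X^T X
sqrt_lam = np.sqrt(np.maximum(lam, 0))           # lam can round below zero

for s, t in zip(sigma, sqrt_lam):
    print(f"{s:.15e}   {t:.15e}")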

we see that when K{ and KJ are large. THE SINGULAR VALUE DECOMPOSITION Inaccurate right singular vectors 213 We can perform a similar analysis to show that the right singular vectors of X can be less accurately represented in A. so that right-hand sides of the bounds (3.20) are approximately Since K P _I = 1.19) and (3. The singular vector in A is more sensitive to rounding errors than the same vector in X. The common sense of these bounds is that the subpace corresponding to the large singular values is well determined and hence its orthogonal complement—the space spanned by vp — is also well determined. From (3. then Since \\E\\2 < <7ieM» we have A similar analysis for A shows that where V{ is the ith eigenvector of A. (3. we find that KP is much greater than K P _I. Specifically. VolII: Eigensystems . however. that as ap-\ drifts away from a\. the bound for the cross-product algorithm deteriorates more rapidly.SEC. If Vi denotes the singular vector of the rounded matrix.20) has the smaller denominator.19) and (3. Noting once again that A. although the small singular value ap is ill determined. the vector vp is well determined by both algorithms. Specifically. we see that the perturbation of Vi depends on the separation of O{ from its nearest neighbor—call it aj.20). consider the effects of rounding X on the left singular vector vt. suppose v\ = ap-\ while ap is small. even when the singular value itself has no accuracy. Then with i = p and j = p— 1. It is a curious and useful fact that the singular vector corresponding to the smallest singular value can be quite accurate. we can rewrite this bound in the form If we compare (3. = of. Note. 3.16).

wz. as we have seen. we have \\Wi\\2 = &{. —call it Wi — satisfies where Here 7 is a constant that depends on the dimensions of X. be very accurate. and the computed w. if KP is not too large. Now W{ = Xvi + Xei + Evi = W{ + X e{ + Ev{. Assessment of the algorithm The above analysis shows that the cross-product algorithm is not suitable as a generalpurpose algorithm. However. Even when KP is large. will have no accuracy at all. Now an elementary rounding error analysis shows that the computed w. as previously. then e in (3. THE SYMMETRIC EIGENVALUE PROBLEM Inaccurate left singular vectors The left singular vectors computed by our algorithm will also be less accurate. We have not specified the size of e in our analysis because the technique for computing left singular vectors is sometimes used independently with accurate V{ (c = eM). For more see the notes and references. For example. the right singular vectors vp can. e can be quite large and have a devastating effect. if £M = 10~16 and Ki = 106. although the cause of the loss of accuracy is different. the bounds from the analysis suggest when the algorithm can be used. and assume that The cross-product algorithm computes the zth left singular vector by forming Wi = Xvi and normalizing it. Hence Since Wi = <J{Ui.will be computed inaccurately if <rz is small. . if V{ is computed from the cross-product algorithm. Normalizing it will not restore lost accuracy. Hence the normwise relative error in w^ is Thus. For example.21) will be about 10~4.214 CHAPTER 3. then the computed singular value ap may be satisfactory. Perhaps more important. Let V{ be the ith right singular vector computed by the cross-product algorithm.

its efficient implementation requires that the matrix be reduced to a simpler form. THE SINGULAR VALUE DECOMPOSITION 215 3.and postmultiplying X by Householder transformations.SEC.3. REDUCTION TO BIDIAGONAL FORM The classical algorithm for computing the singular value decomposition is a variant of the QR algorithm. we determine a Householder H transformation to introduce zeros in the hatted elements in the following Wilkinson diagram: Then H X has the form We next choose a Householder transformation K such that HXK has zeros in the hatted elements of (3. 3.23) to give a matrix of the form Vol II: Eigensystems . Like its counterparts for general and symmetric matrices. which we will write as The reduction As usual let X be an nxp matrix with n > p. The reduction to bidiagonal form proceeds by alternately pre. In this case the form is an upper bidiagonal form. For the first step.

Given an n×p matrix X with n ≥ p, BiReduce reduces X to the bidiagonal form (3.22). On return, the array X contains the generators of the Householder transformations used in the reduction. The transformations from the left are accumulated in the n×p matrix U, and the transformations from the right in the p×p matrix V.

     1. BiReduce(X, d, e, U, V)
        Reduce X
     2.    for k = 1 to p
     3.       if (k < p or n > p)
     4.          housegen(X[k:n,k], a, d[k])
     5.          X[k:n,k] = a
     6.          bᵀ = aᵀ*X[k:n,k+1:p]
     7.          X[k:n,k+1:p] = X[k:n,k+1:p] − a*bᵀ
     8.       end if
     9.       if (k < p−1)
    10.          housegen(X[k,k+1:p], bᵀ, e[k])
    11.          X[k,k+1:p] = bᵀ
    12.          a = X[k+1:n,k+1:p]*b
    13.          X[k+1:n,k+1:p] = X[k+1:n,k+1:p] − a*bᵀ
    14.       end if
    15.    end for k
    16.    e[p−1] = X[p−1,p]
    17.    if (n = p) d[p] = X[p,p] fi
        Accumulate U
    18.    U = [I_p; 0_{n−p,p}]
    19.    for k = min{p, n−1} to 1 by −1
    20.       a = X[k:n,k]
    21.       bᵀ = aᵀ*U[k:n,k:p]
    22.       U[k:n,k:p] = U[k:n,k:p] − a*bᵀ
    23.    end for k
        Accumulate V
    24.    V = I_p
    25.    for k = p−2 to 1 by −1
    26.       bᵀ = X[k,k+1:p]
    27.       aᵀ = bᵀ*V[k+1:p,k+1:p]
    28.       V[k+1:p,k+1:p] = V[k+1:p,k+1:p] − b*aᵀ
    29.    end for k
    30. end BiReduce

Algorithm 3.2: Reduction to bidiagonal form
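The following numpy sketch performs the same reduction in the spirit of Algorithm 3.2, but for brevity it accumulates full U and V as it goes instead of saving the Householder generators in X and accumulating them backward, so it is less efficient than BiReduce. The helper names house and bidiagonalize are mine.

import numpy as np

def house(x):
    """Householder vector v (v[0] = 1) and beta with (I - beta*v*v^T)x = alpha*e1."""
    v = x.astype(float).copy()
    alpha = np.linalg.norm(v)
    if alpha == 0:
        return v, 0.0
    if v[0] > 0:
        alpha = -alpha
    v[0] -= alpha
    beta = -v[0] / alpha          # equals 2/(v^T v) after the scaling below
    return v / v[0], beta

def bidiagonalize(X):
    """Householder bidiagonalization: returns U, B, V with X = U @ B @ V.T
    and B upper bidiagonal."""
    n, p = X.shape
    B = X.astype(float).copy()
    U, V = np.eye(n), np.eye(p)
    for k in range(p):
        v, beta = house(B[k:, k])                 # zero below the diagonal
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        U[:, k:]  -= beta * np.outer(U[:, k:] @ v, v)
        if k < p - 2:
            v, beta = house(B[k, k+1:])           # zero right of the superdiagonal
            B[k:, k+1:] -= beta * np.outer(B[k:, k+1:] @ v, v)
            V[:, k+1:]  -= beta * np.outer(V[:, k+1:] @ v, v)
    return U, np.triu(np.tril(B, 1)), V

One can check the result with numpy.allclose(U @ B @ V.T, X).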

3. This saves some operations.SEC. If n = p. THE QR ALGORITHM In this subsection we are going to show how the QR algorithm can be used to compute the singular values of a bidiagonal matrix B. it is about |p3 flam. We can insure this by transforming the complex bidiagonal form into a real form. the tridiagonal form will also be complex. Real bidiagonal form When X is complex. Algorithm 3. THE SINGULAR VALUE DECOMPOSITION 217 The reduction proceeds recursively on the matrix X[2:n. the algorithm sets the stage for the subsequent computation of the singular value factorization (3. Here are some comments. What distinguishes the bidiagonal singular value problem from the other problems we have considered is that small relative changes in the elements of B make only small relative changes in the singular Vol II: Eigensystems .3). It alternately scales the rows and columns of the bidiagonal matrix. When n >• p this part of the computation becomes overwhelmingly dominant. and the operation count will be reduced if the rotations are real. Algorithm 3. Chapter 2. performs the reduction to real form. Thus if n > p the total work is about 4np2 flam. real. p3 flam flam for the accumulation of V. This fact will prove important in §3. Algorithm 3. • The first column of the matrix V is ei.2.2 implements this reduction. the algorithm does not accumulate the transformations as they are generated. Our subsequent manipulations of this form will use plane rotations. If we accumulate the entire U. • By accumulating only the first p columns of the transformations from the left. 2:p\. • The algorithm is backward stable in the usual sense.2 requires 2np2 — -p3 flam for the reduction of X. the operation count rises to 2n2p—np2. we can save the transformations that are generated and apply them whenever we need to form a product of the form Ux. 3.4.4. o 2np2 — p3 fl flam for the accumulation of U. it is convenient to divide the labor. • As in Algorithm 2. • In deriving an operation count for the algorithm. which is an analogue of Algorithm 1. 3. the row scalings making the di real and the column scalings making the e. Instead it saves the generating vectors in X and accumulates the transformations after the reduction beginning with the last transformations. • If we require the effect of the full matrix U.2.

This algorithm takes the output of Algorithm 3.2 and transforms the bidiagonal matrix to real bidiagonal form. The transformations are accumulated in U and V.

Algorithm 3.3: Complex bidiagonal to real bidiagonal form

we may expect to see e p _i also converge to zero. determine a plane rotation PI 2 such that the (1. We then treat the problem of when the superdiagonal elements of B can be considered negligible. this column is easily computed: Thus the cosine and sine of the initial rotation PI 2 are determined by the numbers d\ a2 and d\ei. Consequently. Since C = BTB = VTP?2BTBP12V = VTP?2CP12V. we can mimic a QR step on BTB while working with B. Throughout this subsection: B will be an upper bidiagonal matrix represented in the form The bidiagonal QR step The matrix C = B^B is easily seen to be symmetric and tridiagonal. shows that C is the matrix that would have resulted from applying one step of the QR algorithm with shift cr2 to C.6). we could apply the tridiagonal QR algorithm to B^B to find the squares of the singular values of B.2)-element of (C .3. Vol II: Eigensystems . Given a shift <r. 1. This means that we must take care that our convergence and deflation procedures do not introduce large relative errors. THE SINGULAR VALUE DECOMPOSITION 219 values of B. 2. the reasoning following (3. 3. Since the repeated application of this algorithm (with suitable shifts) will cause C P _I )P to converge to zero. We begin with the basic QR step and then show how to compute the shift. Determine an orthogonal matrix U and an orthogonal matrix V whose first column is a multiple of ei such that B = U^BP^V is bidiagonal. some scaling should be done to prevent overflow and underflow. we need the first column of C. Unfortunately. However.SEC. To implement this procedure. Since B is bidiagonal. The solution provides us with the wherewithal to describe a back search in the spirit of Algorithm 3. Since these numbers are composed of products. which we have shown to have substantial drawbacks. in which case the problem deflates.<72/)Pi2 is zero. Chapter 2. The procedure goes as follows. Chapter 2. this procedure is a variant of the cross-product algorithm.

Specifically. THE SYMMETRIC EIGENVALUE PROBLEM We could use by Algorithm 3.2) element.3)-element.1.220 CHAPTER 3. The entire procedure is illustrated in Figure 3.2)-plane to get the matrix The bulge has now moved to the (1.4 requires The step is backward stable in the usual sense.4 implements a QR step between rows il and i'2. We can remove this bulge by premultiplying by a plane rotation U\i in the (1. However. Specifically: Algorithm 3. l)-element. . the matrix BPi2 has the form Note that BP bulges out of its bidiagonal form at the hatted (2.3)-plane: The bulge is now at the (3. We can eliminate it by postmultiplying by a rotation ¥23 in the (2. • The accumulation of the transformations accounts for most of the work. Here are some comments. Algorithm 3.2 to reduce PuB to bidiagonal form. We can continue in this manner until the bulge is chased out the bottom of the matrix. the fact that B is bidiagonal can be used to simplify the algorithm. at least when p is large. In practice the QR step will not be applied to the entire matrix.

SEC. 3.4: Bidiagonal QR step Vol II: Eigensystems . The transformations are accumulated in U and V. THE SINGULAR VALUE DECOMPOSITION 221 Given an upperbidiagonal matrix represented as in (3. BiQR performs a QR step between rows il and i2.24) and a shift cr. Algorithm 3.

The following theorem gives an explicit formula for this singular value. THE SYMMETRIC EIGENVALUE PROBLEM Figure 3.1: A bidiagonal QR step Computing the shift The shift a in Algorithm 3. then the smallest singular value is given by Proof.6.222 CHAPTER 3.4 is generally computed from the trailing principal submatrix B[il—l:ilj il—l:il]. The eigenvalues of the matrix . Theorem 3. Let Then the largest singular value ofT is given by the formula If T •£ 0. A reasonable choice is to take a to be the smallest singular value of this matrix.

Unfortunately. g. which is not greater than crmax. Let the quantities TV be defined as follows. are roots of (3. • In practice the formulas must be scaled and rearranged to avoid overflows and harmful underflows. there is a real danger of compromising the relative accuracy of the singular values.6 not only produce singular values to high relative accuracy. we must decide when we can set a superdiagonal element to zero. Fortunately. the singular values of T can be computed by inspection. • Here are three comments on this theorem. The formula (3. THE SINGULAR VALUE DECOMPOSITION 223 are the squares of the singular values of T. and r are zero. and r.SEC. the following result can be used to tell when we can safely ignore superdiagonals. This is a special case of an important fact about the singular values of bidiagonal matrices.26) follows from the fact that <rmax<7min = \pq\. and hence 0"max and cTmin are the largest and smallest singular values of T. VolII: Eigensystems . For more see the notes and references. Since this is an absolute change.27). • If at least one of p. in testing convergence. The characteristic equation of this matrix If we set then it is easily verified that Thus cr^ax and cr^. Small relative perturbations in the elements of a bidiagonal matrix cause only small relative perturbations in its singular values. q. 3. • The formulas yield approximations to the singular values that have high relative accuracy. Negligible superdiagonal elements The formulas in Theorem 3. but they are themselves insensitive to small relative perturbations in the quantities p.

30) the parameter tol is approximately the relative error we can expect from setting the current superdiagonal to zero. then To see what (3. it works well on graded matrices. il = i'2). THE SYMMETRIC EIGENVALUE PROBLEM However.29) means.e. then the relative error is bounded by e. Let B be a bidiagonal matrix in the form (3.7 (Demmel-Kahan). A reasonable value is the rounding unit. Back searching We can combine (3..7 to get Algorithm 3. provided the grading is downward. Let and consider the intervals If<?i lies in the connected union ofm intervals Ik that are disjoint from the other intervals.224 It can be shown that CHAPTER 3. Some of these are discussed in the notes and references. By (3. which searches back for a block in which the problem deflates. It follows that Thus the result of setting ej to zero can introduce a relative error of no greater than me in any singular value of B.7 as it proceeds backward through the matrix. Theorem 3. the real significance of the number r. and for fixed j let B be the matrix obtained by setting ej to zero. Nonetheless.28) and the results of Theorem 3. the sequence of rt. take the exponential to get If me is small.is restarted.5. there are further embellishments that can improve the algorithm. Final comments The algorithmic pieces we have created in this and the previous section can be assembled to produce a reliable algorithm for computing the singular values of a general matrix. is contained in the following theorem. It generates the rt from Theorem 3. Although no theorems can be proved. Note that whenever the problem deflates (i. . then e±mc = 1 ± me. If the singular value in question is isolated and e is sufficiently small.24). Let c?i and <j{ denote the singular values ofB and B arranged in descending order.

24).SEC.5: Back searching a bidiagonal matrix VolII: Eigensystems . THE SINGULAR VALUE DECOMPOSITION 225 This algorithm takes a bidiagonal matrix B represented as in (3. 3.12 < t such that one of the following conditions holds. and an index i (1 < t < p) and determines indices /7. Algorithm 3. a tolerance tol.

If instead. 3.4 were presented in [269. • The matrix Q will generally be computed as the product of the Householder transformations used to reduce X to triangular form (§1:4. • When n > p. tion^Y = UXVH. For more historical material see [263] and §1:1. For ordinary singular values the second-order term makes no essential difference.1 can be found in §1:1.1874].6: Hybrid QR-SVD algorithn 3. where the expansion of the singular value is carried out to the second order. A HYBRID QRD-SVD ALGORITHM The algorithm sketched in the last subsection is reasonably efficient when n is not much greater than p.1873] and Jordan [137.6]. Specifically. Algorithm 3. compute the singular value decomposition of R. however. Here are three comments. Missing proofs in §3. • The computations can be arranged so that Q and then U overwrites X. we save np2 flam and in the bargain have an implicit basis for both the columns space of X and its orthogonal complement. which requires 0 (p3) operations.6 implements this scheme. bringing the total operation count to 3np2 flam.5. An alternative is to first compute the QR factorization of A. NOTES AND REFERENCES The singular value decomposition The singular value decomposition was established independently and almost simultaneously by Beltrami [19. The price to be paid is that one must access that basis through subprograms that manipulate the Householder transformations. statement 2.1.4.6. the QR decomposition of X can be computed in Inp2 flam while the generation of U requires np2 flam. and multiply the left singular vectors intoQ. but it shows that small singular . we store the transformations themselves and retain the matrix W.4. contributes only negligibly to the total operation count. THE SYMMETRIC EIGENVALUE PROBLEM Let X G R nxp .2). one pays a high price for the accumulation of plane rotations in U.4. where n > p. Perturbation theory Parts of Theorem 3. When n > p. Algorithm 3. This algorithm computes the singular value factoriz. which is 0(np2).5.226 CHAPTER 3. Theorem 1.

The derivation given here may be found in the Handbook contribution [91].is negligible if |e.SEC. where tol is the relative error required in the singular values. This algorithm is used whenever the relative accuracy tol-g_ required of the smallest singular value is less than the error CTCM that can be expected from the shifted QR algorithm (here a is an estimate of the largest singular value). Our implementation of the shifted bidiagonal QR algorithm proceeds from the top of the matrix to the bottom. This permits a cheap test for negligibility of the €i. a standard transformation is to subtract column means. Subsequent analyses are cast in terms of C. THE SINGULAR VALUE DECOMPOSITION 227 values tend to increase in size under perturbation. fcin]"1!]^1. Owing to limitations of space we cannot do the paper full justice in this chapter. • Bottom to top QR. For example. it is possible to compute a lower bound g_ for the smallest singular value. Whenever an analysis involves the eigenvalues and eigenvectors of C. • Additional tests for negligibility. which is fine for matrices graded downward. and form a cross-product matrix C called the correlation matrix. For matrices that are graded upward there is are variants of both the shifted and unshifted algorithm that proceed from bottom to top. The cross-product algorithm The cross-product algorithm is widely and unknowingly used by statisticians and data analysts. The analysis of the cross-product algorithm appears to be new. given a data matrix X. • Zero shifted QR. From this. If the current block extends from il to /2. Reduction to bidiagonal form and the QR algorithm The reduction to bidiagonal form is due to Golub and Kahan [88. et. It is a curious fact that the QR algorithm for the singular values of bidiagonal matrices was first derived by Golub [86. Namely. method for choosing among these methVolII: Eigensystems . though not foolproof. The authors present a forward recurrence that computes the quantity ||J9[fc:n. 3. it amounts to an implicit use of the cross-product algorithm. a natural. If the shift in the QR algorithm is zero.| < tol-a. much data is noisy enough that the errors introduced by passing to C are insignificant—at least when computations are performed in double precision. Fortunately.1965]. 1968] without reference to the QR algorithm. Much of the material on the implementation of the QR algorithm for bidiagonal matrices is taken from an important paper by Demmel and Kahan [58]. But here are some additional details. the computations can be arranged so that all the singular values retain high relative accuracy. normalize. although partial analyses and instructive examples may be found in most texts on numerical linear algebra— often with warnings never to form the cross-product matrix. The derivation of the first-order bounds given here is new. The lesson of the analysis given here is that one should not condemn but discern.

let V and V be plane rotations in the (i. Their paper also contains a summary of other divide-and-conque algorithms with extensive references. Divide-and-conquer algorithms Gu and Eisenstat [101] give a divide-and-conquer algorithm for the bidiagonal singu lar value problem. j). i)-elements ol U^XV are zero—i.and (j. For some * < j. solve the singular value problem for the embedded 2x2 matrix applying the transformations to rows and columns % and j of X. 35]. due to Kogbetliantz [149. An interesting problem is to update th( singular value decomposition after a row has been added or removed. j)-plane such that the (t. Fernando and Parlett [73] have devised a nev variant of the shifted qd algorithm that calculates them efficiently to high relative ac curacy. as well as references to previous work. this transformation increases the sum of squares of the diagonal elements of X and decreases the sum of squares of the off-diagonal elements by the same amount. Dovvndating a singular value decomposition The singular value decomposition is useful in applications where X is a data matrb with each row representing an item of data.1955] works with square X.228 CHAPTER 3. THE SYMMETRIC EIGENVALUE PROBLEM ods is to use the top-down method if d[il] > d[i2] and otherwise use the bottom-up method. The latter prob lem is called downdating and comes in several varieties. The differential qd algorithm When only singular values are desired. Jacobi methods Jacobi methods (see p. Higham [115] has shown that row and columi pivoting in the computation of the QR decomposition improves the accuracy of th( singular values. . The two-sided method. Code based on these ideas may be found in the LAPACK routine xBDSQR. depending on what parts o: the singular value decomposition we have at hand. A typical iteration step goes as follows.. The algorithm has been implemented in the LAPACK routine SLASQ1. As with the Jacob algorithm. 202) can be generalized in two ways to find the singular value decomposition of a matrix X. Gu and Eisenstat [102] give three algorithms for the downdating problem.e. The QRD-SVD algorithm The idea of saving work by computing a singular value decomposition from a QR de composition is due to Chan [34.

i. The S/PD Eigenvalue Problem 229 An interesting and work-saving observation about this method is that if X is upper triangular and elements are eliminated rowwise. Mathematically it is a generalization of the symmetric eigenvalue problem and can.109. In §4. 4. 107. As usual in this chapter.. THE SYMMETRIC POSITIVE DEFINITE (S/PD) GENERALIZED EIGEN VALUE PROBLEM A generalized eigenvalue problem Ax = \Bx is said to be symmetric positive definite (S/PD) if A is symmetric and B is positive definite. The problem arises naturally in many applications. Definition 4. does not yield an unconditionally stable algorithm. 203]. abbreviated S/PD. The slash in S/PD is there to stress that two matrices are involved—the symmetric matrix A and the positive definite matrix B (which is by definition symmetric). VolII: Eigensystems . XTBX). 67.2 we present Wilkinson's algorithm. including some modifications to handle the case where B is ill conditioned with respect to inversion.1. then the result of a complete sweep is a lower triangular matrix. all matrices are real. in fact. The fundamental theorem We begin with a definition.1. 4. The one-sided method does not require X to be square. The following theorem shows that we can diagonalize A and B by such transformations. B) is said to be SYMMETRIC POSITIVE DEFINITE. For more see [8. and the purpose of this subsection is to describe the problem and its solution. which is due to Wilkinson. We begin by developing the theory of the S/PD generalized eigenvalue problem. 75. Choose two columns i and j of X and determine a rotation V such that the ith and j>th columns of XR are orthogonal.e. 37. Let A be symmetric and B be positive definite. 108. It is easy to see that the rotation V is the same as the rotation that would be obtained by applying a Jacobi step to X^X with pivot indices i and j. where X is nonsingular. 4. Then thepencil(A.Sec. transformations of the form ( X ^ A X . Although this reduction. it is the best we have. THEORY Here we establish the basic theory of S/PD pencils. be reduced to an equivalent symmetric eigenvalue problem. In transforming S/PD pencils it is natural to use symmetry-preserving congruences. There is an analogous two-sided method in which rotations are also applied from the left. The techniques that are used to analyze the Jacobi method generally carry over to the variants for the singular value decomposition.

we have proved that A can be diagonalized by a i?-orthogonal transformation. Let be its spectral decomposition. Thus. Set Then by construction X'1AX is diagonal. When B — /. Then there is a nonsingular matrix X such that and where Proof. Since B is positive definite and hence nonsingular.2. THE SYMMETRIC EIGENVALUE PROBLEM Theorem 4. it is natural to say that X is ^-orthogonal. Moreover.B) has real eigenvalues and a complete set of jB-orthonormal eigenvectors. In this sense the S/PD eigenvalue problem is a generalization of the symmetric eigenvalue problem. B) and X{ is the corresponding eigenvector. It follows that so that Aj = Cti/Pi is an eigenvalue of (A. • Since X^ BX = /. then Similarly. In this language. This implies that an S/PD pencil cannot have infinite eigenvalues. Bxi = [3iX~iei. There are four comments to be made about this theorem. Let (A>B) be an S/PD pencil. is also diagonal.230 CHAPTER 3. But other normalizations are possible. the numbers fa are positive. Z is also nonsingular. the theorem says A can be diagonalized by an orthogonal transformation. • The proof of the theorem gives eigenvectors Xi normalized so that xjBxi = 1. The matrix Z~TAZ~l is symmetric. our theorem states that the S/PD pencil (A. • If we write AX = X~^D^ and denote the ith column of X by Xi. • Whatever the normalization. . Let Z be any matrix of order n satisfying For example Z could be the Cholesky factor of B.

Let VolII: Eigensy stems .. We begin with the algorithm itself and then consider modifications to improve its performance in difficult cases. The S/PD Eigenvalue Problem Condition of eigenvalues 231 Let (A.13. lf(A+E. B) with ||x||2 = 1. Chapter 2. Let a = x T Axand/3 = xTBx so that (a. (However. The bound (4. This is generally true of the small eigenvalues. B) be an S/PD pencil and let be the Cholesky decomposition of B—i.e. see Example 4. This happens when both a. can be ill conditioned. as Example 3.5). the well-conditioned eigenvalues of an S/PD pencil are not necessarily determined to high relative accuracy. for E and F sufficiently small there is an eigenvalue A of the perturbed pencil such that Thus (a2 + ft2) 2 is a condition number for the eigenvalue A in the chordal metric. WILKINSON'S ALGORITHM In this subsection we consider an algorithm due Wilkinson for solving the S/PD eigenvalue problem. and fti are small. consider the problem If the (2.) On the other hand. 4. then by Theorem 4. oo). 4. For example.2. Chapter 2. the eigenvalue 105 becomes |-105. However. R is upper triangular with positive diagonals (see §A. or) be a simple eigenvalue of the S/PD pencil (A.2./?) is a projective representation of A. The algorithm Wilkinson's algorithm is based on the proof of Theorem 4.Sec. Let (A.9.12. Chapter 1. shows.B+ F) is a perturbed pencil. in the problem the eigenvalue 2 is generally insensitive to perturbations bounded by 10 .2)-element of B is perturbed to 2-10~5. unlike the eigenvalues of a Hermitian matrix. while perturbations of the same size can move the eigenvalue 1 anywhere in the interval [0. well-conditioned large eigenvalues can also be ill determined in the relative sense. For example.1) shows that the eigenvalues of an S/PD problem.

. See page 159. The one given here proceeds by bordering. . B) of order ra.1: Solution of the S/PD generalized eigenvalue problem be the spectral decomposition of R T AR l. Algorithm 4. If A is not needed later.1 implements this scheme.1. its upper triangle may be overwritten byR.232 CHAPTER 3. If B is not needed later.g. which is determined in part by C. Computation of R~T AR~l The computation of C = R~^ AR~l is tricky enough to deserve separate treatment. • The columns Xj of X may be formed from the columns Uj of U by solving the triangular systems The calculations can be arranged so that X overwrites U [cf. Then the matrix diagonalizes the pencil.2. this algorithm produces a matrix X such that Algorithm 4. there is more than one algorithm.7)]. • The exact operation count for the algorithm depends on how long the method used to compute the spectral decomposition of C takes. Here are some comments on this algorithm.2. If the QR algorithm (§1) or the divide-and-conquer algorithm (§2. its upper triangle may be overwritten by the upper triangle of C. In fact. the operation count is 0(n3) with a modest order constant. (1:2. Algorithm 1:3.2) is used. THE SYMMETRIC EIGENVALUE PROBLEM Given an S/PD pencil (A. • We will treat the computation of C in a moment. • Only the upper triangles of A and B need be stored. • The Cholesky factor R can be computed by a variant of Gaussian elimination— e.

• The bulk of the work in this algorithm is performed in computing the product of the current principal submatrix with a vector. if ll^lbll-B" 1 !^ ls l^S6 (see §I:3-3)—then B~1A will be calculated inaccurately. We begin by rewriting (4. but they are designed to allow Algorithm 4.2.. A defect of this approach is that if B is ill conditioned with respect to inversion—i.. This will enable us to solve our problem by successively computing the leading principal majors of R~T AR~l of order 1. Three comments. we will show how to compute c and 7.. Simply replace all references to C by references to A.2) in the form Then by direct computation we find that and Thus we can compute 7 and c by the following algorithm. For a scheme in which the bulk of the work is contained in a rank-one update see the LAPACK routine XSYGS2. The S/PD Eigenvalue Problem Specifically. • The algorithm takes |n3 fla • The arrangement of the computations in (4. since the large eigenvalues of (A..2 implements this scheme.3) may seem artificial. Ill-conditioned B Wilkinson's algorithm is effectively a way of working with B~l A while preserving symmetry. consider the partitioned product 233 Assuming that we already know C = R~TAR 1. n. Algorithm 4.Sec.2 to overwrite A by C. There is no complete cure for this problem. B) are ill determined..e. 4. A partial cure—one which keeps the errors confined to the larger eigenvalues—is to arrange the calculations so that the final matrix C is Vol II: Eigensystems .

This produces a decomposition of the form where P i s a permutation matrix and R is graded downward. THE SYMMETRIC EIGENVALUE PROBLEM Given a symmetric matrix A and a nonsingular upper triangular matrix R. If we set where f is the cross matrix. As might be expected by analogy with the Hermitian eigenproblem. B).5. then C will be graded downward. much more can be said about the eigenvalues and eigenvectors of S/PD pencils. As we have observed (p. then the diagonal elements of A are eigenvalues of (A. B).3). Algorithm 4. 108ff) the reduction to tridiagonal form and the QR algorithm tend to perform well on this kind of problem. NOTES AND REFERENCES Perturbation theory We have derived the perturbation theory for S/PD pencils from the perturbation theory for general pencils. this algorithms computes C = R~TAR~l. then the diagonal elements of A are eigenvalues of (A.3. . and the corresponding eigenvectors are the columns of X = PR~lfU. If we then compute the spectral decomposition C = t/AJ7T. One way to produce a graded structure is to compute the spectral decomposition B = VMVT.If we define C = M _ 1 V T AVM — 2 and compute its spectral decomposi2 tion C = £/A{7T. 4. where the diagonal elements of M are ordered so that ^i < ^ < — • • • < Mn. and the corresponding eigenvectors are the columns of X = VM~^U.2: The computation of C = R~T AR~l graded downward.234 CHAPTER 3. A less expensive alternative is to use a pivoted Cholesky decomposition of B (see §A.

an algorithm for problems of the form ABx — Ax. It seems. Since each deflation step requires the computation of a spectral decomposition. They also reduce the eigenproblem B Ax — Xx. Chandrasekaran [36] has noted that the large eigenvalues and corresponding eigenvectors of C (obtained from the spectral decomposition of B) are well determined. where A is symmetric and B is positive definite. although he does not mention the necessary sorting of the eigenvalues. even when A and B are banded. To take advantage of the banded structure Wilkinson notes that Gaussian elimination on A — \B gives a count of the eigenvalues less than A. 4. that only a very few deflation steps are required. Wilkinson's algorithm Wilkinson presented his algorithm in The Algebraic Eigenvalue Problem [300. just as in Algorithm 2. §§6872. Wilkinson mentions. have given significantly improved bounds. starting from a different perspective. giving a smaller pencil containing the remaining eigenpairs. He shows how to deflate these well-determined eigenpairs from the original pencil. Wilkinson discusses the problem of ill-conditioned B and suggests that using the spectral decomposition of B instead of its Cholesky decomposition to reduce the problem will tend to preserve the small eigenvalues. however. where A is symmetric and B is positive definite. Thus the eigenvalues of (A. The definite generalized eigenvalue problem A Hermitian pencil (A. Eigenvectors can then be computed by the inverse power method suitably generalized. B) is definite if A definite pencil can in principle be reduced to an S/PD pencil. B) can be found by interval bisections.Sec. 5]. The LAPACK routinesxxxGST reduce the S/PD eigenproblem Ax — XBxto standard form. Li and Mathias [163] have revisited this theory and. Band matrices Wilkinson's method results in a full symmetric eigenvalue problem. The S/PD Eigenvalue Problem 235 This theory is surveyed in [269. §IV. the algorithm potentially requires 0(n4} operations. consider the set Vol II: Eigensystems . Code for both problems may be found in the Handbook [170]. Ch. Specifically.4. Recently.3]. but does not derive.

1. UY. Now if 7( A. of course. For an implementation see [43].3].3. then f( A. VI. it can be rotated to lie in the upper half of the complex plane. G) is called the field of values of A + iB.4. which is equivalent to Be being positive definite. with W~l on the right-hand sides. Specifically. Crawford and Moon [44] describe an iterative algorithm for computing a 9 such that BO is positive definite (provided.n > p. and the algorithm can be 0(n4). and it can be shown to be convex. The generalized singular value decomposition is related to the CS decomposition of an orthonormal matrix (see [202.236 CHAPTER 3. THE SYMMETRIC EIGENVALUE PROBLEM The set f(A. B) does not contain the origin. §§VI. For another approach see [40]. let be a QR factorization of (JfT yT)T.6).4) ranges only over 7£n.2. and V. Then there are orthogonal matrices Ux and Uy and a nonsingular matrix W such that AND where EX and Sy are diagonal. For more on definite pairs see [269.258] or §1:1.1981]. [It is a curious fact that the above argument does not work for n = 2 and real symmetric A and B when the vector x in (4. For much the same reasons as given in §3.1976] and in the above form by Paige and Saunders [202. such that . B) > 0. If we define then the field of values TQ of Ag + Be is just the set jF rotated by the angle theta. Each step requires a partial Cholesky factorization of the current B$. a computational approach to the generalized singular value decomposition through cross-product matrices is deprecated. and since it is convex. However. that the original pencil was definite). whereTO. by Van Loan [283. This generalized singular value decomposition was proposed. It is easily seen that AND so that the generalized singular value decomposition is related to the generalized eigenvalue problem in the cross-product matrices just as the singular value decomposition is related to the eigenvalue problem for the cross-product matrix.] The above argument is nonconstructive in that it does not give a means for computing a suitable angle 0. The generalized singular value and CS decompositions Let X £ E nxp and Y G R m X p . Then the CS decomposition exhibits orthogonal matrices Ux.

260] and Van Loan [284].) If we combine (4. 4.6) and (4. whence the name CS decomposition. Algorithms for computing the CS decomposition and hence the singular value decomposition have been proposed by Stewart [259.7) and set J£T = V^Ry the result is the generalized singular value decomposition (4. Vollt: Eigensystems . The LAPACK routine xGGSVD uses Van Loan's approach. (This last relation implies that the diagonal elements of E^ and Sy can be interpreted as cosines and sines.Sec. Paige [200] has given a method that works directly through X and Y.5). The S/PD Eigenvalue Problem 237 where EX and Sy are diagonal matrices with S^ + Sy = /.

This page intentionally left blank .

we will not be able to store all its elements. and in §2 we present the analytic theory—residual analysis and perturbation bounds. This limitation is not as burdensome as it might appear. With appropriate data structures we can get away with storing only the latter.4 EIGENSPACES AND THEIR APPROXIMATION The first half of this book has been concerned with dense matrices whose elements can be stored in the main memory of a computer. Instead we must be satisfied with computing some subset of its eigenpairs. their zero elements greatly outnumber their nonzero elements. if the order of the matrix in question is sufficiently large. this chapter is devoted to the basic facts about eigenspaces and their approximations. In §3 we treat one of the most popular ways of generating subspaces—the Krylov sequence—and in §4 we analyze the RayleighRitz method for extracting individual eigenvectors from a subspace. cannot be computed directly. our chief concern is with large eigenproblems. We now shift focus to problems that are so large that storage limitations become a problem. Unfortunately. The space spanned by a set of eigenvectors is an example of an eigenspace. in consequence. Second. First. the eigenvectors of a sparse matrix are in general not sparse. and. It will often help to think ofn as being very large. most algorithms for large sparse eigenvalue problems confine themselves to matrix-vector multiplications or perhaps the computation of an LU factorization. Because eigenspaces. We begin with the basic algebraic theory of eigenspaces. It follows that if the matrix is large enough we cannot afford to store its entire eigensystem. Therefore: Unless otherwise specified A will denote a matrix of order n. these data structures restrict the operations we can perform on the matrix. 239 . and it turns out that algorithms for large matrices are best explained in terms of eigenspaces. that is. like eigenvectors. Consequently. This problem has two aspects. Many large matrices arising in applications are sparse. Although all of the results of this chapter apply to matrices of any size. most algorithms for large problems proceed by generating a sequence of subspaces that contain increasingly accurate approximations to the desired eigenvectors.

A less trivial example occurs in the proof of the existence of real Schur decompositions (Theorem 3. There we showed that if (A.1. the space X — 7l[(y z)} is an eigenspace of A. a matrix that does not have a complete system of eigenvectors may instead have a complete system of eigenspaces of low dimension corresponding to groups of defective eigenvalues.y + iz] is a complex eigenpair of a real matrix A. This construction illustrates the utility of eigenspaces.1) concerns a specific basis for the subspace in question. then for some real 2x2 matrix L.1).1. Eigenbases The definition of eigenspace was cast in terms of a subspace of C n . EIGENSPACES 1. while the expression (1. x) is captured more economically by the relation (1.240 CHAPTER 4. DEFINITIONS We have already observed (p. 1. Let A be of order n and let X be a subspace ofCn. Definition 1. . Then X is an EIGENSPACE Or INVARIANT SUBSPACE of A if The space spanned by an eigenvector is a trivial but important example of an eigenspace. or) = (\. In §1. They are also important because one can use them to isolate ill-behaved groups of eigenvalues. EIGENSPACES In this subsection we are going to develop the theory of subspaces that remain invariant when acted on by a matrix. Chapter 2). For example. In the first subsection we will present the basic definitions and properties. The information contained in the set of complex eigenpairs (A. Since the column space of (y z]L is a subset of the column space of (y z). which consists entirely of real quantities. Moreover. The following theorem summarizes the properties of a basis for an eigenspace.2 we will consider simple eigenspaces—the natural generalization of a simple eigenvector. the eigenvalues of L are A and A. x} and (A. In computational practice we will always have to work with a basis. We have already noted that these invariant subspaces—or eigenspaces as we will call them — are useful in the derivation and analysis of algorithms for large eigenvalue problems.1. The notion of an eigenspace is a generalization of this observation. Eigenspaces can be used to bring together combinations of eigenvalues and eigenvectors that would be difficult to treat separately. 2) that the subspace spanned by an eigenvector is invariant under multiplication by its matrix.

e. the relation is straightforward. 1.SEC. yi can be expressed uniquely as a linear combination of the columns of X. for some unique vector t{. Xu) is an eigenpairofA. then Now let (A. Proof. However. then (A. ElGENSPACES 241 Theorem 1.. Since y\ 6 X and X is a basis for X. for the important case of a simple eigenspace (Definition 1. i. #) be an eigenpair of A with x G X. The theorem also shows that there is a one-one correspondence between eigenvectors lying in X and eigenvectors of L. u] is an eigenpairofL. If we set L — (i\ • • • ^).e. The existence of a unique representation L associated with a basis X for an eigenspace X justifies the following extension of the notion of eigenpair. then AX = XL. then (A. Let be partitioned by columns. a matrix satisfying X1X — I. If L does not have a complete system of eigenvectors.. In particular L has a complete set of eigenvectors if and only if the eigenvectors of A lying in X span X. If (A. Xu) is an eigenpair of A. Hence and the relation Lu = Xu follows on multiplying by X1. if (A. Then there is a unique vector u such that x — Xu.7). Vol II: Eigensystems . As we will see. Then there is a unique matrix L such that The matrix L is given by where X1 is a left inverse ofX —i. If X1 is a left inverse of X. if Lu = Xu. as required. • We say that the matrix L is the representation of A on the eigenspace X with respect to the basis X.2. then so that (A. Conversely. however. x) is an eigenpairofA with x £ X. Conversely. u = Xlx. Let X be an eigenspace of A and letX be a basis forX. the relation between the eigensystem of A and that of L can be very complicated. The theorem shows that L is independent of the choice of left inverse in the formula L = X1AX. Xlx) is an eigenpairofL.

A) be a multisubset of the eigenvalues of A. Theorem 1. Consequently. Existence of eigenspaces Given that every eigenspace has a unique set of eigenvalues. we say that the eigenpair (L. then (L. IfY G Cnxk has linearly independent columns and YHA = LYH. Its proof. . . we say that (L. Theorem 1. we get .5.W~1X) is an eigenpair of"W~l AW.. EIGENSPACES Definition 1. LetC = { A i . we can speak of the eigenvalues of A associated with an eigenspace.. An important consequence of this theorem is that the eigenvalues of the representation of an eigenspace with respect to a basis are independent of the choice of the basis. . The following theorem shows that indeed it has. then y = T^(Y] is a left eigenspace of A in the sense that The following theorem shows how eigenpairs transform under change of basis and similarities. it it natural to ask if any set of eigenvalues has a corresponding eigenspace. IfW is nonsingular. Y) is a LEFT EIGENPAIR of A. X) is orthonormal. Proof Let be a partitioned Schur decomposition of A in which TH is of order k and has the members of C on its diagonal.3.242 CHAPTER 4. V) is a left eigenpair of A. . we say that (L. A^. Then there is an eigenspace X of A whose eigenvalues are AI. . Note that if (L. . If X is orthonormal. IfU is nonsingular.4. then the pair (U~1LU. Let A beofordern. Let (L.X) be an eigenpair of A. ForX € Cnxk andL € Ck*k.XU) is also an eigenpair of A.X) is an EIGENPAIR OF ORDER & Of RIGHT EIGENPAIR OF ORDER k of A if We call the matrix X an EIGENBASIS and the matrix L an EIGENBLOCK. which is simple and instructive. A^} C A(. is left as an exercise. Computing the first column of this relation.

Vol II: Eigensystems . • The relation (1. Unfortunately. and H. Then where Here are three comments on this theorem.2) implies that Thus (M. Theorem 1.2) of A deflates the problem in the usual sense. Specifically. for computational purposes. • The construction of Theorem 1.. For numerical reasons. Chapter 2) we essentially used the eigenspace corresponding to a pair of complex conjugate eigenvalues to deflate the matrix in question. • Thus a matrix A of order n with distinct eigenvalues has 2n distinct eigenspaces — one corresponding to each subset of A( A). • The similarity transformation (1. Let(i. X) bean orthonormaleigenpairofA andlet(X Y) be unitary. then for some matrices L. The proof of the following theorem is left as an exercise. can be treated as separate eigenvalue problems. then y is a left eigenspace of A corresponding to the eigenvalues not associated with X. it is preferable to use unitary transformations to deflate an eigenspace. which.2 is not the most general way to deflate an eigenspace. M.6.1. however. Deflation In establishing the existence of the real Schur form (Theorem 3. 1. This is a special case of a general technique.e. Otherwise put: If X is an eigenspace of A and y is the orthogonal complement of X. if A has multiple eigenvalues. the members of C.SEC. We call y the COMPLEMENTARY LEFT EIGENSPACE OF X. The spectrum of A is the union of spectra of L and M. the eigenspaces of A are not as easily described. if X is any basis for an eigenspace of A and (X Y)is nonsingular. ElGENSPACES 243 Hence the column space of U\ is an eigenspace of A whose eigenvalues are the diagonal elements of the triangular matrix TH —i. We will return to this point in the next subsection. See the discussion on page 10. Y) is a left eigenpair of A.

X\) be a simple orthonormal eigenpair of A and let(Xi. Unfortunately. disjoint from the other eigenvalues of A. Block diagonalization and spectral representations We turn now to the existence of left eigenspaces corresponding to a right eigenspace. Then X is a SIMPLE EIGENSPACE of A if In other words.7. Let (L\.2. then it has two linearly independent eigenvectors.8. counting multiplicities.2). Computationally. an eigenspace is simple if its eigenvalues are. SIMPLE EIGENSPACES Definition We have seen (Theorem 1. Then there is a matrix Q satisfying the Sylvester equation such that if we set where then . then there is an eigenspace X of A whose eigenvalues are the members of L. Hence we make the following definition.5) that if £ is a subset of the spectrum of A. Definition 1. it is impossible to pin down an object that is not unique. We will approach the problem by way of block diagonalization. EIGENSPACES 1. The problem is not as simple as the existence of left eigenvectors.Y<2) be unitary so that (Theorem 1. and any nonzero vector in the plane spanned by these two vectors spans a one-dimensional eigenspace corresponding to A. For example. Let X bean eigenspace of A with eigenvalues C. since we cannot use determinantal arguments as in the second comment following Theorem 1.244 CHAPTER 4. this eigen space need not be unique. Chapter 1. if A is a double eigenvalue of A that is not defective. Theorem 1.2. The above example suggests that the nonuniqueness in eigenspaces results from it not having all the copies of a multiple eigenvalue.

and we call it the complementary right eigenspace.to(fc-fl)x(fc+l) block diagonal form. Chapter 1. Combining these facts with the observation that the orthogonal complement of 7l(X\) is the complementary left eigenspace of A. for example.7) is a spectral representation of A is a convenient shorthand for (1. • Theorem 1. • If we write the second equation in (1. We will return to this point later when we consider canonical angles between subspaces (see Corollary 2. The formulas (1. in which the first Vol II: Eigensystems . If. and LI by S~lLiS. then there is a left eigenvector y corresponding to x. to reduce A to block diagonal form.12).6) we can write We will call this formula a spectral representation of A. YI ) is a left eigenpair of A.5). Saying that (1. However.8 uses a single simple eigenspace to reduce a matrix to 2x2 block diagonal form. which can be normalized so that yHx — I [see (3. In establishing the spectral representation we assumed that Xi and Y% were orthonormal. The eigenbases in a spectral representation are not unique. then we may replace X\ by XiS.4) and (1. an obvious induction shows that if we have k simple eigenspaces then we can reduce v4. so that 7£( YI) is the left eigenspace corresponding to Tl(Xi).6) in the form we see that AX? = X^L^ so that (L<2. • From (1.6). We say that the bases Xi and YI are biorthogonal. we have the following corollary. Then X has a corresponding left eigenspace and a complementary right eigenspace. • Here are some observations on this theorem. YI by Yi5~ H . apply Theorem 1. Likewise (Li.5) result from accumulating the transformation in the successive reductions to block triangular form and block diagonal form.X<2) is an eigenpair of A. which gives the formula for X^ in (1. We will call any such spectral representation a standard representation. The fact that the eigenbasis Y\ for the left eigenspace corresponding to Xi is normalized so that Y^Xi = I generalizes this result. Let X be a simple eigenspace of A.SEC. EIGENSPACES 245 Proof. Corollary 1. Chapter 1]. S is nonsingular.18. The eigenspace 7Z(X2) is complementary to the eigenspace 1Z(Xi) in C n .3). The orthogonal complement ofX is the complementary left eigenspace.9. Beginning with (1. For example. x) is a simple eigenpair of A. • If (A. 1.5).

246 CHAPTER 4. Since L has only zero eigenvalues it is nilpotent (Theorem 1. it can be decomposed by diagonalization into eigenspaces corresponding to its individual eigenvalues. M). Thus X is block diagonal. it is nonsingular. and the spaces X and X are the same. F2i = 0. EIGENSPACES k blocks are eigenblocks of the eigenspaces and the last eigenblock is the eigenblock of the complement of their union. . If the latter are unique. which without loss of generality we may assume to be zero. Now let X be a second eigenspace corresponding to the eigenvalue zero. Then Computing the (2. M )X — diag(i.8) to get Since MkXM is nonsingular.2)-block of this relation. the result is not easy to prove. Let Y = X~l. Let X be a simple eigenspace corresponding to a single eigenvalue. Then we can diagonalize A as in Theorem 1. M). Since diag(Z/. For if a simple eigenspace has more than one eigenvalue.7). we get Since M is nonsingular. In analogy with (1. M).0. so are X^ and ¥22 • Now compute the (1. Chapter 1).13. M) are both similar to A. It will be sufficient to prove the uniqueness for a simple eigenspace corresponding to a single eigenvalue.2)-block of (1.8 to get the matrix diag(Z. We can then use this eigenspace to diagonalize A to get diag(Z/. Let Jf~1diag(Z. they are similar to each other. there is a corresponding spectral representation of the form Uniqueness of simple eigenspaces We turn now to the question of whether a simple eigenspace is uniquely determined by its eigenvalues. Although it seems obvious that it is. Similarly Yi2 . then so is the original space. where again L is nilpotent and M is nonsingular. and write the similarity transformation in the partitioned form Let k be an integer such that Lk = Lk — 0. Since M cannot have zero eigenvalues. M) and diag(Z/.

The generalization of the notion of eigenpair as well as the terms "eigenbasis" and "eigenblock" (Definition 1. it is easy to verify that an equation that corresponds to our reduction to block diagonal form. which collapses any subspace into {0}. it can be shown that the spectral projection corresponding to a set £ of eigenvalues is given by the complex integral where F£ is a closed path in the complex plane that contains £ and no other eigenvalues. Such matrices project a vector onto its column space along its null space. Spectral projectors can be used to decompose a matrix without transforming it. A trivial reason for preferring "eigenspace" over "invariant subspace" is that the latter need not be truly invariant under multiplication by A — an extreme example is A = 0. For a systematic development of this approach see [146]. Spectral projections From the spectral representation we can form the matrices Pi = XiY^. Uniqueness of simple eigenspaces That a simple eigenspace is determined by its eigenvalues seems to be taken for granted in the literature on matrix computations. A more substantial reason is that we can use the same terminology for generalized eigenvalue problems. It is easily verified that Pi is idempotent—i. NOTES AND REFERENCES Eigenspaces 247 Although the term "eigenspace" can be found in Kato's survey of perturbation theory [146]. Vol II: Eigensystems . 2. The proof given here is new. Pf = Pi. such spaces are usually called invariant subspaces. An alternative approach is to use the resolvent defined by For example.e.3) are new.SEC. PERTURBATION THEORY 1. The resolvent Our approach to eigenspaces through block triangularization and diagonalization reflects the way we compute with eigenspaces.3. For example. where there is no one matrix to operate on the subspace. and for that reason the Pi are called spectral projectors. For more on spectral projections see [146]..

Hence C = YHX = WUCV and C are unitarily equivalent and have the same singular values. Much of this theory parallels the perturbation theory in §3. and consider the matrix C = Y^X. This justifies the following definition. We then turn to the analysis of residuals. Definitions Let X and y be subspaces of the same dimension. y) for the ith canonical angle in descending order and define We will also write Q(X. EIGENSPACES 2.1.H2/|/(||z||2||t/2||) [see(3. For if X and y are other bases. Let X and Y be orthonormal bases for X and y.248 CHAPTER 4.1] and can be regarded as cosines of angles.1. Finally. where V = XHX and W = YEY are unitary. Chapter 1.. Let X and y be subspaces ofCn of dimension p and letX and Y be orthonormal bases forX and y. In §2. we can write X = XV and Y = YW. we begin with preliminary material on the canonical angles between subspaces. To avoid certain degenerate cases. y). however.3 we use these residual bounds to derive a perturbation theory for simple eigenpairs. Definition 2. We have Hence all the singular values of C lie in the interval [0. We write Oi(X. in §2. Y) for 6(*. Chapter 1].11).5. PERTURBATION THEORY In this subsection we will treat the perturbation of eigenspaces and their associated eigenblocks. CANONICAL ANGLES We have defined the angle between two nonzero vectors as the angle whose cosine is |a. In this subsection we extend this definition to subspaces in C n .4 we present some residual bounds for Hermitian matrices. 2. Specifically. For a more complete development of this subject see §1:4. . we will first present the analysis of residuals and then go onto perturbation theory proper. These angles are independent of the choice of bases for X and y. because of the important role played by residuals in algorithms for large problems. Then the CANONICAL ANGLES between X and y are the numbers where 7.are the singular values of YUX. we will restrict the dimensions of our subspaces to be not greater than ^.

if the canonical angle is small.1. However.2. 2. see the notes and references. For otherwise there would be a vector # in A" that is orthogonal to every vector in y and the Vol II: Eigensystems . Let VH(CHC)V = T2 be the spectral decomposition of CHC. • In this chapter we will use Theorem 2. The largest such angle 9(x) is the largest canonical angle.12. Specifically. Chapter 1. this procedure will give inaccurate results. IfO < 10~8. and we will conclude that 9 = 0. the largest canonical angle has the following characterization: Thus for each x in X we look for the vector y in y that makes the smallest angle 9(x) with x. PERTURBATION THEORY 249 Geometrically.SEC.2 as a mathematical device for bounding canonical angles But it also provides the means to compute the sines of canonical angles numerically. for small 6. Subspaces of unequal dimensions We will occasionally need to compare subspaces X and y of differing dimensions. Let Since this matrix is orthonormal. A cure for this problem is to compute the sine of the canonical angles. Proof. provides the wherewithal. and let Y_\_ be an orthonormal basis for the orthogonal complement ofy. Hence the singular values of S — Y±X are the sines of the canonical angles between X and y. cos 9 = 1-|02. The statement that X is near y makes sense only when dim(A') < dim(J^). The following theorem. For details. Let X and Y be orthonormal bases for X and y. Then the singular values of Y^X are the sines of the canonical angles between X and y.7?—the squares of the sines of the canonical angles. Theorem 2. which is a generalization of Lemma 3. then cos 9 will evaluate to 1 in IEEE double-precision arithmetic. the diagonal elements of F are the cosines of the canonical angles between X and y. Computing canonical angles In principle we could compute canonical angles from the characterization in Definition 2. Since the eigenvalues of CHC are the squares of the singular values of C. But It follows that E2 is diagonal and its diagonal elements are 1 .

The squares of the nonzero singular values of Q are the nonsingular eigenvalues of Q^Q.2. by Theorem 2. When dirn(A') < dim(3. LetOi denote the nonzero canonical angles between 7Z(X) andlZ(Z).1. Then and Proof. These canonical angles can be calculated as in Theorem 2. and the squares of the nonzero singular values of Z are the nonzero eigenvalues of . where we will have to know the relations between the singular values of Q and X -f YQ. and the norm at the minimum is \\x_i ||2.250 CHAPTER 4. Let (Y Y±) be unitary with 7£(F) = y. Let X and Y be ortho-normal matrices with XHY = 0 and let Z = X + Y Q. In this case. EIGENSPACES angle between that vector and y could only be ^. Let o~i and ^ denote the nonzero singular values ofZ and Q arranged in descending order.). Theorem 2. is sin Z(z. Then Proof. Hence This norm is clearly minimized when y — x. Let y^y and consider the transformed vector In this new coordinate system we must have y± — 0. The following theorem contains a useful characterization of the the angle between a vector and a subspace. The following theorem does the job. the number of canonical angles will be equal to the dimension of X.3. we can define canonical angles as in Definition 2. also arranged in descending order. • Combinations of subspaces In Theorem 1. Let x be a vector with \\x\\2 — 1 and let y be a subspace. y).2.4.8 we expressed the left eigenspace corresponding to a simple eigenspace in the form X + YQ. We will meet such expressions later. which. Theorem 2.

Finally.1).8.SEC. Chapter 2. It is natural to choose this block to make the residual R = AX . Let X bean orthonormal basis fora simple eigenspace X of A and let Y be a basis for the corresponding left eigenspace y of A normalized so that YHX = I. If Rk is zero. converge to an eigenspace of A.XL as small as possible. The relation tan 0t = <r. then (Lk. X) as an approximate eigenpair. we obtain the following corollary. Thus the cosines of the canonical angles are the singular values of which by the above observation are the reciprocals of the singular values of Z. Optimal residuals Most of our algorithms will furnish an approximate eigenbasis X but will allow us the freedom to choose the eigenblock. Hence cos &i — (T~l. where Lk is suitably chosen. Vol II: Eigensystems . Xk) is an eigenpair of A.= 0{. First we will show how to compute an optimal block L — a generalization of Theorem 1. it is hoped. or sec 0. One such basis is where (/ + QHQ) 2 is the inverse of the positive definite square root of / + QHQ. Chapter 2. which generalizes (3.3. we will develop residual bounds for the error in (Z. where Z is any orthonormal basis for K(Z). Then the singular values ofY are the secants of the canonical angles between X and y. RESIDUAL ANALYSIS Most methods for computing eigenspaces of a matrix A produce a sequence Xk of bases whose spans. It therefore seems reasonable to stop when Rk is sufficiently small.4. In this subsection we will consider the implications of a small residual.12).. We will then show that a small residual implies a small backward error— a generalization of Theorem 1. the only way we have for determining convergence is to look at the residual Rk = AXk — XkLk. Chapter 1. • If we combine this theorem with Theorem 1. now follows from (2. PERTURBATION THEORY Hence 251 By definition the cosines of the canonical angles between K(X) and 7£( Z) are the singular values of XHZ. 2.5. We will actually prove a little more. The following theorem shows how to make this choice. 2. Corollary 2. In particular.2. In practice.

Moreover. where we consider the Rayleigh-Ritz method. . then the Rayleigh quotient is an eigenblock.252 CHAPTER 4. by Theorem 1. the proof for S being similar.2 we may expect that if u is an eigenvector of L then Xu will be near an eigenvector of A. LetR = AX-XLandSH = XHA-LXH. Then the quantity X1 A X is a RAYLEIGH QUOTIENT of A.suggests the following definition. Consequently. if X is near an eigenbasis of A. Chapter 2.2. but in fact the division is implicit in X1. Then (Y^X)'1 YH is a left inverse of X and (YHX)-1YH AX is a Rayleigh quotient. Let X be of full column rank and letX1 be a left inverse ofX. If X is an eigenbasis of A. We will prove the result for J?. Definition 2. Let(X XL) beunitary. EIGENSPACES Theorem 2. We will give substance to these expectations in §4.6. To make it explicit.5. It may seem odd to have a quotientless Rayleigh quotient. Theorem 1.7. let X be of full rank and let YHX be nonsingular. we may expect the Rayleigh quotient to contain good approximations to the eigenvalues of A. Then in any unitarily invariant norm \\R\\ and \\S\\ are minimized when in which case Proof. Set Then The Rayleigh quotient The quantity L = XHAX is a generalization of the scalar Rayleigh quotient of Definition 1. The matrix X^AX is not the most general matrix that will reproduce an eigenblock when X is an eigenbasis.

x) is an exact eigenpair of a perturbed matrix A + E.3. but for it not to be (£. Proof.SEC. • The import of this theorem is discussed just after Theorem 1. X) must be ill conditioned—that is. Chapter 2. sensitive to small changes in A + E.6). then (L. then (A. The following theorem generalizes this result. Chapter 2. we saw that if r = Ax . The singular values of this matrix are ±<T. the 2-norm of this matrix is the largest singular value of R. if the residual is small. which can be effected by observing that in the notation of the proof of Theorem 2. where the az are the singular values of G.Ax. where E is of a size with r.3.6. The only tricky point is the verification of (2. Theorem 2. This does not guarantee that (I. Since the singular values of G and R are the same. Thus basing convergence on the size of the residual guarantees that wellconditioned eigenpairs will be determined accurately. X) is a good approximation to the eigenpair. 2. Briefly. VolII: Eigensystems . and the square of the Frobenius norm is twice the sum of squares of the singular values of R. PERTURBATION THEORY Backward error 253 In Theorem 1. The proof is a generalization of the proof of Theorem 1.. Chapter 2. Moreover. X) is an exact eigenpair of a sightly perturbed matrix A -f E. Let X be orthonormal and let Then there is a matrix such that an/"f If A is Hermitian and then there is a Hermitian matrix satisfy ing (2 A}.3.8.

theright-handside of (2. Consequently. .254 CHAPTER 4. We will see in §2.10 in Chapter 1 with Theorem 2. A(Z) C A(A+ E).6 we know that in any unitarily in variant norm ||-||.\\G\\.. Specifically. Specifically.l\ be the eigenvalues of L. We will derive bounds by nudging it into full block triangularity. Let i\. . Corollary 2.4 that if we know more about the eigensystem of A then we can get tighter residual bounds. Let (X X±) be unitary and let Then by Theorem 2. cannot be separated by more than \\E\\2.8) is almost block triangular. which can be written in the form .9.. Since the eigenvalues of A + E and A. Then there are eigenvalues A 9 1 . arranged in descending order. the question remains of how accurate the approximate eigenspace is. consider the similarity transformation For the (2.. . ||J?|| = ||^4^— XL\\ .1) block of the right-hand side to be zero. EIGENSPACES Residual bounds for eigenvalues of Hermitian matrices By combining Theorems 3. Block deflation Although the small backward error associated with a small residual is a powerful reason for using the residual norm as a convergence criteria.8 we can obtain residual bounds on the eigenvalues of a Hermitian matrix. . we have the following corollary.8 and 3. .PHP. We will derive such bounds by a process of block deflation. we must have PL — MP = G . if R is small.. A.7) can be eliminated (see the notes and references). of A such that and The factor \/2 in (2.

in terms of the original matrix A in (2. However. later. The theorem shows that (L + HP. Consequently. if || • || is the spectral norm. X) is a simple eigenpair of B with complementary eigenblock M .9) is nonlinear in P and has no closed form solution. Theorem 2. Let || • || denote a consistent norm.8).4 the singular values of P are the tangents of the canonical angles between X and X. under appropriate conditions its solution can be shown to exist and can be bounded.PH. Let Then ( X .10. In the matrix let where S is defined by If there is a unique matrix P satisfying PL — MP = G — PHP and such that The spectra of L + HP and M — PH are disjoint. PERTURBATION THEORY 255 where S is the Sylvester operator Equation (2. the largest canonical angle Q\ between X and X satisfies Vol II: Eigensystems . We will first discuss this theorem in terms of the matrix B and then. 2. By Theorem 2.SEC. L) is an approximate eigenpair of B with residual norm \\G\\. For the proof of this theorem see the notes and references.

£ < ) then and M = diag(MI . is a generalization of the quantity sep(A. M) = US"1]]"1. where S is the Sylvester operator defined by (2. It is useful in assessing the separation of diagonalizable matrices. .sep(£. it is nonzero if and only if the spectra of L and M are distinct.15.e. the larger the canonical angle between X and X. .11. Chapter 1. Finally. Ifsep(Z. It says that small changes in L and M make like changes in sep. 1. Item 1 provides an alternative definition of sep.t It is a (not entirely trivial) exercise in the manipulation of norms to establish these properties. M + F) . Thus the poorer the separation. |sep(£ + £.256 The quantity sep CHAPTER 4. then sep(£. when we derive perturbation bounds for eigenspaces.Z/. Item 2 is a continuity property. .17. Then the following statements are true. 3. \\A\\ = |||A|||. The quantity sep has a number of useful properties.10). Chapter 1. If L and M are Hermitian. sep(i. Ifthenorm \\ • \\ is absolute (i.11). which appears in (2. 2. M} can be much smaller than the physical separation. M)| < \\E\\ + \\F\\. so that sep is a lower bound for the physical separation 6 of the spectra of L and M. as Example 3. EIGENSPACES The quantity sep(. Items 3 and 4 exhibit sep for matrices of special form.11. . shows. This fact will be important later. By Theorem 1. . N) introduced in Theorem 3. Chapter 1. item 5 shows how far sep may be degraded under similarity transformations. .M) > 0. . Theorem 2. 4. . then in the Frobenius norm 5. Let sep be defined by a consistent norm \\-\\. By Corollary 1. which are stated without proof in the following theorem..16. M m ). . M).and L = diag(ii . Unfortunately. Chapter 1.
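For matrices of modest order the quantity sep can be computed directly: in the Frobenius norm it is the smallest singular value of the Kronecker-product representation of the Sylvester operator P -> PL − MP. The sketch below is an editorial illustration in Python with NumPy, not part of the original text; it also shows how sep can be far smaller than the physical separation of the spectra when one of the blocks is highly nonnormal.

    import numpy as np

    def sep_F(L, M):
        # Frobenius-norm sep(L, M): the smallest singular value of the matrix
        # representing the Sylvester operator P -> P L - M P under vec.
        S = np.kron(L.T, np.eye(M.shape[0])) - np.kron(np.eye(L.shape[0]), M)
        return np.linalg.svd(S, compute_uv=False)[-1]

    # Hermitian blocks: sep is just the gap between the two spectra.
    print(sep_F(np.diag([1.0, 2.0]), np.diag([3.0, 4.0])))   # 1.0

    # A highly nonnormal M: the eigenvalues of L and M are at least 1 apart,
    # yet sep is several orders of magnitude smaller than that separation.
    L = np.diag([1.0, 2.0])
    M = np.array([[3.0, 1.0e4],
                  [0.0, 4.0]])
    print(sep_F(L, M))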

We shall see later that we can approximate a condition number for an eigenblock.12. Let X be an approximate eigenbasis for A and let(X X±_) be unitary. or the quantity sep(£. we cannot compute M. 2. PERTURBATION THEORY 257 Residual bounds We now return to our original matrix A and derive residual bounds for the approximate subspaces. If we now apply Theorem 2.6 \\R\\ = \\G\\ and \\S\\ = ||#||.14) follows from Theorem 2. M).10 accumulating transformations. Vol II: Eigensystems . this theorem does exactly what we need.X) approximates an actual eigenpair of the matrix in question. In large eigenproblems.X^AX± be the Rayleigh quotients corresponding to X and X±.13) follows on taking the norm of the difference of L and L + HP. X) of A satisfying and Proof. Theorem 2.12) is satisfied then there is an eigenblock of A of the form (L + HP. R. one must have additional information about the disposition of the eigenvalues to use the theorem. Although. It provides conditions under which a purported eigenpair (L. and S easily enough. where The bound (2. we see that if (2. • From a mathematical standpoint. but when it comes to eigenspaces of very large matrices. The bound (2. however. Let L = XHAX and M . which is large and generally dense. the size n of the matrix will be very much greater than the number k of columns of the matrix X.4.SEC. we can compute L. Write Then by Theorem 2. and let If in any unitarily invariant norm then there is a simple eigenpair (£. provided we have an approximation to the corresponding left eigenbasis. X + A^P).
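For a matrix of moderate size all of the quantities entering the theorem can be assembled directly, which makes the bound easy to monitor in experiments. The following hypothetical sketch (Python with NumPy; the sizes, the random matrix, and the way the approximate eigenbasis is manufactured are all editorial choices) computes the residual, the off-diagonal block, and sep(L, M), the ingredients one compares as in (2.12).

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 10, 3
    A = rng.standard_normal((n, n))

    # Manufacture an approximate eigenbasis by perturbing an exact
    # invariant subspace (the one belonging to the k rightmost eigenvalues).
    w, V = np.linalg.eig(A)
    idx = np.argsort(-w.real)[:k]
    X, _ = np.linalg.qr(V[:, idx] + 1e-4 * rng.standard_normal((n, k)))
    Q, _ = np.linalg.qr(X, mode='complete')
    Xperp = Q[:, k:]                        # orthonormal basis for the complement

    L = X.conj().T @ A @ X                  # Rayleigh quotient
    M = Xperp.conj().T @ A @ Xperp          # complementary Rayleigh quotient
    R = A @ X - X @ L                       # residual; its norm equals that of G
    H = X.conj().T @ A @ Xperp              # the other off-diagonal block

    sepLM = np.linalg.svd(
        np.kron(L.T, np.eye(n - k)) - np.kron(np.eye(k), M),
        compute_uv=False)[-1]               # Frobenius-norm sep(L, M)

    print(np.linalg.norm(R), np.linalg.norm(H), sepLM)
    # When ||R|| ||H|| is small relative to sep(L, M)^2, the theorem guarantees
    # a nearby exact eigenpair, at a distance of roughly ||R|| / sep(L, M).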

The spectra ofL and M are disjoint Proof. we can do this by regarding the exact eigenspace of A as an approximate eigenspace of A and applying Theorem 2.11 the quantity 6 is a lower bound on sep(L+Fu. . PERTURBATION THEORY CHAPTER 4. right and left eigenpairs of A.16). By item 2 of Theorem 2.10 guarantees the existence of a matrix P satisfying (2.258 2.19) and (2. we obtain better bounds by proceeding indirectly through a spectral decomposition of A. • We will now examine the consequences of this theorem. we wish to determine when there is a nearby eigenspace of a perturbation A = A + E of A and to bound the angles between the two spaces. In principle. In practice.Hence if (2.3. In this subsection we will treat the problem of bounding the perturbation of an exact eigenspace when the matrix changes.12.13. The perturbation theorem Given a simple eigenspace of a matrix A. M-\^22). simple. Theorem 2.17) is satisfied.18) that block triangularizes the matrix (2.20) follow upon accumulating the transformations that reduce A to block triangular form. The expressions (2. Theorem 2. Let A have the spectral representation Let A = A + E and set Let || • || be a consistent family of norms and set If then there is a matrix P satisfying such that and are complementary. EIGENSPACES In the last subsection we considered the problem of bounding the error in an approximate eigenspace.

In the 2-norm. no single choice of spectral representation that is suitable for all applications. M. There is. 245).10) there is a matrix Q such that The singular values ofQ are the tangents of the canonical angles between X^ = H(Xi) andyi = n(Yi) (Theorem 2. and Fij also change. these bounds can be conveniently written in the terms of 0xy defined in (2. We will set Bounds in terms of E The bounds of Theorem 2. The variants do not say the same things. In the remainder of this subsection we will therefore make the following assumptions. What we are likely to have is a bound on some norm oi E. since spectral representations are only partially unique (see p.21). In Theorem 2. we will seldom have access to these matrices. the matrices Xi and ¥2 are assumed to be orthonormal. In consequence (Theorem 2.18) becomes Thus how X\ and y\ are situated with respect to one another affects the condition for the existence of the perturbed eigenspace. but the one that comes closest is the the standard representation in which X\ and Y2 are orthonormal. In practice.13.SEC. With these bounds. 2. we may redefine and replace the condition (2.13 are cast in terms of the matrices Fij because they are the primary quantities in the proof of the theorem.13) actually represents a continuum of results. however. unfortunately. In this case we must weaken the bounds by replacing \\Fij\\ by ||y^H||||£'||||Xj||. Vol II: Eigensystems . PERTURBATION THEORY Choice of subspaces 259 Theorem (2.4). Specifically. but it does not affect the final bound on \P\\2.17) by so that the bound (2. since the matrices L.

Angles between the subspaces In the relation Xi = Xi -f X^P. It is natural then to ask how this measure relates to the canonical angles between the two spaces.24).260 The Rayleigh quotient and the condition of L Let CHAPTER 4. Unfortunately. the size of P is a measure of the deviation of X\ = K(Xi) from Xi. the matrix L is a Rayleigh quotient in the sense of Definition 2. If Xi were orthonormal and Y^Xi were zero. from the above bounds Thus the difference between the Rayleigh quotient and the exact representation goes to zero as the square of ||£"||2We also have Hence as E —» 0 Thus the secant of the largest canonical angle between X^ and y\ is a condition number for the eigenblock L that determines how the error in E propagates to the representation L.21) we have .7. But with a little manipulation we can find a basis for X\ to which Theorem 2. Chapter 1. EIGENSPACES Since Y^Xi = /. we could apply Theorem 2.4 to obtain this relation. In fact. From (2.4 does apply. Thus we have the following generalization of the assertion (3. This Rayleigh quotient is an especially good approximation to the eigenblock of A. neither condition holds.

On taking norms and using (2. PERTURBATION THEORY 261 the last inequality following from (2.27) we have the following corollary. 6 itself approaches sep(i. M) is a condition number for X\. for E small enough the singular values of P are approximations to the tangents of the canonical angles between X\ and X\. it follows from Theorem 2.28). M)""1 is a condition number for the eigenspace K(Xi). • The denominator of (2. to obtain the following bound: However. Corollary 2. which is necessarily positive.SEC. so that when E is small the bound is effectively 11E \ \ 2 / 8. Consequently. Then where Here are two comments on this corollary. Vol II: Eigensystems .4 that the tangents of the canonical angles between X\ and X\ are the singular values of P(I + QP}~1. In summary: Let A have the standard spectral representation X\LY^ + X^MY^. In fact. since for small 9 we have 6 = tan 0. M) with sep(L.14. approaches 6 as E —»• 0.13.QP is nonsingular and We can therefore write Now the subspace spanned by Xi(I+QP) l is still X\.22) (remember sec2 0xy = 1 + tan2 #xy). the singular values approximate the angles themselves. AI) be the largest canonical angle between X\ and X\. Then in the 2-norm the quantity sep(jL. Thus sep(Z. M). M) we can generalize the proof of Theorem 3. Moreover. It follows that / -}.23) and (2. Since X\ and Y2 are orthonormal and Y^Xi = 0. this approach has the disadvantage that it assumes rather than proves the existence of X\. —»• 0 the matrix P(I + QP)~l —> P. • If we are willing to replace sep(L. • As E. Chapter 1. Let 0xy be the largest canonical angle between X\ and 3^i and let Q(X\. 2.

If then The separation hypothesis (2.15.4.A(M)| > 0. RESIDUAL BOUNDS FOR HERMITIAN MATRICES The residual bound theory of §2. The proofs.2 addresses two logically distinct problems: the existence of certain eigenspaces of A and the bounding of the error in those eigenspaces. suppose that the Hermitian matrix A has the form and that 6 = min |A(£) . For Hermitian matrices we can often say a priori that the required eigenspaces exist. EIGENSPACES 2. Let the orthonormal matrix Z be of the same dimensions as X. As usual. will be omitted. the eigenvalues of A must divide into two groups—those that are within ||G||2 of A(L) and those that are within ||<7||2 of A(M). Here the separation is between the certain eigenvalues of A (those of L) and eigenvalues associated with the . which are not needed in this work. Chapter 1. For example. There we required that the spectra of two Rayleigh quotients corresponding to the approximate subspaces be disjoint. and let where N is Hermitian.12. The fact that we know these eigenspaces exist before we start enables us to get direct bounds that are simpler and more useful than the ones developed above. The eigenspaces corresponding to these two groups of eigenvalues are candidates for bounding. Residual bounds for eigenspaces We begin with the simpler of the two results on eigenspaces treated here. Let the Hermitian matrix A have the spectral representation where (X Y) is unitary. y) will denote the the diagonal matrix of canonical angles between the subspaces X and y. Theorem 2. If ||G||2 is less than f. The purpose of this subsection is to state theorems containing such direct residual bounds.262 CHAPTER 4. then by Theorem 3. ©(A'.30) is different from that required by the residual bounds of Theorem 2.8.

9 we showed that if R — AX — XL is small then there are eigenvalues of A lying near those of L.30) requires only that the eigenvalues values of M and N be decently separated.The following shows that if we can specify a separation between the eigenvalues of L and a complementary set of eigenvalues of A then we can reduce the bound to be of order \\R\\2Theorem 2.SEC. Theorem 2. For individual angles the bound will generally be an overestimate.17.. and the final bound is smaller—the benefits of being Hermitian.15 than that of Theorem 2.16. the square root of the sum of squares of the sines of the canonical angles. Let (Z Z_\_] be unitary. Moreover. In the notation of Theorem 2.32) bounds the largest canonical angle between the two spaces. The natural norm to use is the 2-norm. We can resolve this problem by restricting how the eigenvalues of M and N lie with respect to one another. Let Let VolII: Eigensystems . Residual bounds for eigenvalues In Corollary 2. they are free to interlace with eigenvalues of M lying between those of N and vice versa. the separation criterion is less stringent in Theorem 2. where Z has k columns.e.12. 2. The price we pay is that we can only bound the Frobenius norm of sin 0(7£(JQ. The bound is now in an arbitrary unitarily invariant norm. The condition (2. Otherwise. 7£(Z)) — i. The bound is of order \\R\\. in which case (2. PERTURBATION THEORY 263 approximating subspace (those of N). The theorem reflects an important tradeoff in bounds of this kind.15 suppose that and for some 8 > 0 Then for any unitarily invariant norm The conditions on the spectra of N and M say that the spectrum of N lies in an interval that is surrounded at a distance of 6 by the spectrum of M.

264 CHAPTER 4.1973] stated the result explicitly for nonHermitian matrices and gave the formula (2. the eigenvalues £ and A(i) can be paired so that they are within O(||J?||2) of each other.6 for Hermitian matrices is due to Kahan [142. However. Ch. 2.1875]. Stewart [252.3. then for R small enough. .33) by \\R\\%/(6 . In this case Yi is very large. Canonical angle The idea of canonical angles between subspaces goes back to Jordan [138. and has often been rediscovered.2) for the optimal residual R. 264] proved the result for vectors.£_L|} = <!>. EIGENSPACES If then The theorem basically says that if A has a set £ of eigenvalues that are close to A(£) and that are removed from A(M) by 6.31) on the spectra. Wilkinson [300. 1967] (his proof works for non-Hermitian matrices). In our applications n will be very large compared with the common dimension k of the subspaces. I].1. NOTES AND REFERENCES General references For general references on perturbation theory see §3. This theorem has the great advantage that we do not have to put restrictions such as (2. Chapter 1. Earlier. and it is impossible to compute Y±X directly.2|| J2||2).2). These transformations can then be applied to X to give Y±X in the last n—k rows of the result. given a basis for y one can compute an orthogonal matrix whose first k columns span y as a product of k Householder transformations (Algorithm 1:4. But if we know that (in the notation of the last paragraph) there is a set C± of eigenvalues of A with rnin{|A(M) .R|| < 6 and righthand side of (2. The definitive paper on computing canonical angles is by Bjorck and Golub [26]. For a more complete treatment of this subject see Volume I of this series or [269.1965. who recommend computing the sines of the angles. p. Optimal residuals and the Rayleigh quotient Theorem 2.5. then we can write Hence the condition \\R\\2 < # can be replaced by the condition 3||. in sparse applications it has the disadvantage that we do not know N. However.

who prove much more in their ground-breaking paper. Theorem FV. The result can be generalized to unitarily invariant norms [269.24) is accurate up to terms of order \\E\\2 shows that not all Rayleigh quotients are equal.7). The residual bounds for eigenvalues are due to Mathias [174] and improve earlier bounds by Stewart [261]. Parlett. A treatment of other theorems of this kind may be found in [206].1967]. Residual eigenvalue bounds for Hermitian matrices Corollary 2. 244]. who showed that the factor \/2 can be removed from the right-hand side of (2.9 is due to Kahan [142.4. which used an orthogonal transformation. in which the transforming matrix is triangular.XL and 5H = YUA ~ Z H y H it is natural to ask if there is a perturbation that makes X a right eigenbasis and Y a left eigenbasis. The approach taken here.1970].29) is due to Ipsen [126]. The fact that the Rayleigh quotient (2. Residual bounds for eigenspaces and eigenvalues of Hermitian matrices The residual bounds for eigenspaces are due to Davis and Kahan [54. a result of Dennis and More [63. For more on the history of such bounds see [269. Perturbation theory The passage from residual bounds to perturbation bounds was given in [254]. is accurate only up to terms of order \\E\\.16) is an off-diagonal perturbation of the block diagonal matrix consisting of the two Rayleigh quotients Y^AXi and F2H AX2The bound (2. and of course they have trivial variants that deal with left eigenspaces. is simpler than the original.14]. Kahan. 3.11]). Vol II: Eigensystems . If we have small right and left residuals R = AX . for example. The theorems stated here all have to do with approximate right eigenspaces. Alternatively. 1977] shows that (2.SEC. we can regard the greater accuracy as due to the fact that the matrix (2. who also introduced the quantity sep. KRYLOV SUBSPACES Backward error 265 For Hermitian matrices in the vector case. It is worth noting we can go the other way.5) is optimal (also see [64. 1971]. The greater accuracy of Y^AX\ is due to the fact that we use approximations to both the left and right eigenbases in its definition. Given a residual. and Jiang [144] produce perturbations that are optimal in the spectral and Frobenius norms. Block deflation and residual bounds The use of block triangularization to derive residual bounds is due to Stewart [250. we can generate a backward error and apply a perturbation theorem to get a residual bound. Theorem 9. p. The Rayleigh quotient X^AXi.

3. KRYLOV SUBSPACES

Most algorithms for solving large eigenproblems proceed in two stages. The first stage generates a subspace containing approximations to the desired eigenvectors or eigenspaces. The second stage extracts approximations to the eigenvectors from the subspace generated in the first stage. If the approximations are not satisfactory, they may be used to restart the first stage.

In this section we will be concerned with the most widely used method for computing subspaces containing eigenvectors: the method of Krylov sequences. In the first subsection we will introduce Krylov sequences and their associated subspaces. We will then consider the accuracy of the approximations produced by the method in §3.2. Finally, in §3.3 we will introduce block Krylov sequences.

3.1. KRYLOV SEQUENCES AND KRYLOV SPACES

Introduction and definition

The power method (§1.2, Chapter 2) is based on the observation that if A has a dominant eigenpair then, under mild restrictions on u, the vectors A^k u produce increasingly accurate approximations to the dominant eigenvector. However, at each step the power method considers only the single vector A^k u, which amounts to throwing away the information contained in the previously generated vectors u_1 = u, u_2 = Au, ..., u_k = A^{k-1}u. It turns out that this information is valuable, as the following example shows.

Example 3.1. A diagonal matrix A of order 100 was generated with eigenvalues 1, .95, .95^2, .95^3, .... Starting with a random vector u_1, the vectors u_1, Au_1, ..., A^{k-1}u_1 were generated. The solid line in Figure 3.1 shows the tangent of the angle between A^k u_1 and the dominant eigenvector of A. It is seen that the convergence is slow, since the ratio of the subdominant eigenvalue to the dominant eigenvalue is 0.95. The dashed line gives the tangent of the angle between the dominant eigenvector and the space spanned by u_1, ..., u_k. The convergence is much more rapid, reducing the error by eight orders of magnitude. (The dash-dot line shows the convergence to the subdominant eigenvector, of which more later.)

The swift convergence of an approximation to the dominant eigenvector in the subspace spanned by u_1, ..., A^{k-1}u_1 justifies studying such subspaces. We begin with a definition.

Definition 3.2. Let A be of order n and let u ≠ 0 be an n-vector. Then the sequence

    u, Au, A^2 u, ...

is a KRYLOV SEQUENCE BASED ON A AND u. We call the matrix

    K_k(A, u) = (u  Au  A^2 u  ···  A^{k-1} u)

Let A and u ^ 0 be given. Then 1. we will drop them in our notation for Krylov matrices and subspaces.3. The sequence of Krylov subspaces satisfies and 2. The space is called the kth KRYLOV SUBSPACE. Vol II: Eigensystems . KRYLOV SUBSPACES 267 Figure 3.1: Power method and Krylov approximations the kth KRYLOV MATRIX. Elementary properties The basic properties of Krylov subspaces are summarized in the following theorem. Thus we may write ICk(u) or simply /C^.SEC. I f t r ^ O . The proof is left as an exercise. When A or u are known from context. Theorem 3. 3.
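The experiment of Example 3.1 is easy to reproduce. The sketch below is an editorial illustration (the random starting vector differs from the one used for Figure 3.1, so the numbers will not match the figure exactly); for each k it prints the tangent of the angle between the dominant eigenvector and the power iterate, and the tangent of the angle between the dominant eigenvector and the Krylov subspace K_k.

    import numpy as np

    def tan_angle(Q, x):
        # Tangent of the angle between the unit vector x and the column
        # space of the orthonormal matrix Q.
        c = np.linalg.norm(Q.T @ x)
        s = np.linalg.norm(x - Q @ (Q.T @ x))
        return s / c

    n = 100
    A = np.diag(0.95 ** np.arange(n))        # eigenvalues 1, .95, .95^2, ...
    x1 = np.eye(n)[:, 0]                     # dominant eigenvector

    rng = np.random.default_rng(3)
    u = rng.standard_normal(n)
    v = u / np.linalg.norm(u)                # current (scaled) power iterate
    Q = np.zeros((n, 0))                     # orthonormal basis of K_k
    for k in range(1, 26):
        w = v - Q @ (Q.T @ v)                # new direction, orthogonalized
        Q = np.column_stack([Q, w / np.linalg.norm(w)])
        print(k, tan_angle(v.reshape(-1, 1), x1), tan_angle(Q, x1))
        v = A @ v
        v /= np.linalg.norm(v)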

then v G K. The third says that it is invariant under shifting. Conversely. The polynomial connection Krylov subspaces have an important characterization in terms of matrix polynomials. and In other words the Krylov sequence terminates in the sense that the Krylov vectors after the first one provide no new information. we will say that a Krylov sequence terminates at t if I is the smallest integer such that The following theorem states the basic facts about termination.k(A. EIGENSPACES 4.( A. u) can be written in the form Termination If (A. the fourth says that the space behaves in a predictable way under similarity transformations. if v — p(A)u for any polynomial of degree not exceedingfc—1. Finally. In particular if A is Hermitian or nondefective. The space ICk(A. u] can be written in the form Hence if we define the matrix polynomial p( A) then v = p(A)u. u). we may diagonalize it—a strategy that sometimes simplifies the analysis of Krylov sequences. For any K. then Aku — Xku. More generally. . CHAPTER 4. u) is an eigenpair of A. Thus we have proved the following theorem. then The second item in the above theorem states that a Krylov subspace remains unchanged when either A or u is scaled.4.268 3. By definition any vector v in £*. If W is nonsingular. Theorem 3.

2) follows from (3. Experience has shown that these eigenvectors have eigenvalues that tend to lie on the periphery of the spectrum of the matrix. On the other hand if (3. which is impossible. and write u in the form where a. CONVERGENCE We have seen in Example 3. then for some I < m. As we shall see. it contains exact eigenvectors of A. then so are the members of the sequence w..4 any member of fCk can be written in the form p(A)u.SEC. that the Krylov subspace converges to the vector.5.2) is satisfied. On the other hand. ifu lies in an eigenspace of dimension m.. = x^u. In this subsection we will give mathematical substance to this observation—at least for the Hermitian case. Au. Since p(A)x{ = p(\i)x{. A Krylov sequence terminates based on A and u at I if and only iff. By Theorem 3.£ is an invariant subspace of A. (3. 3..1).. then so that K. n).1 that a sequence of Krylov subspaces can contain increasingly accurate approximations to the certain eigenvectors of the matrix in question..2. if a Krylov sequence terminates.. Am~l u. where p(X) is a polynomial of degree not greater than k. Let u be a starting vector for a Krylov sequence. it stops furnishing information about other eigenvectors. In this case we will say. • Termination is a two-edged sword. Equation (3.1) follows from the fact that £/ C /Q+i. we have Vol II: Eigensystems . KRYLOV SUBSPACES 269 Theorem 3. The general approach In what follows. Now if the sequence terminates at I > m it follows that m = dim(/Q) > dim(/Cm) = dim(A'). then /Q is an eigenspace of A of dimension t... Xi) (i = 1. On the one hand. If u is in an invariant subspace X of dimension ra. we will assume that A is Hermitian and has an orthonormal system of eigenpairs (A. If the sequence terminates at I. the sequence terminates at I. is the smallest integer for which If the Krylov sequence terminates at I. 3. On the other hand. Proof. the sword is especially difficult to wield in the presence of rounding error.. somewhat illogically.

5) does not depend on the scaling of p. In the above notation. assume that . EIGENSPACES Thus if we can find a polynomial such that p(\i) is large compared withp(Aj) (j ^ i). Hence Thus if we can compute as a function of k./C&).3) then |aj|/||w||2 is the cosine of the angle 9 between u and Xi and -y/Sj^i |aj |2/||w||2 is the sine of 0.6. X{) is an upper bound for tanZ(xj.4) that To apply this theorem note that the right-hand side of (3. we will have a bound on the accuracy of approximations to X{ in the sequence of Krylov subspaces. Hence It follows from (3. this problem is too difficult. As it stands. if x^u / 0 andp(Xi) / 0. We can therefore assume that p(Xi) = 1 and write If deg(p) < k — 1. then p(A)u £ fCk and tan /. then p( A)u will be a good approximation to X{. The following theorem quantifies this observation. We begin by observing that if u has the expansion (3.270 CHAPTER 4. then Proof. Theorem 3.(p(A)u. To simplify it.

8). Vol II: Eigensystems . Then the following assertions are true. Chebyshev polynomials The Chebyshev polynomials are defined by Chebyshev polynomials are widely used in numerical analysis and are treated in many sources. we collect them in the following theorem. Thus we are lead to the problem of calculating the bound This is a well-understood problem that has an elegant solution in terms of Chebyshev polynomials. Rather than rederive the basic results. Let Ck(i) be defined by (3. to which we now turn. Then 271 is an upper bound for (3.7. 3. KRYLOV SUBSPACES and that our interest is m the eigenvector x\.6). Theorem 3.SEC.

Convergence redux

We now return to the problem of bounding tan∠(x_1, K_k), which by (3.7) is equivalent to finding

    σ_k = min over polynomials p with deg(p) ≤ k−1 and p(λ_1) = 1 of max_{λ_n ≤ λ ≤ λ_2} |p(λ)|.

We can apply (3.10) to compute σ_k, provided we change variables to transform the interval [λ_n, λ_2] to [−1, 1]. It is easily verified that the change of variables is

    μ = (2λ − λ_2 − λ_n) / (λ_2 − λ_n).

The value of μ at λ_1 is

    μ_1 = 1 + 2η,   where   η = (λ_1 − λ_2) / (λ_2 − λ_n),

and hence the value of σ_k is 1/C_{k−1}(μ_1). In view of (3.9), we have the following theorem.

Theorem 3.8. Let the Hermitian matrix A have an orthonormal system of eigenpairs (λ_i, x_i), with its eigenvalues ordered so that

    λ_1 > λ_2 ≥ ··· ≥ λ_n,

and let

    η = (λ_1 − λ_2) / (λ_2 − λ_n).

Then

    tan∠(x_1, K_k(A, u)) ≤ tan∠(x_1, u) / C_{k−1}(1 + 2η).

Three comments.

• The bound appears to be rather complicated. However, it simplifies, at least asymptotically. For k large we can ignore the term with the exponent 1−k and write

    C_{k−1}(1 + 2η) ≈ (1/2) (1 + 2η + 2 sqrt(η + η^2))^{k−1}.

• As with the power method, a nearby subdominant eigenvalue will retard convergence. However, in this case the bound gives us a bonus. For if η is small, the bound becomes approximately

    2 tan∠(x_1, u) / (1 + 2 sqrt(η))^{k−1}.
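The bound of Theorem 3.8 is easy to evaluate, since for arguments greater than one the Chebyshev polynomial can be computed stably from its hyperbolic-cosine form. The sketch below is editorial (Python with NumPy) and evaluates the reduction factor 1/C_{k−1}(1+2η) for the matrix of Example 3.1.

    import numpy as np

    def cheb(k, t):
        # Chebyshev polynomial C_k(t) for t >= 1, from the cosh form.
        return np.cosh(k * np.arccosh(t))

    lam = 0.95 ** np.arange(100)                   # the matrix of Example 3.1
    eta = (lam[0] - lam[1]) / (lam[1] - lam[-1])   # about 0.053
    for k in (5, 10, 15, 20, 25):
        print(k, 1.0 / cheb(k - 1, 1.0 + 2.0 * eta))   # factor multiplying tan angle(x1, u)

    # Asymptotically the reduction per step is 1/(1 + 2*eta + 2*sqrt(eta + eta**2)),
    # which for small eta is roughly 1/(1 + 2*sqrt(eta)): about 0.69 here, as
    # against 0.95 for the power method.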

£3. In Example 3.8 can be used to bound convergence to the eigenvector corresponding to the smallest eigenvalue of A.(t > 2) can be derived by considering a polynomial of the form where A is the Lagrange polynomial that is zero at AI . suppose that Aj = \2 > AS. it is important to keep in mind that the smallest eigenvalues of a matrix—particularly a positive definite matrix—often tend to cluster together.. A.1.5/0.. KRYLOV SUBSPACES 273 The square root increases the influence of rj on the bound. to derive a bound for x 2 . .. The square root of this number is about 0. • By item 2 in Theorem 3. let where q(t) is a polynomial of degree not greater than k — 2 satisfying q( A 2 ) — 1. so that the bound will be unfavorable.1.SEC. Let u be given. so that the bound on the convergence rate is about 0.3..33-10~4. and the bound on the convergence rate is 1/(1 + 2-0. For example.8 can be relaxed.5) that where Bounds for A. consider the matrix in Example 3. the Krylov subspaces of A remain unchanged when A is replaced by —A. 3. There 77 = 0. Hence Theorem 3.. and expand u in the form Vol II: Eigensystems . The general approach can be used to derive bounds for the remaining eigenvectors z 2 .053.23. However.95 = 0.97. the value of 77 corresponding to the smallest eigenvalue is about 0.95 for the power method. To see this._i and one at A. Note thatp(Ai) = 0 andp(A 2 ) = 1-It: tnen follows from (3. Thus the square root effect gives a great improvement over the rate of 0. — For example.69.23) = 0. Multiple eigenvalues The hypothesis AI > A 2 in Theorem 3.

but then they drop off. Simply take a ruler and draw a line from the common initial point that is tangent to the dashed line (at about k = 12). This phenomenon is not accounted for by our theory. The dot-dash line in Figure 3. The same procedure applied to the dot-dash line yields an optimal bound of about 6-10~3. the bounds provide support for the empirical observation that Krylov sequences generate good approximations to exterior eigenvalues. which does not reflect the true behavior of the Krylov approximations. and 0.8 continues to hold when AI is multiple provided we adopt the convention that only one copy of AI is listed in (3. The largest eigenvalue was recomputed .55 to 0. Thus part of the problem is with the mathematical form of our bounds.1 is tan Z(£ 2 . To summarize. The last three convergence ratios are 0. Then This shows that of all the vectors in the two-dimensional invariant subspace that is spanned by xi and #2. Immediately after the first step they range from 0. if we retain only one copy of AI and A2. Actually. the bound (3.5-10"7.9. There are two reasons.22. we may not have a complete system of eigenvectors to aid the analysis. the spaces ICk(A. the graphs show that any exponentially decreasing bound of the form apk must be weak. otiXi + #2X2) were a simple eigenpair of A. The problem is that the theory predicts a convergence ratio of 0. a poor prediction.0-10"2 — once again. The final value is 5. The reason is that the bounds must take into account the worst-case distribution of the eigenvalues.(x\.1 shows that the convergence ratios are steadily decreasing. where the actual value is 1. Non-Hermitian matrices The theory of Krylov subspaces for non-Hermitian matrices is less well developed. u) contain only approximations to ai^i -f 0:2X2.31.0-10"3. but they can underestimate how good the approximations actually are — sometimes by orders of magnitude.1. the geometry of the eigenvalues is generally more complicated. if the matrix is nonnormal.274 CHAPTER 4. Example 3.75. I* is just as though (Ai. EIGENSPACES where the Xi are the normalized eigenvectors corresponding to A ? . Second. We will return to this point in a moment. First. whereas the bound predicts a value of 4.2-10"6./C 20 ).0-10~3. = 25isabout7-10~ 4 .11). Similarly. Assessment of the bounds It is instructive to apply the bounds to the Krylov sequence generated in Example 3. Thus Theorem 3. A normal (actually diagonal) matrix was constructed whose eigenvalues were those of a random complex matrix.67. The bound predicts a value for tan Z. while a glance at the graph in Figure 3. Its value at A.0.12) continues to hold when A! and A 2 are multiple.26. The following example shows that a change in geometry can have a major effect on the rates of convergence. £20) °f 2. which is the best we can expect and compares favorably with the bound 2.

1. The convergence is not much better the power method in Example 3. so that the matrix is nowHermitian. The convergence is swift as in Example 3. Since it converges more slowly.2 shows the tangent of the angle between the kth Krylov subspace and the dominant eigenvector. the eigenvalues from the first experiment were replaced by their absolute values.SEC.1). the eigenvalues were altered so that the ratio of the absolute values of the successive eigenvalues became 0. To vary the geometry.95 white their arguments remained the same. The convergence. In a third experiment.2: Power method and Krylov approximations so that the absolute value of its ratio to the second largest eigenvalue was 0. has exactly the same bounds. Vol II: Eigensystems . The example does have something to say about the convergence bounds from Theorem 3. It is clear that geometry matters and perhaps that exponentially decaying eigenvalues are a good thing. KRYLOV SUBSPACES 275 Figure 3.9. in which the matrix is Hermitian.1.8. The solid line in Figure 3.95 (as in Example 3. It would be foolish to conclude too much from these examples. There is a polynomial connection for non-Hermitian matrices. illustrated by the dashed line. Recall that the bound failed to predict the swift convergence to xi for the matrix from Example 3. is intermediate. The dash-dot line shows how well the Krylov subspace approximates the eigenvector.1. the bounds cannot predict swifter convergence. 3. It turns out that the third experiment in Example 3.

. in general. In other words. . EIGENSPACES Theorem 3. Let where Then Proof. since the geometry of the eigenvalues can be complicated. Suppose. This implies that we cannot generalize the approach via Chebyshev polynomials to defective matrices. Actually. Then for p(L) to be zero. By Theorem 2.6. in which case the bound becomes (after a little manipulation) where A 2 . The theorem also illustrates the difficulties introduced by defectiveness and nonnormality. . an easy task. For more see the notes and references. it is not sufficient to simply make p small at //. This is not. /j ^ A is an eigenvalue of A with a Jordan block of order ra. . it is a continuum of theorems. the polynomialp(r) must have the factor (r . if A is nondefective we can take L to be diagonal and the columns of X to have norm one. What results exist suggest that under some conditions the Krylov sequence will produce good approximations to exterior eigenvectors.276 CHAPTER 4. The theorem suggests that we attempt to choose polynomials p with p(\) = 1 so that p(L) is small. we have This theorem is an analogue of Theorem 3.n)m. for example.10.3 Hence for any polynomial p of degree not greater than k-1 with p(\i) = 1. An are the eigenvalues of A other than A. Let A be a simple eigenvalue of A and let be a spectral representation. since spectral representations are not unique. . For example.

11. Of course. Specifically. BLOCK KRYLOV SEQUENCES We saw above [see (3.2X2. with the eigenvalues ordered so that and assume that the multiplicity of AI is not greater than m. 3. there are two compensating factors (aside from curing the problem of multiple eigenvalues).3. First. say Then so that ICk(A. unless we are unlucky the two approximations will span the required two-dimensional invariant subspace. If VolII: Eigensy stems . Second.U) spanned by the first k members of this sequence is called the kth block Krylov subspace. A cure for this problem is to iterate with a second vector. The foregoing suggests that we generate a block Krylov sequence by starting with a matrix U with linearly independent columns and computing The space JCk(A.Since the sequence in u produces approximations to otiXi -}.0. passing to block Krylov sequences improves the convergence bound.13)] that when it comes to multiple eigenvalues. Let A be Hermit/an and let (Az-. Xi) be a complete system oforthonormal eigenpairs of A. say. For example. KRYLOV SUBSPACES 277 3. it frequently happens that the computation of the product of a matrix with a block U of. it has to be retrieved only once to form AU. Theorem 3. a Krylov sequence has tunnel vision. if AI is double and the generating vector is expanded in the form the Krylov sequence can only produce the approximation a\x\ -f 0:2^2 to a vector in the eigenspace of AI. what we would like to approximate is the entire invariant subspace corresponding to that eigenvalue. v) can approximate the linear combination faxi + /32^2.SEC. Although it requires more storage and work to compute a block Krylov sequence than an ordinary Krylov sequence. ra vectors takes less than ra times the time for a matrix-vector product. if the matrix is stored on disk. as the following theorem shows.

.12).v) C ICk(A. • If the multiplicity of A! is less than m.. . An alternative to block Krylov methods is to deflate converged eigenpairs. we can get convergence rates for the vectors #2. so that initially the bound may be unrealistic. then 77 defined by (3. • Three comments.. so that the eigenvalue following AI is A ra+ j. The two approaches each have their advantages and drawbacks. However.278 is nonsingular and we set then CHAPTER 4. so that the Krylov sequence can converge to a second eigenpair corresponding to a repeated eigenvalue. there is no convergence theory analogous to Theorem 3.8. In general the latter will be smaller. the order constant in the bound istanZ(xi. xm. m) had been deleted.15). it is as if we were working with a matrix in which the pairs (A.8 to get (3. • • • .16) is larger than the value of 77 in Theorem 3. • By the same trick used to derive the bound (3.) not tanZ(xi.11 predicts faster convergence for the block method. x3-> — The argument concerning multiple eigenvalues suggests that we can use block Krylov sequences to good effect with non-Hermitian matrices. By construction v€K(U).11 for non-Hermitian block Krylov sequences.. Hence JCk(A.. However. But the reader should keep in mind that there are important block variants of many of the algorithms we discuss here. m)..t. xt-) (i = 1. We may now apply Theorem 3. Since deflation is vital in other ways to many algorithms.U)so that From the definition of v it is easily verified that xfv = 0 (i = 2. EIGENSPACES where Proof. we will stress it in this volume. Hence the Krylov subpace /Cjt(A..{7). Thus Theorem 3.-. v) contains no components along #2..

<^-i such that It can then be shown that the polynomial divides the characteristic polynomial of A. • • • . in which he proposed the algorithm that bears his name (Lanczos himself called it the method of minimized iterations). Since this book is concerned with eigensy stems. these books contain historical comments and extensive bibliographies.4. Parlett [206]. and. What makes the Krylov sequence important today is that it is the basis of the Lanczos algorithm for eigenpairs and the conjugate gradient method for linear systems as well as their non-Hermitian variants. 1931] introduced the sequence named for him to compute the characteristic polynomial of a matrix. 3. The term "Krylov sequence" is due to Householder. Krylov sequences are based on u. To maintain consistency with the first half of this book. NOTES AND REFERENCES Sources 279 Three major references for Krylov sequences are The Symmetric Eigenvalue Problem by B. Krylov sequences According to Householder [121. KRYLOV SUBSPACES 3. The first hint of the power of Krylov sequences to approximate eigenvectors is found in Lanczos's 1950 paper [157]. and Eigenvalues of Matrices by F.1)]. obtaining ra successive eigenvalues and principle axes by only m+1 iterations. Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms^ Y. The solution of eigenvalue problems by forming and solving the characteristic equation is a thing of the past. to avoid underusing a good letter. if the Krylov sequence terminates at i [see (3.SEC. Krylov [153. he states Moreover.1]. then there are constants ao. N. §6. eigenvectors are denoted by x. Convergence (Hermitian matrices) It seems that Krylov had no idea of the approximating power of the subspaces spanned by his sequence. In addition to most of the results treated in this subsection. A word on notation. Vol II: Eigensystems . In the conclusion to his paper. Chatelin [39]. In much of the literature the eigenvectors of the matrix in question are denoted generically by u (perhaps because u is commonly used for an eigenfunction of a partial differential equation). the good convergence of the method in the case of large dispersion makes it possible to operate with a small number of iterations. Saad [233]. the focus of this section has been on the spectral properties of Krylov sequences. and Krylov sequences are often based on a vector v. Specifically.

6.14) was stated without proof by Saad in [230. bounds analogous to those of Theorem 3. Block Krylov sequences Block Krylov sequences — limited to small powers of A — were introduced by Cullum and Donath [45. EIGENSPACES It is unfortunate that the first convergence results for Krylov spaces were directed toward the Lanczos algorithm.D. Theorem 3. The Rayleigh-Ritz method approximates that best approximation.11. The result is Theorem 3. Thus when an eigenvalue lies outside an ellipse containing the rest of the spectrum.8. incidentally.10). who gave the first analysis of the method. started by using Chebyshev polynomials to bound the eigenvalues produced by the Lanczos method and then using the eigenvalue bounds to produce bounds on the eigenvectors. and the related bound (3. One is free to use them—or not use them—in combination with other polynomials.280 CHAPTER 4. 204]. Con- . Block Krylov sequences in their full generality were introduced by Underwood in his Ph. Theorem 3. 4. Kaniel [145. 1980]. For more see [229]. using a technique related to the construction in the proof of Theorem 3. p. 1974] in connection with a restarted Lanczos algorithm. correcting and refining his eigenvector bounds. who gave a direct proof later [233.10 removes this restriction. 1966].1971] followed Kaniel's approach. It is known that Chebyshev polynomials defined on a suitable ellipse asymptotically satisfy an inequality like (3.8 are possible. thesis [280. This is the approach we have taken here. These results raise the possibility of using polynomials to get eigenvalue bounds. In his important thesis Paige [196. which is a combination of Krylov sequences and the Rayleigh-Ritz method to be treated in the next subsection. Convergence (non-Hermitian matrices) The inequality (3. First there is a best approximation to the desired eigenspace in the larger subspace. Underwood established eigenvalue bounds in the style of Kaniel. This bound. The result requires that A be diagonalizable. It is important to keep in mind that there are two levels of approximation going on here. The purpose of this subsection is to describe and analyze the most commonly used method for extracting an approximate eigenspace from a larger subspace—the Rayleigh-Ritz method.12). In 1980 Saad [229] showed that more natural bounds could be derived by first bounding the eigenvector approximations contained in the Krylov sequence itself and then analyzing the Rayleigh-Ritz procedure separately. shows that there is nothing sacrosanct about Chebyshev polynomials. But it is one thing to know that an approximation exists and another to have it in hand.11 itself is due to Saad [229]. Theorem 3. 1975] to generalize the Lanczos algorithm (also see [92]). RAYLEIGH-RITZ APPROXIMATION We have seen how a sequence of Krylov subspaces based on a matrix A can contain increasingly accurate approximations to an eigenspace of A.

We begin with a description of the method and then turn to its analysis. It may and often will be a Krylov subspace. LetU be a subspace and let U be a basis for U. although its use in practice is straightforward. but that is irrelevant to what follows. We will motivate the method by the following theorem. the last two subsections are devoted to ways to circumvent this problem. Then from the relation we obtain so that (L. The analysis shows that unless a certain regularity condition is satisfied. No other assumptions are made about U. RAYLEIGH-RlTZ METHODS The essential ingredients for the application of the Rayleigh-Ritz method is a matrix A and a subspace U containing an approximate eigenspace of A. Vol II: Eigensystems .1. the analysis of the method is somewhat complicated. By continuity. W) is an eigenpair of B. 4. This suggests that to approximate an eigenpair corresponding to X we execute the following procedure.1. the method may not produce satisfactory approximations. Thus we can find exact eigenspaces contained in U by looking at eigenpairs of the Rayleigh quotient B. we might expect that when U contains an approximate eigenspace X o f A there would be an eigenpair (!•. RAYLEIGH-RlTZ APPROXIMATION 281 sequently. Let V be a left inverse ofU and set If X C It is an eigenspace of A. Consequently. UW) is an eigenpair of A with U(UW) = X. which treats the case when U contains an exact eigenspace of A. AW) is an approximate eigenpair of A with Tl(AW) = X. 4. W) ofB such that (L. W) of 7? such that (L. then there is an eigenpair (L. we will confine ourselves to the approximation of eigenvectors and will point out where the results generalize to eigenspaces. To keep the treatment simple. Theorem 4. In this section we will develop the theory of the method and examine its practical implications.SEC. X) be an eigenpair corresponding to X and let X — UW. Proof. Let (L.
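In code the procedure is only a few lines once a basis is in hand. The sketch below is an editorial illustration; for simplicity the basis is orthonormalized, so that its conjugate transpose serves as the left inverse V. It forms the Rayleigh quotient, computes its eigenpairs, and verifies the assertion of the theorem on a subspace that contains one exact eigenvector.

    import numpy as np

    def rayleigh_ritz(A, U):
        # Sketch of the procedure: orthonormalize the basis, form the
        # Rayleigh quotient, and return all Ritz pairs.
        Q, _ = np.linalg.qr(U)                 # with Q orthonormal, Q^H is a left inverse
        B = Q.conj().T @ A @ Q                 # Rayleigh quotient
        theta, W = np.linalg.eig(B)            # primitive Ritz pairs
        return theta, Q @ W                    # Ritz values and Ritz vectors

    # When the subspace contains an exact eigenvector, the corresponding
    # Ritz pair reproduces it.
    rng = np.random.default_rng(5)
    A = rng.standard_normal((30, 30))
    lam, V = np.linalg.eig(A)
    j = np.argmax(lam.real)
    U = np.column_stack([V[:, j], rng.standard_normal((30, 3))])

    theta, X = rayleigh_ritz(A, U)
    i = np.argmin(np.abs(theta - lam[j]))
    print(abs(theta[i] - lam[j]),
          np.linalg.norm(A @ X[:, i] - theta[i] * X[:, i]))   # both at roundoff level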

1. UW) a Ritz pair. W) (see Theorem 1.. When the concern is with eigenvalues and eigenvectors so that the Ritz pair can be written in the form (A. we obtained a matrix B = (UTU)~l UTAU of the T . Thus the method can fail. The second difficulty has no easy answer. however. and UW its Ritz basis.g. Example 4. W) in statement 3. It is therefore natural to collect the eigenvalues of B lying in the same chosen part of the spectrum and use them to compute (M. even though the space U contains an exact approximation to the desired eigenvector. Uw). the eigenpair (M.2.282 CHAPTER 4. we have no guarantee that the result approximates the desired eigenpair of A. Second. The space 1Z(UW) will be called the Ritz space. any nonzero vector p is an eigenvector corresponding to the part of the spectrum near zero. we will call A a Ritz value. W) is chosen by looking at the eigenvalues of B. — 1) and suppose we are interested in approximating the eigenpair (0. We will call W itself the primitive Ritz basis. Instead we must specify the part of the spectrum that we are interested in—e. We will call the pair (M. we T might end up with p = (1 1) . in which case our approximate eigenvector becomes (1 \/2 -\/2) » which is completely wrong. For example. This procedure. Of course. will work only if we can insure that there are eigenvalues in B that approximate those of A in the the desired part of the spectrum. But nearly zero is bad enough. by perturbing U by a matrix of random normal deviates with standard deviation of 10~4. and w a primitive Ritz vector. EIGENSPACES We will call any procedure of this form a Rayleigh-Ritz procedure. Let A = diag(0.5).1) has two difficulties. M its Ritz block'. First. as the following example shows. Assume that by some method we have come up with the basis Then U is its own left inverse and we may take Since B is zero. Two difficulties The procedure listed in (4. Let us examine these difficulties in turn. in real life we would not expect B to be exactly zero. Since B itself gives no clue as to which vector to choose. ei). Uw a Ritz vector. In practice. a group of eigenvalues with largest real part. we have not specified how to choose the eigenpair (M. Recall that in large eigenproblems we cannot expect to compute the entire spectrum.

If it is not.3813e-01)T and the corresponding Ritz vector is This comes nowhere near reproducing the order 10 know lies in Tl(U}. RAYLEIGH-RITZ APPROXIMATION form 283 The eigenvector corresponding to the smallest eigenvalue ofB (the primitive Ritz vector) is (9. provided of course that VHU is nonsingular. The orthogonal Rayleigh-Ritz procedure In its full generality the algorithm (4.1) presupposes that V is a left inverse of U..4).4110e-01 3. If (//. This form of the method has important applications (e. 4 approximation to GI that we The problem arises because the process of forming the Rayleigh quotient creates a spurious eigenvalue of B near zero. In Vol II: Eigensystems . then and hence Thus the primitive Ritz pairs are the eigenpairs of the generalized eigenvalue problem (4. But in the overwhelming majority of applications U is taken to be orthonormal and V is taken equal to U. Symbolically: We will flesh out this equation in the next subsection. we can convert it to a left inverse by replacing it with V = V (FH f/)~ H . 4. to the bi-Lanczos algorithm). w) is a primitive Ritz pair. Rayleigh-Ritz and generalized eigenvalue problems The algorithm (4.1) is called an oblique Rayleigh-Ritz method because the subspaces spanned by U and V can differ. This suggests that we may have to postulate a separation between the approximations to the eigenvalues we desire and the other eigenvalues of the Rayleigh quotient.g.SEC. It turns out that such a postulate combined with the convergence of the eigenvalues in the Rayleigh quotient allows us to prove that the Ritz spaces converge as the approximations in U converge.
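The failure described in Example 4.2 is easy to reproduce. The sketch below is editorial: it orthonormalizes the perturbed basis rather than forming the oblique Rayleigh quotient of the example, and it uses its own random perturbation, so the numbers differ in detail from those quoted above. The phenomenon, however, is the same: both Ritz values lie near zero, and in general neither Ritz vector comes close to e_1.

    import numpy as np

    # The data of Example 4.2: A = diag(0, 1, -1) and a basis whose span
    # contains e1 exactly, then perturbed by normal deviates of standard
    # deviation 1e-4.
    A = np.diag([0.0, 1.0, -1.0])
    U = np.array([[1.0, 0.0],
                  [0.0, 1.0 / np.sqrt(2.0)],
                  [0.0, -1.0 / np.sqrt(2.0)]])
    rng = np.random.default_rng(6)
    U = U + 1e-4 * rng.standard_normal((3, 2))

    Q, _ = np.linalg.qr(U)                  # orthonormalize the perturbed basis
    B = Q.T @ A @ Q                         # Rayleigh quotient; entries of order 1e-4
    theta, W = np.linalg.eigh(B)
    for t, w in zip(theta, W.T):
        x = Q @ w                           # Ritz vector
        print(t, np.degrees(np.arccos(min(1.0, abs(x[0])))))   # angle from e1 in degrees
    # Both Ritz values are close to zero, and typically neither Ritz vector
    # is anywhere near e1, even though R(U) contains e1 to within about 1e-4.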

and try the RayleighRitz process again. we return to our Umachine. Proof. CONVERGENCE This subsection is concerned with the accuracy of the approximations produced by the Rayleigh-Ritz procedure. We call this procedure the orthogonal Rayleigh-Ritz method. We will treat the orthogonal variant and for simplicity will only consider the case where we are attempting to approximate a simple eigenvalue and its eigenvector—by far the most common use of the method. Uepe) • If the pair is unsatisfactory. Setting the scene Consider the use of the Rayleigh-Ritz procedure in real life. . so that the Ritz basis X — UW is also orthonormal. Let (M. ask it to grind out another UQ with a smaller value of 0. When A is Hermitian. However.g. we have 4. Theorem 4. x) of A. the primitive Ritz basis W is taken to be orthonormal.6 all we need to show is that M = XHAX. By Theorem 2. the orthogonal Rayleigh-Ritz method has the obvious advantage that it preserves symmetry throughout the procedure. Then is minimal in any unitarily invariant norm. First we will show that the Rayleigh quotient contains a Ritz value /J.2. UQ] is small. We will treat this problem in two stages. We will then treat the more delicate question of when the Ritz vector x$ converges. we must show that the process in some sense converges as 0 -»• 0.284 CHAPTER 4. EIGENSPACES addition. But if W is the (orthonormal) primitive Ritz basis.Q that converges to A.3. We apply the Rayleigh-Ritz process to obtain a Ritz pair (fj. To justify this process.2). then M = WHBW (see Theorem 1. We want to approximate the eigenpair (A. the following theorem shows that for any matrix the orthogonal method also gives minimum residuals. X) be an orthogonal Rayleigh-Ritz pair with respect to U. Since X = UW. By some method—call it the U-machine—we obtain an orthonormal basis U0 for which 0 — Z(ar.

Hence ||2:||2 = sin 9 and hence or it follows from (4.1.Q ofBa such that where m is the order of Be.SEC. Corollary 4. RAYLEIGH-RlTZ APPROXIMATION 285 Convergence of the Ritz value Our approach to the convergence of Ritz values is through a backward perturbation theorem. Proof.5).4. Chapter 1) to give the following corollary. There is an eigenvalue U. Let (Ue U±_) be unitary and set Now Uj_ spans the orthogonal complement of the column space of UQ. Theorem 4. Then there is a matrix EQ satisfying such that A is an eigenvalue of Be + EQ.6) that Now define Then clearly We can now apply Eisner's theorem (Theorem 3. 4. Vol II: Eigensystems . Let BQ be the Rayleigh quotient (4.5.

U&WQ) be a Ritz pair for which U. Then the corollary assures us that we only have to wait until 0 is small enough for a Ritz value to emerge that is near A.6. which uses a sequence of Krylov spaces in which m can become rather large. The key is the following theorem. To apply the theorem. Let (//<?. Convergence of Ritz vectors We now turn to the convergence of Ritz vectors.7) by \\E\\? (see Theorem 3. NO) in (4. • If A is Hermitian. In this case we do not have to assume m is bounded. we may replace the right-hand side of (4. we have sin Z(x. let (fig. This has important implications for the Lanczos algorithm.8). U0p$) -* 0 along with 0.8. Chapter 1). we) be a primitive Ritz pair and let the matrix (we We) be unitary.5) for which IJLQ —*• A. EIGENSPACES • Assume the number m of columns of U& is bounded. The factor H-E^H™ in the bound (4.11). If there is a constant a > 0 such that then asymptotically . But as we shall see later. NO) remains bounded below. so does the sep(A.Q -» A. CHAPTER 4. we have Consequently. suppose we have chosen a Ritz pair (whose existence is guaranteed by Corollary 4. if sep(/£0. In the above notation. the convergence will be 0(0).7. at least in the limit. so that j_ Then For the proof of the theorem see the notes and references. under a certain separation condition. Then by the continuity of sep (Theorem 2. Theorem 4. Since \\hg\\2 < \\A\\2.286 Two comments. which relates Ritz vectors to the angle 0.7) suggests that we may have to wait a long time. Thus we have the following corollary. Corollary 4.

where y 1. x and ||t/||2 = 1. Ng) is uniformly bounded below. If we write XQ — 70. Let the eigenvalues of A be ordered so that and let the Ritz values—the eigenvalues of the Hermitian matrix U AU—be ordered so that Vol II: Eigensystems . which accounts for the failure of the Ritz vector to reproduce the eigenvector ei. in the second part of Example 4.5 suggests that the convergence of the Ritz value can be quite slow.9) the uniform separation condition. We will now use the fact that the Ritz vector converges to derive a more satisfactory result. For example. it contains the convergence hypothesis that JJLQ approaches A and the separation hypothesis that the quantity sep(//0. Specifically.#).9-10"4. by Corollary 4. What is unusual for results of this kind is that it consists entirely of computed quantities and can be monitored.SEC. XQ] be the Ritz approximation to (A. The conclusion is that the sine of the angle between the eigenvector x and the Ritz vector Uepe approaches zero. Let (fig. If the uniform separation is satisfied. x). we can say more. x) = O(9).3) to the effect that eigenvalue convergence plus separation equals eigenvector convergence. By direct computation of the Rayleigh quotient x^Axg we find that Thus the Ritz value converges at least as fast as the eigenvector approximation of x in K(U8). 4.7 we have a\ = sin Z(z0. then (7! = cosZ(#0.:r) and \cr\ = sin/(£#. We call the condition (4. Convergence of Ritz values revisited The bound in Corollary 4. RAYLEIGH-RITZ APPROXIMATION 287 This corollary justifies the statement (4. Hermitian matrices When A is Hermitian. even though it is present in U to accuracy of order 10~4. is the Rayleigh quotient formed from XQ and A. -f <jy.2 the separation of the eigenvalues of B is about 1. Then by construction.

EIGENSPACES Then by the interlacing theorem (Theorem 3. Consequently. Consequently.10) we have x^Av = 0.7 all have natural analogues for Ritz blocks and Ritz bases. it must converge from above. and. with the exception of Corollary 4.. Let T = Ax — f i x . in the last analysis we cannot know 0 and hence cannot compute error bounds. Chapter 1).288 CHAPTER 4. For more see the notes and references. Similarly. x) and let Then Proof.#) = Y^x.8.11) is a spectral decomposition.7. Let (^. if m converges to A. we must look to the residual as an indication of convergence. Because (4. if /^converges to A n _ m+ . and hence Convergence of Ritz blocks and bases Theorem 4. Thus. Theorem 4. Let A have the spectral representation where \\x\\% — 1 and Y is orthonormal. Theorem 4. and Corollary 4. Thus sin/(#. Corollary 4.4. x) be an approximation to (A. The following residual bound actually applies to any approximate eigenpair. Y is an orthonormal basis for the orthogonal complement of the space spanned by x. For in (4. Residuals and convergence The results of this subsection have been cast in terms of the sine of the angle 9 between an eigenvector x and its approximating subspace U. Then . as always.6.then it must converge from below. the proofs are essentially the same. Although these results give us insight into the behavior of the Rayleigh-Ritz method. The Ritz values converge more quickly in the Hermitian case. we can use the Rayleigh-Ritz method to approximate eigenspaces with the same assurance that we use it to approximate eigenvectors.5.6.

3. • Since A is assumed to be simple. First let us return to Example 4. A REFINED RITZ VECTOR is a solution of the problem We will worry about the computation of refined Ritz vectors later. The second inequality follows from the first and the fact that sep(/z. which showed how the Rayleigh-Ritz method could fail to deliver an accurate approximation to an accurate eigenvector contained in a subspace. if we solve the problem using the smallest eigenvalue of B in (4. as Example 4. that residual must approach zero.\^i .12) follows. This suggests the following definition. We therefore turn to two alternatives to Ritz vectors. Let U.2 shows. L) > sep(A. Definition 4. REFINED RITZ VECTORS Definition By hypothesis. By Corollary 4.5.2) we get the vector which is an excellent approximation to the original eigenvector ei of A. Hence if we define xg to be the vector whose residual Axe .9. RAYLEIGH-RlTZ APPROXIMATION 289 It follows that from which the first inequality in (4. L) . the space U$ contains increasingly accurate approximations XQ to the vector in the simple eigenpair (A. this theorem says that: Unfortunately. It follows that the residual Axe . x).2.X\ (see Theorem 2.11).fJ>0%9 approaches zero. there is a Ritz value pe that is approaching A. Specifically. Convergence of refined Ritz vectors We now turn to the convergence of refined Ritz vectors. spurious Ritz values can cause the residual to fail to converge. 4. Vol II: Eigensystems .Q be a Ritz value associated with U$.Ve&d has the smallest norm.SEC. The following theorem gives quantitative substance to this assertion. 4. We have argued informally that the convergence of the refined Ritz value u-o to the simple eigenvalue A is sufficient to insure the convergence of the refined Ritz vector.

Let U be an orthonormal basis for U and let a: = y+z.6. EIGENSPACES where ||z||2 = 1 and Y is orthonormal. the residual norm \\Axg . then Proof. the Rayleigh quotient fa = x jAxg is likely to be a more accurate approximation to A. • By Corollary 4. by Theorem 2. • Although the Ritz value //# was used to define the refined Ritz vector XQ.15) that sin Z(ar.5 we know that fig -» A.10. Let A have the spectral representation CHAPTER 4.\fj. • Here are two comments on this theorem. Moreover. Let be the normalized value of y.8. Let fig be a Ritz value and XQ the corresponding refined Ritz vector.g — A| > 0. In other words. . unlike ordinary Ritz vectors. Moreover since y and z are orthogonal. XQ) —> 0. It follows from (4. \\y\\\ = 1 — sin2 9. We have Hence By the definition of a refined Ritz vector we have The result now follows from Theorem 4. L) . where y = UUHx. Then \\z\\2 = ^U^x^ = sin#. refined Ritz vectors. are guaranteed to converge.jji0x0\\2 will be optimal. If sep(A.290 Theorem 4.
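Anticipating the computational discussion that follows, a refined Ritz vector can be obtained from one dense singular value decomposition: it is U times the right singular vector of (A − μI)U belonging to the smallest singular value. The sketch below is editorial; it applies this to the subspace of Example 4.2, where the refined vector recovers e_1 to the accuracy present in the basis.

    import numpy as np

    def refined_ritz_vector(A, U, mu):
        # Minimize ||A z - mu z|| over unit vectors z in R(U), with U
        # orthonormal: the minimizer is U times the right singular vector
        # of (A - mu I)U belonging to the smallest singular value.
        _, _, Vt = np.linalg.svd(A @ U - mu * U)
        return U @ Vt[-1]

    A = np.diag([0.0, 1.0, -1.0])
    rng = np.random.default_rng(7)
    U = np.array([[1.0, 0.0],
                  [0.0, 1.0 / np.sqrt(2.0)],
                  [0.0, -1.0 / np.sqrt(2.0)]]) + 1e-4 * rng.standard_normal((3, 2))
    Q, _ = np.linalg.qr(U)

    ev = np.linalg.eigvalsh(Q.T @ A @ Q)
    mu = ev[np.argmin(np.abs(ev))]          # the Ritz value nearest zero
    x = refined_ritz_vector(A, Q, mu)
    print(x)                                # close to e1 (up to sign)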

RAYLEIGH-RlTZ APPROXIMATION 291 The computation of refined Ritz vectors The computation of a refined Ritz vector amounts to solving the optimization problem Let U be an orthonormal basis for U. are to be computed. Chapter 3. which requires O(np2) operations. the cross-product Vol II: Eigensystems . The formation of V requires p multiplications of a vector by A. On the other hand.I)U. As with the formation of block Krylov sequences. The next nontrivial step is the computation of the singular value decomposition of W. This suggests the following algorithm. it will often happen that the formation of AU takes less time than p individual matrix-vector multiplications. It is natural to compare this process with the computation of an ordinary Ritz vector. A cure for this problem is to use the cross-product algorithm (Algorithm 3. Thus all Ritz vectors can be calculated with 0(np2) operations. In this case. Specifically. Then any vector of 2-norm one in U can be written in the form Uz. The situation is different when several vectors. the computation of a single refined Ritz vector and a single ordinary Ritz vector require the same order of work.2. Hence the computation of m refined Ritz vectors will take longer than the computation of m ordinary Ritz vectors by a factor of at least m. 4. We will assume that U is nxp. the solution of this problem is the right singular vector of (A . which is also an 0(np2) process.SEC.fj. say m.1. Thus. This computation must also be done by the Rayleigh-Ritz method. Chapter 3) to compute the singular value decompositions. while the computation of each refined Ritz vector requires O(np2) operations. the Rayleigh-Ritz method requires only that we compute the eigensystem of H. we will have to compute m singular value decompositions to get the refined vectors. albeit with a smaller order constant. On the other hand. Thus we need to solve the problem By Theorem 3. where \\z\\2 = 1. The computation is dominated by the formation of V and the computation of the singular value decomposition. the Rayleigh-Ritz method must calculate H = UEV.fjLl)U corresponding to its smallest singular value. where n ~^> p. Thus we can compute refined Ritz vectors by computing the singular value decomposition of (A .
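The preceding discussion reduces the computation of a refined Ritz vector to a small dense singular value decomposition. The following is a minimal sketch in Python (the function name is ours, and U is assumed to be an n x p orthonormal basis, as above); it computes the refined Ritz vector as the right singular vector of (A - mu*I)U belonging to the smallest singular value, which also delivers the optimal residual norm.

```python
import numpy as np

def refined_ritz_vector(A, U, mu):
    # U: n x p orthonormal basis of the subspace; mu: a Ritz value.
    # The minimizer of ||A x - mu x||_2 over unit vectors x = U z is the right
    # singular vector of W = (A - mu I)U belonging to the smallest singular value.
    W = A @ U - mu * U
    _, s, Vh = np.linalg.svd(W, full_matrices=False)
    z = Vh[-1].conj()              # right singular vector for s[-1]
    return U @ z, s[-1]            # refined Ritz vector and its residual norm
```

When several refined vectors are wanted, the cross-product variant described in the text precomputes the small matrices involving U and AU once and then solves a small Hermitian eigenproblem for each mu, at some risk to accuracy because the small singular values are squared.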

2 lies between the eigenvalues 1 and —1 and for this reason is called an interior eigenvalue of A. Then by (3.292 matrix corresponding to A U — ull is CHAPTER 4. where u is not near an eigenvector of A. we can form at negligible cost.4. 4. Interior eigenvalues are particularly susceptible to being impersonated by Rayleigh quotients of the form uHAu. Uw) is a HARMONIC RITZ PAIR WITH SHIFT K if . can provide a complete set of approximating pairs while at the same time protecting pairs near a shift K against impersonators. Then (K + £. as we shall see in §1. we expect ap to be small.If &p-i is not too small — something we can check during the calculation — then the refined Ritz vector computed by the cross-product algorithm will be satisfactory. as we have seen. such an impersonation is the cause of the failure of the Rayleigh-Ritz method in the example.4. LetU be a subspace and U be an orthonormal basis for U. Chapter 5.17) we precompute the matrices then for any fj. Specifically. In fact. the computation of refined Ritz values can be considerably abbreviated. Definition 4. each refined Ritz vector must be calculated independently from its own distinct value of//. Chapter 3. HARMONIC RITZ VECTORS The eigenvalue 0 of A in Example 4.11. Our analysis of the cross-product algorithm applies here. denote the singular vectors of AU — ^1 by a\ > • • • > ap. There is no really good way of introducing them other than to begin with a definition and then go on to their properties. Finally. with i = p and j = p—ly we see that the sine of the angle between the computed vector and the true vector is of order (<J p _i/cri) 2 e M . on the other hand. and.20). the quality of the refined Ritz vector depends on the accuracy of ^. Unfortunately. EIGENSPACES If in the notation of (4. Thus the cost of computing an additional refined Ritz vectors is the same as computing a spectral decomposition of a matrix of order p. The use of refined Ritz vectors solves the problem by insuring that the residual Ax — fix is small — something that is not true of the spurious Ritz vector. Harmonic Ritz vectors. we note that if U is a Krylov subspace. If // is an accurate approximation to an eigenvalue A.



We will refer to the process of approximating eigenpairs by harmonic Ritz pairs as the harmonic Rayleigh-Ritz method. This method shares with the ordinary Rayleigh-Ritz method the property that it retrieves eigenvectors that are exactly present in U. Specifically, we have the following result.
Theorem 4.12. Let (λ, x) be an eigenpair of A with x = Uw. Then (λ, Uw) is a harmonic Ritz pair.
Proof. Calculating the left- and right-hand sides of (4.18), we find

Hence Uw is an eigenvector of (4.18) with eigenvalue δ = λ - κ; i.e., (λ, Uw) is a harmonic Ritz pair. •
This theorem leads us to expect that if x is approximately represented in U, then the harmonic Rayleigh-Ritz method, like its unharmonious counterpart, will produce an approximation to x. Unfortunately, as of this writing the method has not been analyzed. But it is easy to see informally that it protects eigenpairs near κ from imposters. Specifically, on multiplying (4.18) by w^H, we get

Consequently, if any harmonic Ritz value is within δ of κ, the pair must have a residual norm (with respect to κ) bounded by |δ|. For example, if κ → λ and δ → 0, then it eventually becomes impossible for the harmonic Rayleigh-Ritz method to produce impersonators. It should be stressed that λ does not have to be very near κ for this screening to occur. If we return to Example 4.2 and compute the harmonic Ritz vector associated with the eigenvalue zero, setting κ = 0.3, we obtain an excellent approximation to e_1.
Computation of harmonic Ritz pairs
When U contains a good approximation to an eigenvector whose eigenvalue lies near κ, the matrix (A - κI)U in (4.18) will have a small singular value, which will be squared when we form U^H(A - κI)^H(A - κI)U; this can result in subsequent inaccuracies (see §3.2, Chapter 3). To circumvent this problem, we can compute the QR factorization

Computation of harmonic Ritz pairs When U contains a good approximation to an eigenvector whose eigenvalue lies near AC, (A - Kl)U in (4.18) will have a small singular value which will be squared when we form U^(A - K,I^(A - «/)£/, which can result in subsequent inaccuracies (see §3.2, Chapter 3). To circumvent this problem, we can compute the QR factorization

Vol II: Eigensystems

294

CHAPTER 4. EIGENSPACES

Then (4.18) assumes the form

Hence

This eigenproblem can be solved by the QZ algorithm (§4.3, Chapter 2). An alternative is to work with the ordinary eigenproblem

In this case a small singular value in R will cause R l and hence R 1Q U to be large. Fortunately, we are interested in harmonic Ritz values that are near K, i.e., values for which 6 is small. Thus we want eigenpairs corresponding to large eigenvalues in (4.23). Although the eigenvalue itself will not necessarily be accurately computed (the loss of accuracy occurs in forming R, not in foming R~1Q^U), if the eigenvalue is sufficiently well separated from it neighbors the eigenvector p will be well determined. The harmonic Ritz values are useful primarily in locating the harmonic Ritz vectors we need. When it comes to approximating the associated eigenvalue, we should use the ordinary Rayleigh quotient WH UAUw, which minimizes the residual norm. In computing this Rayleigh quotient we can avoid a multiplication by A by using quantities alreadv calculated. SDecificallv, from (4.20)

Hence

where x is the harmonic Ritz vector. When A is Hermitian, both the eigenproblems (4.22) and (4.23), which are essentially nonsymmetric, can give complex harmonic Ritz values in the presence of rounding error. However, we can use the same technique that we used for the S/PD problem (§4, Chapter 3). Specifically, we can transform (4.23) into the Hermitian eigenproblem

for the vector Rw. This approach has the slight disadvantage that having computed a solution g we must solve the system Rw = g to retrieve the primitive harmonic Ritz vector. However, the usual Ravleidi Quotient has the simpler form

SEC. 4. RAYLEIGH-RITZ APPROXIMATION

295

4.5. NOTES AND REFERENCES Historical In 1899, Lord Rayleigh [215] showed how to compute the fundamental frequency of a vibrating system. His approach was to approximate the fundamental mode by a linear combination of functions. He chose the linear combination x for which the quantity A = XT Ax/x^x was minimized and took A for the fundamental frequency. In 1909 Ritz [217] proposed a general method for solving continuous problems that could be phrased as the minimization of a functional. His idea was to choose a suitable subspace (e.g., of low-order polynomials) and determine coefficients to minimize the functional. His example of an eigenproblem—a second-order differential equation—does not, strictly speaking, fit into his general scheme. Although the mode corresponding to the smallest eigenvalue minimizes a functional, the other eigenvalues are found from the stationary points. The Rayleigh-Ritz method is sometimes called a Galerkin method or a GalerkinPetrov method. These methods solve a linear system Ax — b in the operator A by choosing two subspaces U and V and choosing x to satisfy the conditions

If U = V the method is called a Galerkin method; if not it is called a Galerkin-Petrov method. Now let V be the subspace spanned by V in (4.1). Then any Ritz pair (ju, x) satisfies the conditions

The similarity of (4.25) and (4.26) explains the nomenclature. The subject is not well served by this proliferation of unnecessary terminology, and here we subsume it all under the terms Rayleigh-Ritz method, Ritz pair, etc. The Rayleigh-Ritz method is also called a projection method because a Ritz pair (/x, x) is an eigenpair of the matrix F^APy, where P\> and PU are the orthogonal projections ontoZY and V. Convergence For earlier work on the convergence of the Rayleigh-Ritz method see the books by Parlett [206] for Hermitian matrices and by Saad [233] for both Hermitian and nonHermitian matrices. The convergence theory developed here is essentially that of Jia and Stewart for Ritz vectors [133] and for Ritz bases [134]. Theorem 4.6 is a generalization by Stewart of a theorem of Saad [231] for Hermitian matrices (also see [148]) and can in turn be extended to Ritz bases. For a proof see [268]. Refined Ritz vectors Refined Ritz vectors whose residual norms are not necessarily minimal were introduced by Ruhe [220] in connection with Krylov sequence methods for the generalVol II: Eigensystems

296

CHAPTER 4. EIGENSPACES

ized eigenvalue problem. Refined Ritz vectors in the sense used here are due to Jia [131], who showed that they converge when the Ritz vectors fail to converge. The proof given here may be found in [134]. The use of the cross-product algorithm to compute refined Ritz vectors was suggested in [132]. Harmonic Ritz vectors For a symmetric matrix, the (unshifted) harmonic Ritz values are zeros of certain polynomials associated with the MINRES method for solving linear systems. They are called "harmonic" because they are harmonic averages of the Ritz values of A~l. For a survey and further references see the paper [201] by Paige, Parlett, and van der Vorst, who introduced the appellation harmonic. Morgan [179] introduced the harmonic Ritz vectors as good approximations to interior eigenvectors. His argument (still for symmetric matrices) was that since eigenvalues near K become exterior eigenvalues of (A — Ac/)"1, to compute their eigenvectors we should apply the Rayleigh-Ritz method with basis U to (A — K I)"1. However, we can eliminate the need to work with this costly matrix by instead working with the basis (A — K,I)U. This leads directly to equation (4.18). But the resulting Ritz vector (A - K,I)Up will be deficient in components in eigenvalues lying near K. Hence we should return the vectors Up, which are not Ritz vectors, at least in the usual sense for symmetric matrices. Morgan also recommended replacing the harmonic Ritz value with the ordinary Rayleigh quotient. Since Morgan was working with Hermitian matrices, he was able to establish some suggestive results about the process. Harmonic Ritz values for non-Hermitian matrices were described by Freund [78], again in connection with iterative methods for solving linear systems. Harmonic Ritz vectors were introduced by Sleijpen and van der Vorst [240], who characterized them (in our notation) by the Galerkin-Petrov condition

The use of the inequality (4.19) to justify the use of harmonic Ritz vectors is new. The explicit reduction of the harmonic eigenproblem to the forms (4.22), (4.23), and the symmetric form (4.24) is new, although the symmetric form is implicit in the implementation of the Jacobi-Davidson algorithm in [243]. Another way of looking at these reductions is that when Tl(U} contains an eigevector of A corresponding to eigenvalue r, the pencil in the original definition (4.18) is not regular, while the transformed pencil (4.22) is. For the generalized eigenvalue problem Ax — \Bx, a natural generalization of a harmonic Ritz pair is a solution of

which was first introduced in [74]. An argument similar to that leading to (4.19) shows that this method also discourages imposters.

5
KRYLOV SEQUENCE METHODS

In this chapter and the next we will treat algorithms for large eigenproblems. There are two ways of presenting this subject. The first is by the kind of problem—the symmetric eigenproblem, the generalized eigenproblem, etc. This approach has obvious advantages for the person who is interested in only one kind of problem. It has the disadvantage that, at the very least, a thumbnail sketch of each algorithm must accompany each problem type. The alternative is to partition the subject by algorithms, which is what we will do here. In the present chapter we will be concerned with methods based on Krylov sequences— specifically, Arnoldi's method and the Lanczos method. In the first subsection we will introduce certain decompositions associated with Krylov subspaces. The decompositions form the basis for the the Arnoldi and Lanczos algorithms to be treated in §2 and §3.

1. KRYLOV DECOMPOSITIONS
For a fixed matrix A, not every subspace of dimension fc is a Krylov subspace of A. This means that any basis of a Krylov subspace must satisfy certain constraining relations, which we will call Krylov decompositions. It turns out that a knowledge of these decompositions enables us to compute and manipulate Krylov subspaces. Consequently, this section is devoted to describing the relations that must hold for a basis to span a Krylov subspace. The most widely used Krylov decompositions are those that result from orthogonalizing a Krylov sequence. They are called Arnoldi decompositions for non-Hermitian matrices and Lanczos decompositions for Hermitian matrices. We will treat these special decompositions in the first two subsections. Unfortunately, these decompositions are not closed under certain natural operations, and in §1.3 we introduce more general decompositions that are closed — decompositions that will be used in our computational algorithms.

297

298

CHAPTER 5. KRYLOV SEQUENCE METHODS

1.1. ARNOLDI DECOMPOSITIONS
An explicit Krylov basis of the form

is not suitable for numerical work. The reason is that as k increases, the columns of Kk become increasingly dependent (since they tend increasingly to lie in the space spanned by the dominant eigenvectors). For example, the following table gives the condition number (the ratio of the largest to the smallest singular values) for the Krylov basis Kk( A, e), where A is the matrix of Example 3.1, Chapter 4. k Condition 5 5.8e+02 10 3.4e+06 15 2.7e+10 20 3.1e+14 A large condition number means that the space spanned by the basis is extremely sensitive to small perturbations in the vectors that form the basis. It is therefore natural to replace a Krylov basis with a better-conditioned set of vectors, say an orthonormal set. Unfortunately, a direct orthogonalization of the columns of Kk will not solve the problem; once the columns have become nearly dependent, information about the entire Krylov subspace has been lost, and orthogonalization will not restore it. We therefore proceed indirectly through what we will call an Arnoldi decomposition. The Arnoldi decomposition Let us suppose that the columns of Kk+i are linearly independent and let

be the QR factorization of Kk+i (see §6, Chapter 2.3). Then the columns of Uk+i are the results of successively orthogonalizing the columns of Kk+i • It turns out that we can eliminate Kk+i from equation (1.1). The resulting Arnoldi decomposition characterizes the orthonormal bases that are obtained from a Krylov sequence. Theorem 1.1. Letu\ have 2-norm one and let the columns of Kk+i(A,ui) belinearly independent. Let Uk+i — (u\ • • • Uk+i) be the Q-factor ofKk+i- Then there is a (&+l)xAr unreduced upper Hessenberg matrix Hk such that

Conversely, ifUk+i is orthonormal and satisfies (1.2), where Hk is a (k+l)xk unreduced Hessenberg matrix, then Uk+i is the Q-f actor of Kk+i(A, ui).

SEC. 1. KRYLOV DECOMPOSITIONS
Proof. If we partition (1.1) in the form

299

we see that Kk = UkRk is the QR factorization of AV Now let Sk — Rk l. Then

where

It is easily verified that Hk is a (k+l)xk Hessenberg matrix whose subdiagonal elements are hi+iti = ri+ij+isa = n+i,z+i/ r n- Thus by the nonsingularity of Rk, Hk is unreduced. Conversely, suppose that the orthonormal matrix Uk+i satisfies the relation (1.2), where Hk i s a ( f c + l ) x f c unreduced Hessenberg matrix. Now if k = 1, then Au\ — huui 4- h,2iU2. Since /i2i ^ 0, the vector u2 is a linear combination of HI and Au\ that is orthogonal to u\. Hence (u\ u^} is the Q-factor of AY Assume inductively that Uk is the Q-factor of Kk- If we partition

then from (1.3)

Thus Uk+i is a linear combination of Auk and the columns of Uk and lies in JCk+iHence Uk+i is the Q-factor of Kk+i- • This theorem justifies the following definition. Definition 1.2. Let Uk+i G Cnx^k+l^ be orthonormal and let Uk consist of the first k columns ofUk+i • If there is a (&+1) x k unreduced upper Hessenberg matrix Hk such that

then we call (1.3) an ARNOLDI DECOMPOSITION OF ORDER k. If H is reduced, we say the Amoldi decomposition is REDUCED. Vol II: Eigensystems

300

CHAPTER 5. KRYLOV SEQUENCE METHODS

In this nomenclature, Theorem 1.1 says that every nondegenerate Krylov sequence is associated with an Araoldi decomposition and vice versa. An Amoldi decomposition can be written in a different form that will be useful in the sequel. Specifically, partition

and set

Then the Arnoldi decomposition (1.2) is equivalent to the relation

In what follows it will reduce clutter if we dispense with iteration subscripts and write (1.4) in the form

When we do, the compact form (1.3) will be written in the form

Thus the hats over the matrices U and H represent an augmentation by a column and a row respectively. Uniqueness The language we have used above—an Arnoldi decomposition—suggests the possibility of having more than one Amoldi decomposition associated with a Krylov subspace. In fact, this can happen only when the sequence terminates, as the following theorem shows. Theorem 1.3. Suppose theKrylov sequence Kk+\(A, u\) does not terminate at k+1. Then up to scaling of the columns of Uk+i, the Amoldi decomposition of Kk+i is unique. Proof. Let

be the Amoldi decomposition from Theorem 1.1. Then by this theorem Hk is unreduced and fa ^ 0. Let

SEC. 1. KRYLOV DECOMPOSITIONS

301

where Uk+i is an orthonormal basis for /Cjt+i(A, wi). We claim that U^Uk+\ = 0. For otherwise there is a Uj such that Uj = auk+i + Uka, where a ^ 0. Hence Aitj contains a component along Ak+l u and does not lie in /Cjt+i, a contradiction. It follows that H(Uk) = ^(Uk) and hence that Uk = UkQ, for some unitary matrix Q. Hence

or

On premultiplying (1.5) and (1.7) by Uk we obtain

Similarly, premultiplying by u^+l, we obtain

Now by the independence of the columns of Kk+i, both (3k and (3k are nonzero. It then follows from (1.8) that the last row of Q is a>£e£, where \u>k\ = 1. Since the norm of the last column of Q is one, the last column of Q is u^e^. Since Hk is unreduced, it follows from the implicit Q theorem (Theorem 3.3, Chapter 2) that Q = diag(u>i,... ,tjfc), where \MJ = 1 (j = 1,... , fc). Thus up to column scaling Uk = UkQ is the same as Uk- Subtracting (1.7) from (1.5) we find that

so that up to scaling Uk+i and tik+i are the same. •

This theorem allows us to speak of the Arnoldi decomposition associated with a nonterminating Krylov subspace. Perhaps more important, it implies that there is a one-one correspondence between Krylov subspaces and their starting vectors. A nonterminating Krylov subspace uniquely determines its starting vector. The nontermination hypothesis is necessary. If, for example, /Cfc+i = /C^+2, then AUk+i = Uk+iHk+i> and for any Q such that Q^Hk+iQ is upper Hessenberg, the decomposition A(Uk+\Q) = (Uk+iQ}(Q^Hk+iQ] is another Arnoldi decomposition. The Rayleigh quotient In the proof of Theorem 1.3 we used the fact that Hk = U^AUk- But this matrix is just the Rayleigh quotient with respect to the basis Uk (see Definition 2.7, Chapter 4). Thus the Arnoldi decomposition provides all that is needed to compute Ritz pairs. Vol II: Eigensystems

Proof.(Uk) and suppose that H k is unreduced. suppose that A has an eigenspace that is a subset ofR.(Uk) contains an eigenspace of A. . . Then Hk is reduced if and only ifR.2 we have required that the Hessenberg part of an Arnoldi decomposition AUk = Uk+iHk be unreduced. However.4. which we can write in the form x = UkW for some w. KRYLOV SEQUENCE METHODS Reduced Arnold! decompositions In Definition 1. Let the orthonormal matrix Uk+i satisfy where Hk is Hessenberg. . • This theorem. the result is a reduced Arnoldi decomposition.4) in the form Then from the orthonormality of Uk+i we have . If this procedure is repeated each time the sequence terminates. so that U is an eigenbasis of A. if hj+ij = 0. shows that the only way for an Arnoldi decomposition to be reduced is for the Krylov sequence to terminate prematurely. Let A be the corresponding eigenvalue and let Then Because H\ is unreduced. Conversely. Let H be the leading principal submatrix of Hk of order k and let U be the matrix consisting of the first j columns of Uk+i • Then it is easily verified that AU — UH. the sequence can be continued by setting wj+i equal to any normalized vector in the orthogonal complement of Uj.5. It follows that w = 0. Then lZ(Uk) contains an eigenvector of A. It is natural to ask under what conditions an Arnoldi-like relation can be satisfied with a reduced Hessenberg matrix. a contradiction. . along with Theorem 3. The following theorem answers that question. the matrix Uk+iH\ is of full column rank. Chapter 4.302 CHAPTER 5. say that hj+ij — 0. Theorem 1. Computing Arnoldi decompositions The Arnoldi decomposition suggests a method for computing the vector Uk+i from HI . Write the fcth column of (1. Uk. First suppose that Hk is reduced.

Reorthogonalization Although we derived (1. Chapter 2.2. where it is shown that in the presence of inexact arithmetic cancellation in statement 3 can cause it to fail to produce orthogonal vectors. 1. It is illustrated in the following extension of (1. The cure is a process called reorthogonalization. This algorithm has been treated extensively in §1:4. It is worthwhile to verify that this process is the same as the reduction to Hessenberg form described in §3.9) from the Arnoldi decomposition. KRYLOV DECOMPOSITIONS Since we must have 303 and These considerations lead to the following algorithm for computing an expanding set of Arnoldi decompositions. the extended algorithm therefore orthogonalizes it again. VolII: Eigensystems . accumulating any changes in h^.1. We call this procedure the Arnoldiprocess.9). the computation of wjt+i is actually a form of the well-known Gram-Schmidt algorithm.4.SEC. Since the vector v in statement 3 need not be orthogonal to the columns of Uk.

1 —uses a function gsreorthog (Algorithm 1:4. Hence.1: The Arnoldi step In general.304 Given an Arnoldi decompositioi CHAPTER 5. one can compute an Arnoldi decomposition of any order.g.. . KRYLOV SEQUENCE METHODS of order k.3) that handles exceptional cases and tests whether to reorthogonalize. . In general r will be one or two. we can often dispense with the reorthogonalization entirely. Practical considerations The work in computing an Arnoldi step is divided between the computation of the vector-matrix product Au and the orthogonalization. The user must provide a method for computing the former.2.1. Hence: To compute an Arnoldi decomposition of order k from scratch. However.1. . & — 1 . one must apply the algorithm for orders i = 1. The cost of the orthogonalization will be 2nkr flam. this algorithm is quite satisfactory—usually no more than the one reorthogonalization performed in statements 4-6 is needed.13). where r is the number of orthogonalization steps performed by gsreorthog. and its cost will vary. Algorithm 1. what to do when v in statement 3 is zero. e. For more see Example 1. . there are some tricky special cases.1 repeatedly. Chapter 2. Thus: . The algorithm uses an auxiliary routine gsreorthog to perform the orthogonalization (Aleorithml:4. our final algorithm—Algorithm 1. By applying Algorithm 1. More important. ExpandArn extends it to an Arnoldi decomposition of order k + 1.

Example 1.k+i must be almost orthogonal to it. Hence if there is an accurate approximation to any vector in /Cjt. KRYLOV DECOMPOSITIONS 305 It requires nk floating-point words to store the matrix U.2. restricts our ability to compute Arnoldi decompositions. For our example. then as k grows we will quickly run out of memory. where the concern is with one eigenpair. Orthogonalization and shift-and-invert enhancement We have seen in connection with the inverse power method (§1. this means that the first component of Uk will rapidly approach zero as good approximations to ei appear in £*.SEC. Similarly. Thus the effects observed in the figure are due to the combination of the large eigenvalue and the orthogonalization step. more than operations counts.12). the first component of Uk will not become exactly zero but will be reduced to Vol Hi Eigensystems . the second component of Uk will approach zero. Figure 1. the sequence will tend to produce good approximations to those large eigenvalues. With Krylov sequences. e)] versus k for several values ofi. although more slowly. The larger the value ofi. so that the effects of rounding error in the formation of the matrix-vector product AiUk are minimal. as the following example shows. and transfers between backing store and main memory will overwhelm the calculation. Of course we can move the columns of U to a backing store—say a disk. Thus shift-and-invert enhancement can be used to steer a Krylov sequence to a region whose eigenvalues we are interested in. where the concern is often with several eigenpairs. at least in part. Chapter 2) that replacing A by (A — K!)~I will make eigenvalues near K large. 1. But each Arnoldi step requires all the previous vectors. the Krylov subspaces /C&(A t -. If (A — ft/)""1 is used to form a Krylov sequence. then v. In fact. The convergence of sin 9 toward zero proceeds typically up to a point and then levels off.fck(Ai. is in the Arnoldi procedure. Chapter 4 suggests that it will also produce approximations to 62. but what seems to be happening is the following. Now in the presence of rounding error. In this example we have simulated a shift-and-invert transformation by giving A{ a large eigenvalue of 10*. This fact. For the inverse power method. If n is very large. the closer the shift is to the eigenvalue the better the performance of the method. we have taken A{ to be diagonal.1 plots sin 0. Let be of order 30 with the last 29 eigenvalues decaying geometrically. as good approximations to e2 appear. The problem. e) can be expected to quickly develop very good approximations to GI. the sooner the convergence stagnates. an excessively accurate approximation to one eigenvalue can cause the convergence to the other eigenpairs to stagnate. In any Krylov sequence the Arnoldi vector u^+i is orthogonal to /Cjfe.5. Moreover. The bound (3. where 0 = ^[G2. No formal analysis of this phenomenon has appeared in the literature. For i > 1.

let A be Hermitian and let be an Arnoldi decomposition. Thus we have effectively changed our rounding unit from 10~16 to 10l~16. KRYLOV SEQUENCE METHODS Figure 1.1: Stagnation in a Krylov sequence the order of the rounding unit—say 10~16. 1. When Auk is reorthogonalized. But this nonzero first component is magnified by 10* when Auk is formed. But since Tk = U^AUk.306 CHAPTER 5. this spurious first component will introduce errors of order 10*~16 in the other components. and we can expect no greater accuracy in the approximations to the other eigenvectors. LANCZOS DECOMPOSITIONS Lanczos decompositions When A is Hermitian the Amoldi decomposition and the associated orthogonalization process simplify. The moral of this example is that if you wish to use shift-and-invert enhancement to find eigenpairs in a region.2. It follows that Tk is tridiagonal and can be . Then the Rayleigh quotient Tk is upper triangular. do not let the shift fall too near any particular eigenvalue in the region. Tk is Hermitian. Specifically.

More generally. • The body of the loop requires one multiplication by A. 1. To see this. the Lanczos decomposition is essentially unique. from the jth column of (1.12) a Lanczos decomposition. and two subtractions of a constant times a vector. For more on variants of the basic algorithm..4).2. Both work well. • A frequently occuring alternative is to calculate This is the equivalent of the modified Gram-Schmidt algorithm (see §1:4. KRYLOV DECOMPOSITIONS 307 We will call the decomposition (1. see the notes and references. note that the first column of the Lanczos decomposition (1. which is implemented in Algonthm 1. two inner products. Hence: Vol II: Eigensystems . Here are three comments.SEC. w^. • Since /2j_i is taken to be real. we do not have to conjugate it in statement 5. provided that the matrix Tk is unreduced. This will always be true if the sequence does not terminate.. The Lanczos recurrence The fact that Tk is tridiagonal implies that the orthogonalization Uk+i depends only on Uk and t/fc-i. Like the Arnoldi..12) is or Moreover.12) we get the relation where This is the Lanczos three-term recurrence. from the orthonormality of HI and w 2 it follows that «i = v^Aui and we may take0i = \\Au\ — ^ui\\2. . in contrast to the Arnoldi procedure where Uk depends on u\.1.

However. we can discard the Uj as they are generated and compute Ritz values from the Rayleigh quotient T^. KRYLOV DECOMPOSITIONS An Arnoldi (or Lanczos) decomposition can be regarded as a vehicle for combining Krylov sequences with Rayleigh-Ritz refinement. Algorithm 1. where we will treat the Lanczos algorithm in detail. if we reorthogonalize Uj+\ in the Lanczos against its predecessors. there are formidable obstacles to turning it into a working method. it also permits the efficient computation of the residual norms of the Ritz pairs. This algorithm generates the Lanczos decomposition where I\ has the form (1. In the Arnoldi process we were able to overcome this problem by reorthogonalization. because of its uniqueness. As we shall see.11) for the Arnoldi process.308 CHAPTER 5. 1. A more serious problem is loss of orthogonality among the vectors Uj. It exhibits an orthonormal basis for a Krylov subspace and its Rayleigh quotient—the wherewithal for computing Ritz pairs. we lose all the advantages of the Lanczos recurrence. If only eigenvalues are desired.13). The above is less by a factor of about k than the corresponding count (1. We will return to this problem in §3. KRYLOV SEQUENCE METHODS Let HI be given. But if one wants to compute eigenvectors. we can do little more with it.2: The Lanczos recurrence The operation count for Algorithm 1. Although the simplicity and economy of the Lanczos recurrence is enticing.2 is k matrix-vector multiplications and 4nk flam. For . However.3. one must retain all the vectors u3.. and for large problems it will be necessary to transfer them to a backing store.

we can write IfUk+i is orthonormal. KRYLOV DECOMPOSITIONS 309 example. like an Arnoldi decomposition. Thus a Krylov decomposition has the general form as an Arnoldi decomposition.6. We call the H(Uk+i) the SPACE SPANNED BY THE DECOMPOSITION and Uk+i the basis for the decomposition. Let Vol II: Eigensystems .SEC. but the matrix B^ is not assumed to be of any particular form. Definition 1. In this subsection we will show how to relax the definition of an Arnoldi decomposition to obtain a decomposition that is closed under similarity transformations of its Rayleigh quotient. Specifically. Here is a nontrivial example of a Krylov decomposition. 1.16) that IT In particular. specifies a relation among a set of linearly independent vectors that guarantees that they span a Krylov subspace. It is uniquely determined by the basis Uk+i> For if (V v) is any left inverse for Uk+i. we say that the Krylov decomposition is ORTHONORMAL. Let u\. Bk is a Rayleigh quotient of A. Two Krylov decompositions spanning the same space are said to be EQUIVALENT. then it follows from (1. there is no unreduced Arnold! form whose Rayleigh quotient is triangular. • • . we make the following definition. Definition A Krylov decomposition. 1^2. Example 1.7. Uk+i be linearly independent and let A KRYLOV DECOMPOSITION OF ORDER k is a relation of the form Equivalently..

as the above example shows. Every Arnold! decomposition is a Krylov decomposition. However. Then on postmultiplying by Q we obtain the equivalent Krylov decomposition We will say that the two Krylov decompositions are similar.18) leaves the vector Uk+i unaltered. If the columns ofKk+i(A. which will play an important role in our treatment of the Arnoldi method (see §2. To establish these facts we will introduce two classes of transformations that transform a Krylov decomposition into an equivalent Krylov decomposition. every Krylov decomposition is equivalent to a (possibly unreduced) Arnoldi decomposition.310 CHAPTER 5. We have already met with a similarity transformation in the proof of the uniqueness of the Arnoldi decomposition [see (1. In that case. the transforming matrix Q necessarily satisfied \Q\ = I so that the two similar decompositions were essentially the same. HI) are independent. in which case we will say that the decomposition is a KrylovJordan decomposition.6)]. We can transform this vector as follows. however. Translation The similarity transformation (1. With a general Krylov decomposition. In the Krylov decomposition let . In this terminology an Arnoldi decomposition is an orthonormal Krylov-Hessenberg decomposition and a Lanczos decomposition is an orthonormal Krylov-tridiagonal decomposition. If the Arnoldi decomposition is unreduced. They are called similarity transformations and translations. Similarity transformations Let be a Krylov decomposition and let Q be nonsingular. then is a Krylov decomposition whose basis is the Krylov basis.2). KRYLOV SEQUENCE METHODS be of order k. we could reduce B to Jordan form. and the subspace of the original Krylov decomposition must be a nondegenerate Krylov subspace. For example. but not every Krylov decomposition is an Arnoldi decomposition. There is also a Krylov-Schur decomposition. however. then it is essentially unique. we are free to reduce the Rayleigh quotient B to any convenient form.

Let U = UR be the OR factorization of U. Then the translated decomposition is an equivalent orthonormal Krylov decomposition. and by Theorem 1.8 is constructive. Then is an equivalent decomposition. Then the equivalent decomposition is a possibly reduced Arnoldi decomposition. 1. Uk+i is constrained to have components along the original vector Uk+i. Next let be a vector of norm one such that UHu = 0. let Q be a unitary matrix such that 6HQ = ||&||2eJ and QHBQ is upper Hessenberg. Finally.8.4 71 (U) contains no invariant subspaces of A. • The proof of Theorem 1. Note that translation cannot produce an arbitrary vector Uk+i. these algorithms will be reasonably stable. We say that it was obtained from the original decomposition by translation. KRYLOV DECOMPOSITIONS 311 where by the independence of the vectors Uj we must have 7 ^ 0 . Equivalence to an Arnold! decomposition We will now use the transformations introduced above to prove the following theorem. There are two points to be made about this theorem. each step in the proof has an algorithmic realization.3 it is essentially unique. We begin with the Krvlov decomposition where for convenience we have dropped the subscripts in k and k+1.SEC. If the original basis Uk+i is reasonably well conditioned. Proof. in which the matrix U is orthonormal. To summarize: Vol II: Eigensystems .Instead. so that the two decompositions span the same space. Then it follows that Since K[(Uk Wfc+i)] = TZ[(Uk v>k+i)]> this Krylov decomposition is equivalent to the original. • If the resulting Amoldi decomposition is unreduced. Every Krylov decomposition is equivalent to a (possibly reduced) Arnoldi decomposition. then by Theorem 1. Given the original decomposition. Theorem 1.

. KRYLOV SEQUENCE METHODS Figure 1.. If n is large. transformations on the Rayleigh quotient are negligible .2. Chapter 2. so that (1. Let H\ be a Householder transformation such that 6HH\ = /tefc. The course of the reduction is illustrated in Figure 1. . This algorithm illustrates an important fact about algorithms that transform Krylov decompositions. We now reduce H\BHi to Hessenberg form by a variant of the reduction described in §2.19) is an Arnoldi decomposition. in which the matrix is reduced rowwise beginning with the last row. it is sometimes necessary to pass from an orthonormal Krylov sequence to an Arnoldi sequence.Then the Krylov decomposition has a Hessenberg Rayleigh quotient. Let the Krylov decomposition in question be where B is of order k. Algorithm 1. . Because the last rows of HI.Hk-i.312 CHAPTER 5.3 implements this scheme. H^-\ are ei. Now let Q = HiH2 • • . Reduction to Arnoldi form As we shall see.3. This can be done by Householder transformations as follows.2: Rowwise reduction to Hessenberg form Each Krylov sequence that does not terminate at k corresponds to the class of Krylov decompositions that are equivalent to its Arnoldi decomposition. It uses a variant ofhousegen that zeros the initial components of a vector.

KRYLOV DECOMPOSITIONS 313 Let be an orthogonal Krylov decomposition of order k. 1. The algorithm uses the routine housegen2(x. */). w. which takes an ra-vector x and returns a Householder transformation H — I — uvP such that Hx — ven [see (2. Algorithm 1. This algorithm reduces it to an Arnoldi decomposition.20).SEC. Chapter 2].3: Krylov decomposition to Amoldi decomposition Vol II: Eigensystems .

4.4. Chapter 4. where U is overwritten by U*Q.16). we will assume that is a orthonormal Krylov decomposition.3). Thus the bulk of the work in Algorithm 1. Chapter 4]. However.iiI)U whose singular value is smallest [see (4. Uw) is a harmonic Ritz pair if . Harmonic Ritz vectors By Definition 4. From (1. We could have accumulated the Householder transformations directly into U.11. Thus the computation of a refined Ritz vector can be reduced to computing the singular value decomposition of the low-order matrix B^. we introduced refined and harmonic Ritz vectors. This process requires nk2 flam. The purpose of this subsection is to give the details.314 CHAPTER 5. In general. 1.3 and 4. the pair (K + <!>.20) we have If we set we have Since (U u)is orthonormal. which would give the same operation count. Refined Ritz vectors If \i is a Ritz value then the refined Ritz vector associated with n is the right singular vector of (A . their computation simplifies.3 is found in the very last statement. In what follows. COMPUTATION OF REFINED AND HARMONIC RITZ VECTORS In §§4. the right singular vectors of (A — ^iI)U are the same as the right singular vectors of B^. But the possibilities for optimizing cache utilization are better if we calculate U*Q as above (see §1:2. if the defining subspace for these vectors is a Krylov subspace. both are somewhat expensive to compute. Chapter 4. KRYLOV SEQUENCE METHODS compared to the manipulation of U.

which is by far the most widely used. the Lanczos recurrence preceded the Arnoldi procedure.15) is an example.However. If we are willing to assume that (B — K/)H is well conditioned. we have opted for the slightly vaguer word "decomposition. he was aware that accurate approximations to certain eigenvalues could emerge early in the procedure. and indeed the relation (1.5. KRYLOV DECOMPOSITIONS and 315 It follows tha Thus we have reduced the problem of computing w to that of solving a small generalized eigenvalue problem. The two described here emerge with a clean bill of health.. whose algorithm [157. He called his method the method of minimized iterations because orthogonalizing Au against u\. . we can multiply by (B — K/)~ H to get the ordinary eigenvalue problem 1. 279). but instead used the recurrence to generate the characteristic polynomial of Tk.12) of the Lanczos decomposition is due to Paige [195]. but the others may not perform well.1951] seems to have been to generalize the onesided Lanczos process to non-Hermitian matrices. the factorization is concealed by the second form (1. As indicated above (p.. there are several variants of the basic three-term recursion. What we have been calling an Arnoldi decomposition is often called an Arnoldi factorization. Paige [197] has given a rounding-error analysis of these variants. noting that the Rayleigh quotient is Hessenberg.4).2) is a factorization of AUk. from which he computed eigenvalues. whose ground-breaking thesis helped turn the Lanczos algorithm from a computational toy with promise into a viable algorithm for large symmetric eigenproblems. which can be done by the QZ algorithm (see §4. As might be expected. 1. Uk minimizes the norm of the result. Consequently.. NOTES AND REFERENCES Lanczos and Arnold! Contrary to our order of presentation. of which (1." Vol II: Eigensystems .3. Lanczos did not compute approximations to the eigenvalues from the matrix Tk (which he never explicitly formed). Lanczos also gave a biorthogonalization algorithm for non-Hermitian matrices involving right and left Krylov sequence. Chapter 2).1950] has had a profound influence on the solution of large eigenproblems. Arnoldi's chief contribution [5. It is due to to Cornelius Lanczos (pronounced Lantzosh).SEC. The matrix form (1.

000. we can use Algorithm 1. THE RESTARTED ARNOLDI METHOD Let be an Arnoldi decomposition. In principle. proceeds via an orthonormal Krylov decomposition. Uwo requires almost 10 megabytes of memory to store in IEEE double precision. for large problems our ability to expand the Arnoldi decomposition is limited by the amount of memory we can devote to the storage of Uk.1 to keep expanding the Arnoldi decomposition until the Ritz pairs we want have converged. Unfortunately.1 and 2. in fact.2. the algorithm will spend most of its time moving numbers back and forth between the disk and main memory. Most notably.5 shows. An alternative is to restart the Arnoldi process once k becomes so large that we cannot store Uk.As k increases. as Example 1. some of these Ritz pairs will (we hope) approach eigenpairs of A.This. We then turn to practical considerations: stability. KRYLOV SEQUENCE METHODS Shift-and-invert enhancement and orthogonalization The fact that a shift that closely approximates an eigenvalue can degrade convergence to other eigenpairs does not seem to be noted in the literature. Moreover. then. convergence. . to be treated in the next section. see §2.316 CHAPTER 5. the shift has to be extremely accurate for the degradation to be significant. However. is the way the Arnoldi method is used in practice. Sorensen's implicitly restarted Arnoldi algorithm [246]. Since disk transfers are slow.5. One possible reason is that one does not generally have very accurate shifts.For example. 2. We have already noted that the Ritz pairs associated with Uk can be computed from the Rayleigh quotient Hk. These methods are the subjects of §§2. some of these ideas are implicit in some algorithms for large eigenproblems and large linear systems. and deflation. However. For more. which is then truncated to yield an Arnoldi decomposition. in order to orthogonalize Auk against the previous vectors we will have to bring them back into main memory. as are the systematic transformations of the resulting Krylov decompositions. if A is of order 10. In this chapter we will introduce two restarting methods: the implicit restarting method of Sorensen and a new method based on a Krylov-Schur decomposition. Krylov decompositions The idea of relaxing the definition of an Arnoldi decomposition is due to Stewart [267]. A possible cure is to move the vectors Uk to a backing store—say a disk—as they are generated.

and suppose that we are interested in calculating the first k of these eigenpairs.1. For example. We will begin by motivating this process and then go on to show how the QR algorithm can be used to implement it.. A n lie in an interval [a. If the eigenvalues \k+i. 1.. Filter polynomials For the sake of argument let us assume that A has a complete system of eigenpairs (A... it turns out.. Ajt lie without. THE IMPLICITLY RESTARTED ARNOLDI METHOD 317 The process of restarting the Arnold! method can be regarded as choosing a new starting vector for the underlying Krylov sequence. is a special case of a general approach in which we apply a filter polynomial to the original starting vector. . 2... The emphasis is on filtering out what we do not want. 6] and the eigenvalues AI./*m)« It should be observed that both choices accentuate the negative. There are many possible choices for a filter polynomial. . desired eigenvalues near the interval [a. and suppose that /ifc+i.. . VolII: Eigensystems . This. 6] in the first of the above alternatives will not be much enhanced.. while what we want is left to fend for itself. /zm.. but two suggest themselves immediately. 6]. Such a polynomial is called a filter polynomial.. Then u\ can be expanded in the form If p is any polynomial.. we have If we can choose p so that the values p(\i) (i = fc-f 1. 2. Let wi be a fixed vector. By the properties of Chebyshev polynomials (see Theorem 3... we can take p to be a Chebyshev polynomial on [a.. THE RESTARTED ARNOLDI METHOD 2. ... . jLtm correspond to the part of the spectrum we are not interested in.. Xj).7.SEC. Chapter 4). n) are small compared to the values p(A. A natural choice would be a linear combination of Ritz vectors that we are interested in. this polynomial has values on [a. 6]. then u will be rich in the components of the Xi that we want and deficient in the ones that we do not want. Then takep(i) = (t . .) (i = 1. k).pk+i) • • • ( * .. 6] that are small compared to values sufficiently outside [a. Suppose we have at hand a set of Ritz values fii.

let us look at how it will be used. If.1 to expand this decomposition to a decomposition of order m. we have where Note that H^. We begin with an Arnoldi decomposition We then use Algorithm 1. and the implicit restarting process is used to reduce the decomposition to a decomposition of order & with starting vector p( A )ui. Starting with the shifted decomposition we write where Q\R\ is the QR factorization of Hm — «i/. Specifically. p is of degree m—k. There are five points to note about this procedure. At this point a filter polynomial p of degreeTO—fcis chosen. The contraction phase works with a factored form of the filter polynomial. they are expensive to apply directly. let We incorporate the factor t — Ki into the decomposition using an analogue of the basic QR step [see (2. for example. Chapter 2]. This technique is called implicit restarting. This process of expanding and contracting can be repeated until the desired Ritz values have converged. However. we can use shifted QR steps to compute the Krylov decomposition of ICk[p(A)ui] without any matrix-vector multiplications. is the matrix that results from one QR step with shift KI applied to Hm. then p(A)ui requires ra—k matrixvector multiplications to compute. if HI is the starting vector of an Arnoldi decomposition of order m. KRYLOV SEQUENCE METHODS Whatever the (considerable) attractions of filter polynomials. Before describing how implicit restarting is implemented. The integer m is generally chosen so that the decomposition can fit into the available memory. Postmultiplyingby Qi.1). we get Finally. .318 Implicitly restarted Arnoldi CHAPTER 5.

We may now repeat this process with K^? • • • . 2.K\I).2.1) by ei and remembering that RI is upper triangular.SEC. The matrix Hm is Hessenberg. r^' is nonzero. The matrix Q\ is upper Hessenberg. 5. This follows from the fact that it can be written in the form where PJ-IJ is a rotation in the (j—1. The first column of Um = UmQ\ is a multiple of (A . and at most its last subdiagonal element is zero. only the last two components of b^m'+l are nonzero. For on postmultiplying (2. This follows from the proof of the preservation of Hessenberg form by the QR algorithm. THE RESTARTED ARNOLDI METHOD 319 1. Thus the decomposition (2. The matrix lib is orthonormal. Km-Jt. 4. j)-plane (see Figure 2.The result will be a Krylov decomposition with the following properties. 3. The vector b^f+l = e^Qi is the last row of Q and has the form i.3) has the Wilkinson diagram presented below for k = 3 and ra = 6: Vol II: Eigensystems .e. Chapter 2).. we get Since Hm is unreduced. which establishes the result. 2.

KRYLOV SEQUENCE METHODS where the q's are from the last row of Q. Implementation We have described the contraction algorithm for the implicitly restarted Arnoldi method in terms of the explicitly shifted QR algorithm.. KJ. it is natural to use the implicit double shift procedure. The only new feature is that a double shift adds an additional two subdiagonals to the matrix Q and hence moves the truncation index back by two.m consists of the first k columns of Um » HM cipal submatrix of order k of Hm~ »and g^m is from the matrb Hence if we set ls me leading prin- then is an Arnold! decomposition whose starting vector is proportional to (A—Ki 1} • • -(A— Ac m _jb/)Mi. . • The algorithm requires 0(m 2 ) work. Here are some comments.1 is an implementation of one step of filtering. we will only describe the implicit single step method..320 CHAPTER 5. However. but we also get its Arnoldi decomposition of order k for free. • The algorithm is a straightforward implementation of the single implicit QR step. For convenience we separate the final truncation from the implicit filtering. we can truncate the decomposition as above to yield a decomposition of order m—j. since when n is large the applications ofFilterArn will account for an insignificant part . After we have applied the shifts «!. For single shifts there is no reason not to use this procedure. We will call the integer m—j the truncation index for the current problem.. A precise operation count is not necessary. when A is real and the shift is complex. The elegance and economy of this procedure is hard to overstate. From this diagram it is easy to see that the first k columns of the decomposition can be written in the form where Uj. For simplicity. A bit of terminology will help in our description of the algorithm. We have thus restarted the Arnold! process with a decomposition of order k whose starting vector has been filtered by p(A). Algorithm 2. For large n the major cost will be in computing UQ. Not only do we avoid any matrix-vector multiplications in forming the new starting vector.

this algorithm reduces the truncation index by one. Algorithm 2. Given a shift AC. The transformations generated by the algorithm are accumulated in the raxm matrix Q.1: Filtering an Arnold! decomposition Vol II: Eigensystems . THE RESTARTED ARNOLDI METHOD 321 Let be an Arnoldi decomposition of order m.SEC. 2.

If we implement the product U*Q [:. The restarted Arnold! cycle We are now in a position to combine the expansion phase with the contraction phase. Algorithm 2. KRYLOV SEQUENCE METHODS and an integer m > k. However. so that the expansion phase requires between n(m2-k2) flam and 2n(ra 2 -fc 2 ) flam. the operation count is nm k flam. It is much cheaper to accumulate them in the smaller matrix Q and apply Q to U at the end of the contraction process. If this is taken into . In fact. we have ignored the opportunity to reduce the count by taking advantage of the special structure of Q. this algorithm expands the decomposition to one of order m and then contracts it back to one of order k. 1: k] naively.2: The restarted Arnoldi cycle of the work in the overall Arnold! algorithm. the elements of Q below the m-k subdiagonal are zero. This is done in Algorithm 2.2.322 Given an Arnoldi decomposition CHAPTER 5. The expansion phase requires m—k matrix-vector multiplications. the bulk of the work in this algorithm is concentrated in the expansion phase and in the computation of U*Q[l:k]. • Transformations are not accumulated in U. From (1. When n is large.10) we know that the Arnoldi step requires between Inj flam and 4nj flam.

The implicitly restarted Arnoldi method can be quite effective.SEC. but the conventional wisdom is to take m to be Ik or greater. tan Z (ei. VolII: Eigensy stems . .1. The implicitly restarted Arnoldi method was run with k = 5.95. A 5. whereas the former requires only 1.8e-02 4. 3.952. For purposes of exposition we have assumed that m and k are fixed. . Let A = diag(l. Yet the latter requires 2. The implicitly restarted Arnoldi method is almost as good as the full Krylov sequence. as the following example shows. These counts are not easy to interpret unless we know the relative sizes of m and k.500 words of floating-point storage.9e-03 8. Thus if we take m = 2k.. 1 :k] is formed. the Ritz vector from the restarted subspace).4e-03 8. 2. 4.3).0e-07 RV 1. as long as m is not so large that storage limits are exceeded. But experience has shown it to be the best general method for non-Hermitian matrices when storage is limited. the full Krylov subspace). . 2. In fact. Example 2. THE RESTARTED ARNOLDI METHOD 323 consideration the count can be reduced to n(km .0e-03 1. and starting vector e.f fc2). . we end up with the following table (in which RO stands for reorthogonalization). However.8e-08 IRA 9. .95") (the matrix of Example 3. Axu 10 15 20 25 1.7e-02 2. Chapter 4). There is no general analysis that specifies an optimal relation. m = 10..2e-01 5. tan Z(ei.8e-05 8.0e-07 The columns contain the following information: the number of matrix-vector multiplications.95 3 .9e-05 4.7e-05 8. tan Z(ei. so that in spite of the operation counts it may be faster (see §1:2. they can be varied from cycle to cycle. the naive scheme permits more blocking. the restarted Arnoldi subspace).1.. The results are summarized in the following table.000. The main conclusion to be drawn from this table is that it does not much matter how U*Q [:. It would be rash to take one offhand example as characteristic of behavior of the implicitly restarted Arnoldi method.

.e. .3) has the form where T^m *) is an upper triangular matrix with Ritz values pk+i. fJ. Pm on its diagonal. Corollary 2. Hk must contain the Ritz values that were not used in the restarting process. Because Hm~ is block triangular. Let {im.3. If the implicitly restarted QR step is performed with shifts /u m . It follows that the last row of RQ is zero. Hence H . Chapter 2.k+i > then the matrix Hm~ ' in (2. Hence the last diagonal element of R must be zero.2. Specifically.RQ -f p.1 has the form (2. If one QR step with shifty is performed on H.2 we get the following result. When this is done the Rayleigh quotient assumes a special form in which the discarded Ritz values appear on the diagonal of the discarded half of the Rayleigh quotient. .324 Exact shifts CHAPTER 5. The first part of the QR step consists of the reduction of (H . But H — fil is not of full rank... The matrix Hk in (2. that all but the last of the subdiagonal elements of RQ are nonzero. Moreover. KRYLOV SEQUENCE METHODS We have already mentioned the possibility of choosing Ritz values—the eigenvalues of H — as shifts for the implicitly restarted Arnoldi method. and let pbean eigenvalue ofH. This reduction is illustrated in Figure 2. This means that we can choose to keep certain Ritz values and discard others.f i l ) to triangular form.. fik+i be eigenvalues of Hk. This fact is a consequence of the following theorem. Let H bean unreduced Hessenberg matrix. from which it is evident that all but the last of the diagonal elements of R are nonzero.. Theorem 2. • • • . where HU is unreduced. then the resulting matrix H has the form where HU is unreduced.6). at the end of the expansion phase the Ritz values—i.8) is the Rayleigh quotient for the implicitly restarted Arnoldi algorithm.3. it follows from the representation of the formation of this product in Figure 2.. Proof. Chapter 2. the eigenvalues of Hm — are calculated and divided into two classes: . • By repeated application of Theorem 2..2.

KRYLOV-SCHUR RESTARTING The implicitly restarted Arnoldi algorithm with exact shifts is a technique for moving Ritz values out of an Arnoldi decomposition.3 is that in exact arithmetic the quantity H [k+l. Fortunately.2 is a classic example of the limitations of mathematics in the face of rounding error. H[k+l. This phenomenon.1. its (TO. TO—l)-element can be far from zero. We now turn to a discussion of this phenomenon. In this way the restarting process can be used to steer the Arnoldi algorithm to regions of interest. where an eigenvalue with a positive real part indicates instability. if a Krylov decomposition can be partitioned in the form Vol II: Eigensystems . In stability computations. In cannot generate instability in the Arnoldi process itself. it should be kept in perspective. A consequence of Corollary 2. we might retain the Ritz values having the largest real part. the matrix that one computes with rounding error can fail to deflate — i. Given an exact shift. First. if we are using shift-and-invert enhancement.2. has been and is used with great success to solve large eigenproblems. there is a danger of an unwanted Ritz value becoming permanently locked into the decomposition. In this subsection we describe a more reliable method for purging Ritz values. We must continue to work with the entire matrix H as in Algorithm 2. if a particular step is forward unstable.SEC.2 is zero. In the presence of rounding error. Although forward instability is a real phenomenon. k] in statement 11 of Algorithm 2. when n is large. and it is natural to retain the largest Ritz values. so that the computation of fl and u can be simplified. the Arnoldi relation AU = UH + flu will continue to hold to working accuracy. An important algorithmic implication of forward instability is that we cannot deflate the decomposition after the application of each exact shift. At worst it can only retard its progress. it is easier to move eigenvalues around in a triangular matrix than in a Hessenberg matrix. Second. however. the definitive implementation of the implicitly restarted Arnoldi method. The latter are used as shifts in the restarting algorithm.5). we cannot guarantee that the undesirable Ritz values have been purged from the decomposition. Forward instability Theorem 2. As we have seen. That is why ARPACK.. which is called forward instability of the QR step. 2. For example. In fact. 2.e. does not mean that we cannot use Algorithm 2. However.k] can be far from zero. forward instability in the QR step can compromise the progress of the algorithm (though not its stability). It is based on two observations. as we shall see later (Theorem 2. the work saved by deflation is negligible. we will be interested in the largest eigenvalues. In fact.2. THE RESTARTED ARNOLDI METHOD 325 those we wish to retain and those we wish to discard.

However.326 CHAPTER 5. it is not enough to be able to exchange just eigenvalues.2). and throw away the rest of the decomposition. . Let where Su is of order n. move the desired eigenvalues to the beginning. There are four cases to consider. let a triangular matrix be partitioned in the form where If we can find a unitary transformation Q such that then the eigenvalues su and s22 m the matrix will have traded places. If the eigenproblem in question is real. We will call the process Krylov-Schur restarting. then we can move any eigenvalue on the diagonal of a triangular matrix to another position on the diagonal by a sequence of exchanges. In other words a Krylov decomposition splits at any point where its Rayleigh quotient is block triangular. we will generally work with real Schur forms. Exchanging eigenvalues and eigenblocks Since the Krylov-Schur method requires us to be able to move eigenvalues around in a triangular matrix. More precisely. we will begin with a sketch of how this is done.(i = 1. The first thing to note is that to move an eigenvalue from one place to another it is sufficient to be able to exchange it with either one of its neighboring eigenvalues. Our restarting algorithm then is to compute the Schur decomposition of the Rayleigh quotient. KRYLOV SEQUENCE METHODS then AUi = U\B\\ + ub^ is also a Krylov decomposition. Thus we must not only be able to exchange eigenvalues but also the 2x2 blocks containing complex eigenvalues. If we have an algorithm to perform such exchanges. To derive our algorithms we will consider the real case..

Let Let y be a normalized left eigenvector corresponding to Su and let Q = (X y) be orthogonal. the two matrices must have the same eigenvalues (in fact. 2. Chapter 4. and let Q = (X Y) be orthogonal. and similarly for Su. Chapter 2].SEC. Then The last case. Then by Theorem 1. Then it is easily verified that Note that although Su need not be equal 611. The third case can be treated by the variant of Schur reduction introduced in connection with the QR algorithm [see (2. is more complicated. Vol II: Eigensystems .6. THE RESTARTED ARNOLDI METHOD 327 We will now sketch algorithms for each case. they are similar). X) be an orthonormal eigenpair.12. Chapter 1). Let x be a normalized eigenvector corresponding to 522 and let Q = (x Y) be orthogonal. Let the matrix in question be where Su is of order one or two. Let Let (522. HI = n^ — 2. The first two cases can be treated by what amounts to the first step in a reduction to Schur form (see the proof of Theorem 1. Since £22 is the Rayleigh quotient X^SX it must have the same eigenvalues as 622.2).

to compute P but instead note that (2. the matrix E must be set to zero. the matrix Q can be taken to be a Householder transformation (or a plane rotation when n\ — n<2 = 1). we can compute the QR decomposition In particular. Once the eigenvector has been computed. The Krylov-Schur cycle We will now describe the basic cycle of the Krylov-Schur method.328 CHAPTER 5. we get the Sylvester equation which can be solved for P. Consequently. Specifically. Chapter 2). To keep the exposition simple.9) is a set of four linear equations in four unknowns which can be solved by any stable method. Once we have computed P. Q — (X Y) can be represented as the product of two Householder transformations. The algorithms for the first three cases have stable implementations. The procedure is to compute a nonorthonormal eigenbasis and then orthogonalize it. they can be computed stably by solving a 2 x 2 system of equations. In this case we do not use Algorithm 1. is to compute the eigenbasis X. so that we do not have to be concerned with real Schur forms. The algorithm for the fourth case is not provably stable. For more details on these algorithms see the LAPACK subroutine xLAEXC. Chapter 1. . KRYLOV SEQUENCE METHODS The problem. Specifically let the eigenbasis be (PT /)T.9. For the other two. Then From the first row of this equation. For the first case. then. where P is to be determined. after the transformation the matrix Q^SQ will have the form To complete the algorithm. and it can be applied to the matrix in which S is embedded. although it works well in practice and its failure can be detected. we will assume that the matrix in question is complex. if E is too large—say ||£||oo > lOH^Hc^M— tnen me algorithm is deemed to have failed. the eigenvector can be computed by inspection.1. but replacing zero pivot elements with a suitably small number will allow the computation to continue (see Algorithm 2. These methods are stable in the usual sense. It is possible for this system to be singular.

the savings are not worth the trouble. one chooses a starting vector. the expanded Rayleigh quotient now has the form illustrated below for k = 4 and m = 8: Writing the corresponding Krylov decomposition in the form we now determine an orthogonal matrix Q such that Sm = Q^TmQ\s upper triangular and transform the decomposition to the form Finally. For large n the work in the contraction phase is concentrated in the formation of the product ?7*Q[:. Vol II: Eigensystems . Thus the implicitly restarted Arnoldi cycle and the Krylov-Schur cycle are essentially equivalent in the amount of work they perform. 2.3 implements this scheme. Algorithm 2. Here are two comments. In fact. • The work involved in the expansion phase is the same as for the Arnoldi process.10) S is already partially reduced. When n is large. • In the computation of the Schur decomposition in statement 7.SEC. However.5) shows this is about the same as for implicit restarting. THE RESTARTED ARNOLDI METHOD The Krylov-Schur cycle begins with a Krylov-Schur decomposition 329 where Uk+i is orthonormal and Sk is upper triangular.1) to implement it. we use the exchange algorithms to transform the eigenvalues we want to keep to the beginning of Sm and truncate the decomposition. As (2. expands it to an Arnoldi decomposition. To get such a decomposition initially. minor savings can be had in the initial reduction of S to Hessenberg form. The expansion phase is exactly like the expansion phase for the Amoldi method. !:&]. and transforms it by a unitary similarity to triangular form. one can use ExpandArn (Algorithm 1. since by (2.

Algorithm 2. this algorithm expands the decomposition to one of order m and then contracts it back to one of order k.330 CHAPTER 5. KRYLOV SEQUENCE METHODS Given a Krylov-Schur decomposition of order k and an integer ra > k.3: The Krvlov-Schur cycle .

Proof. l:k] and VQ[:. First note that the expansion phase results in equivalent decompositions. If the same Ritz values are discarded in both and those Ritz values are distinct from the other Ritz values.2 and statement 11 in Algorithm 2. Suppose that an implicitly restarted Arnoldi cycle is performed on P and a Krylov-Schur cycle is performed on Q. THE RESTARTED ARNOLDI METHOD The equivalence of Arnold! and Krylov-Schur 331 Given that the implicitly restarted Arnoldi method with exact shifts and the KrylovSchur method both proceed by moving Ritz values out of their respective decompositions. That is indeed the case. Now assume that both algorithms have gone through the expansion phase and have moved the unwanted Ritz values to the end of the decomposition. l:k] and Qib = <?[:. Vol II: Eigensvstems . l:k] are the same (see statement 14 in Algorithm 2.SEC. since ll(U) = 11>(V) and in both cases we are orthogonalizingthe the vectors Au.!:*]. it is natural to conclude that if we start from equivalent decompositions and remove the same Ritz values then we will end up with equivalent decompositions. For brevity. then the resulting decompositions are equivalent.4. Since V = UW for some unitary W. Thus it makes sense to say that both methods reject the same Ritz values. Then we must show that the subspaces spanned by UP[:. set Pk = P[:. ^m+i are the same up to multiples of modulus one. At this point denote the first decomposition by and the second by Note that for both methods the final truncation leaves the vector u unaltered. Theorem 2. Let be an unreduced Arnoldi decomposition and let be an equivalent Krylov-Schur form. In fact. we have Thus H and 5 are similar and have the same Ritz values. although it requires some proving. Let P be the unitary transformation applied to the Rayleigh quotient in P and let Q be the one applied to the Rayleigh quotient of Q. A2u. • • • .3). We must show that the subspaces associated with the final results are the same. 2. the vectors Uk+2 • • • ^m+i and Vk+2.

Hence there is a unitary matrix R such that Qk = W^PkR. We then have It follows that VQk and UPk span the same subspace. Likewise. and Algorithm 1.3 used to return to an Arnoldi decomposition. 2. Although this seems like a lot of work. In particular. For example. T^(Qk) spans the eigenspace Q of Schur vectors of S corresponding to the retained Ritz values. the matrix W^Pk spans Q.332 CHAPTER 5. This fact is not at all obvious from the algorithmic description of the Krylov-Schur method. The former can restart with an arbitrary filter polynomial. it effectively costs no more than a contraction step. The Krylov-Schur method can be used as an adjunct to the implicitly restarted Arnoldi method. Stability Provided we take due care to reorthogonalize. the algorithms of this section are stable in the usual sense. Since W^W = H. We will see in the next subsection that a similar procedure can be used to deflate converged Ritz pairs. On the other hand. if n is large and we delay multiplying transformations into the decomposition until the end of the process. the decomposition can be reduced to Krylov-Schur form. the Krylov-Schur method is restarting with a vector filtered by a polynomial whose roots are the rejected Ritz values. We will first treat the behavior of the methods in the presence of rounding error. we will present our results in the more general notation of the latter. . if the latter produces a persistent unwanted Ritz value. something the latter cannot do. KRYLOV SEQUENCE METHODS By construction Tl(Pk) spans the eigenspace P of Schur vectors of H corresponding to the retained Ritz values. when it comes to exact shifts the Krylov-Schur method is to be preferred because exchanging eigenvalues in a Schur form is a more reliable process than using implicit QR steps to deflate.3. Comparison of the methods The implicitly restarted Arnoldi method and the Krylov-Schur method each have their place. the offending eigenvalue swapped out. Finally. AND DEFLATION In this subsection we will consider some of the practicalities that arise in the implementation of the Arnoldi method. STABILITY. We will then discuss tests for the convergence of approximate eigenpairs. Since the Arnoldi and Krylov-Schur decompositions are both orthonormal Krylov decompositions. By hypothesis each of these eigenspaces are simple and hence are unique (page 246). • This theorem says that the implicitly restarted Arnoldi method with exact shifts and the Krylov-Schur method with the same shifts are doing the same thing. CONVERGENCE. we will show how to deflate converged eigenpairs from a Krylov decomposition.

This theorem has three implications. • Since B = UHAU is a Rayleigh quotient of A. • The generic constant 7 grows only slowly as the algorithm in question progresses.11) and (2. Let 333 be an orthogonal Krylov decomposition computed by the implicitly restarted Arnoldi method or the Krylov-Schur method.5. Thus we can take the output of our algorithms at face value. they will be accurately computed.13) — that depends on the size of the problem and the number of cycles performed by the algorithm. the eigenpairs (//.-) of B are primitive Ritz pairs of A + E. THE RESTARTED ARNOLDI METHOD Theorem 2. Vol II: Eigensystems . Alternatively. there is a matrix F satisfying such that For a sketch of the the proof. Then there is an orthonormal matrix Uk+i satisfying and a matrix E satisfying such that Here 7 is a generic constant—in general different in (2. see the notes and references. Thus the algorithm can be continued for many iterations without fear of instability. w. • The vectors Uk^i are near the Ritz vectors UkwkIt follows that unless the Ritz pairs are very sensitive to perturbations in A.SEC.. 2.

and the system must be solved by a direct sparse method or by an iterative method. then the condition number \\AK\\ \\A~l \\ will be large and will magnify the multiplier (7||AK||||A~1||) in (2. z] is that the residual r = Az — JJLZ . However.5. we can say that the computed solution will satisfy where Depending on the stability properties of the direct solver or the convergence criterion used by the iterative method. there is little in the literature to guide us. First. something we have already noted in Example 1. the constant 7 can be larger than its counterpart in (2. Unfortunately. it is possible for the constant 7 to be large. Second.334 CHAPTERS. To relate the backward error in (2. In most applications AK will be large and sparse. and we must take into account the errors generated in solving this system. can further magnify the error. All this suggests that one must be careful in using shifts very near an eigenvalue of A.15). is combined with the Arnoldi process.14) to A"1.14) propagates to A~l. the above analysis is quite crude. due to the shift being near an eigenvalue of A. To put things less formally. as we have already pointed out. Convergence criteria We have already stressed in connection with the power method (p. However.12).12). if K is very near an eigenvalue of A. The treatment of each of these methods is worth a volume in itself. KRYLOV SEQUENCE METHODS Stability and shift-and-invert enhancement When shift-and-invert enhancement with shift K. and in practice accurate shifts may be less damaging than it suggests. the vector v = (A — K!)~IU will be computed by solving the system where AK = A — K/. 61) that an appropriate criterion for accepting a purported eigenpair (/^.12) in two ways. instability in a direct solver or a lax convergence criterion in an iterative method can result in a comparatively large backward error in solving the system AKv = u: 111 conditioning of A K . which replaces A in (2. we must show how F in (2. From the first-order expansion we see that the backward error in AK l is approximately bounded by This bound compares unfavorably with (2.

if (M. not to mention the arithmetic operations involved in computing ZM. (2. where U is from an orthonormal Krylov decomposition. • If (M. so that 6H = /tejj-. the residual (2. It is easily verified that Since There are three comments to be made about this theorem. Thus if for some tolerance tol.16) would appear to be expensive to compute. • We have chosen to work with eigenspaces because that is what is produced by complex eigenvalues in a real Schur form.8. where the (normwise) relative error in A is tol. the residual is cheap to compute.SEC.19) reduces to ||6HW||2. and the expression (2. Theorem 2. Z) is an exact eigenpair of A. Moreover: Vol II: Eigensystems . Z) is an approximate orthonormal eigenpair with a small residual then by Theorem 2. However. Z) is an exact eigenpair of A + E.6. W) is a primitive Ritz pair. there is a matrix E satisfying such that (M. as the following theorem shows. if Z = UW. More generally. Z) = (M. then ||^||F is just the norm of the last row of W. Let be an orthonormal Krylov decomposition and let (M. then BW — WM = 0. THE RESTARTED ARNOLDI METHOD 335 be sufficiently small.18) provides a convergence criterion for determining if eigenpairs produced by an algorithm have converged. then (M. Chapter 4. For if Z has p columns. 2. If the decomposition is an Arnoldi decomposition. At first glance. we will be concerned with ordinary eigenvalues and their eigenvectors. But for the most part. Then Proof. Thus. the computation requires p matrix-vector multiplications. UW) be an orthonormal vair.

However.17) relates to the shifted matrix. then the formula (2. when checking the convergence of refined or harmonic Ritz vectors. the criterion becomes Convergence with shift-and-invert enhancement As in the case of stability. by Theorem 2. and from (2. not the refined or harmonic Ritz value. Thus we might say that a Ritz pair (Af. then In terms of the original matrix A.6.336 CHAPTER 5. if we are concerned with the approximate eigenpair (^. the approximate eigenvalue is K + // l. In practice. the norm of the Ritz block M is often a ballpark approximation to m||p. KRYLOV SEQUENCE METHODS where Uk is the last component ofw. However. Chapter 4.I\\p and ||A||F are of the same order of magnitude. Specifically. • If (M.21) for Ritz pairs . since we are generally concerned with exterior eigenvalues. z). Hence our convergence test becomes This is exactly the same as the criterion (2. when shift-and-invert enhancement is used—say with shift K — the residual R in (2. one should use the Rayleigh quotient.22) we have the residual It follows that for the relative backward error in A not to be greater than tol. we must have Now it is reasonable to assume that ||A — K. we may not be able to compute \\A\\p—especially if we are using shift-and-invert enhancement. W) is not a Ritz pair.19) cannot be simplified. it will be minimized when M is the Rayleigh quotient W^BW. In particular. UW) has converged if When the concern is with eigenvectors rather than eigenspaces.

The expansion phase does not change. we have AU\ = UiB\\. There are two advantages to deflating a converged eigenspace. and let (Af. Let (W W_\_) be unitary. If the order of BU is small. Let be an orthonormal Krylov decomposition. Z) — (M. Theorem 2. Of course. the savings will be marginal. The following theorem relates the residual norm of an approximate subspace to the quantities we must set to zero to deflate. this gives algorithms the opportunity to compute more than one independent eigenvector corresponding to a multiple eigenvalue. we will never have an exact eigenspace in our decompositions. and set and Then Vol El: Eigensystems . the savings become significant. so that U\\ spans an eigenspace of A.SEC. THE RESTARTED ARNOLDI METHOD Deflation 337 We say a Krylov decomposition has been deflated if it can be partitioned in the form When this happens. we can apply all subsequent transformations exclusively to the eastern part the Rayleigh quotient and to (C/2 Us). and we end up with a decomposition of the form Now since B\\ is uncoupled from the rest of the Rayleigh quotient. 2. The second advantage of the deflated decomposition is that we can save operations in the contraction phase of an Arnoldi or Krylov-Schur cycle. In particular. UW) be an orthonormalpair. but as its size increases during the course of the algorithm.7. First by freezing it at the beginning of the Krylov decomposition we insure that the remaining space of the decomposition remains orthogonal to it.

using (W W±). we may set #21 and 61 to zero to get the approximate decomposition Thus we have deflated our problem by making small changes in the Arnoldi decomposition.23) with equality if and only if M is the Rayleigh quotient W^BW. Stability of deflation The deflation procedure that leads to (2. • To see the consequences of this theorem.26). Now if the residual norm \\AZ . we have The theorem now follows on taking norms.20). Let CHAPTER 5.26) is backward stable. KRYLOV SEQUENCE METHODS From (2. we transform the Krylov decomposition AU — UB = ub** to the form tHEN BY (2.ZM\\p is sufficiently small. and. we get the following relation: . suppose that AZ — ZM is small. If we restore the quantities that were zeroed in forming (2.338 Proof.

p). write the deflated decomposition (2.21) if and only if M is the Rayleigh quotient ZHAZ = WR B W. Theorem 2. the residual for Z becomes VolII: Eigensystems .then(A + E)U = UB + utP.26) in the form Then there is an E satisfying such that Equality holds in (2. THE RESTARTED ARNOLDI METHOD If we write this decomposition in the form 339 then IfwesetE = -RUH. Nonorthogonal bases In practice we will seldom encounter a converging subspace unless it is a two-dimensional subspace corresponding to an imaginary eigenvalue in a real Schur decomposition.. 2. the residual R — AZ — ZM must be small because the individual residuals are small.8. This theorem says that the deflation technique is backward stable. and the vectors in these pairs cannot be guaranteed to be orthogonal. Unfortunately. .7. Instead we will be confronted with converged.. Deflating an approximate eigenspace with residual norm p corresponds to introducing an error of no more than p'm A. If we arrange the vectors in a matrix Z and set M = diag(//i. normalized Ritz pairs (/^-. (J.. Under the hypotheses of Theorem 2.. p) of one kind or another. Since the deflation procedure only requires an orthonormal basis for the approximate eigenspace in question...SEC. . Z{) (i = 1. We may summarize these results in the following theorem. we can compute a QR factorization of Z and use the orthonormal basis Z in the deflation procedure.

and 622 ls tne Schur form that remains after the contraction phase. If the vectors are nearly dependent. Deflation in the Krylov-Schur method The deflation procedure is not confined to eigenpairs calculated by a Rayleigh-Ritz procedure. However. Although the procedure is roundabout. \\S\\2 will be large. The procedure is to reduce #22 to Schur form. say. The resolution of this paradox is to remember that we are not deflating two vectors but the subspace spanned by them. if Ritz vectors are the concern. KRYLOV SEQUENCE METHODS where M = SMS~l. . it can be used to deflate harmonic Ritz vectors or refined Ritz vectors. we accumulate the transformations in an auxiliary matrix Q and apply them to U only at the end of the deflation process.340 CHAPTER 5. and use Algorithm 1. it is not expensive. It may seem paradoxical that we could have.3 to bring the decomposition back to an Arnoldi form. Deflation in an Arnoldi decomposition Deflating an Arnoldi decomposition proceeds by reducing it to a Krylov-Schur decomposition. two vectors each of which we can deflate. but which taken together cannot be deflated. Algorithm 2.4 implements this scheme. If. After a cycle of the algorithm. Consequently. For example. This procedure can be repeated as often as desired. this is an inexpensive algorithm. all we have to do to deflate is to verify that that component satisfies our deflation criterion. there is an easy way to deflate Ritz vectors in the Krylov-Schur method. we can exchange it into the (1. they must be very accurate to determine their common subspace accurately. if n is large. If some other diagonal element of 622 ls a candidate for deflation. at the end of the contraction phase the Arnoldi decomposition will have the form where #22 is upper Hessenberg.1)-element of £22 we must set the first component of 62 to zero. as usual. If the columns of Z are nearly dependent. To deflate the the Ritz pair corresponding to the (1. Specifically. let the current decomposition have the form Here U\ represents a subspace that has already been deflated. deflate as in the Krylov-Schur form. Imposition of 622 > and test as above. and the residual may be magnified—perhaps to the point where the deflation cannot be performed safely.

The criterion (2. Choose tol to be something that gives a backward error that one can live with. 2. The difficulty is that the choice is highly problem dependent and must be left to the user. The reason is that the computations are potentially so voluminous that it is unreasonable to continue them beyond the accuracy desired by the user. There are two possibilities for choosing tol. The reason is that it is comparatively inexpensive to do so. 1. where a is some convenient estimate of the size of A. their implementations seldom use an explicit parameter to determine convergence. allow both a user-specified option and a default opVol II: Eigensystems . In that case.SEC. Algorithm 2. on the other hand. 2. For some i > j. Instead they iterate until the limits set by finite precision arithmetic are reached.5). Algorithms for large eigenproblems. let the Ritz value sa be a candidate for deflation. The advantage of this choice is that one may save calculations by not demanding too much. usually allow the user to furnish a tolerance to decide convergence. since the operations in the algorithms already impose a backward error of the same order (see Theorem 2. THE RESTARTED ARNOLDI METHOD Let the Krylov-Schur AU = US + ubH have the fora 341 where Su is of order j. the eigenpair comes from a perturbed matrix with relative error tol.4: Deflating in a Krylov-Schur decomposition Choice of tolerance Although algorithms for dense eigenproblems are necessarily iterative. Good packages. such as ARPACK. Choose tol to be a modest multiple of the rounding unit. There is no point in choosing tol smaller.18) suggests stopping when the residual norm of the eigenpair in question is less than tol-a. This algorithm determines if the value can be deflated and if so produces a deflated decomposition. It is therefore necessary to say how such a criterion should be chosen.

4.8). This subsection is devoted to describing how this is done. when the desired part of the spectrum is spread out. However. One way of doing this is to restart with a new shift and a new vector. Krylov sequences. it is interesting to see what lies behind it. We will now describe an algorithm for doing this.K/2/)"1 without having to perform an inversion. Suppose we have a Krylov sequence If we set v = (A — Ki/) 1 u.342 CHAPTER 5. See the discussion on page 339. it may necessary to use more than one shift. However. it should be made somewhat larger to allow multiple vectors to be deflated. Chapter 4) It follows that Equation (2. the Amoldi decomposition may still contain useful information about the eigenvectors near the new shift—information that should be kept around. and shifts Although the computational algorithm is easy to derive. Thus the problem posed above amounts to restarting the Krylov sequence with the new vector w. 2. However.29) says that the Krylov subspace in (A — KI/) 1 with starting vector u is exactly the same as the Krylov subspace in (A — K^)"1 with a different starting vector w.3. It is a surprising fact that we can change a Krylov decomposition from one in (A . the tolerance for deflation should of the same order of magnitude as the tolerance for convergence. KRYLOV SEQUENCE METHODS tion based on the rounding unit. when the new shift is not far removed from the old.KI!)~I to one in (A . RATIONAL KRYLOV TRANSFORMATIONS The principle use of shift-and-invert transformations in Arnoldi's method is to focus the algorithm on the eigenvalues near the shift K. . inverses. then the sequence with its terms in reverse order is so that Moreover by the shift in variance of a Krylov sequence (Theorem 3. Since deflation is backward stable (see Theorem 2.

2. As usual. the application of R~l can magnify the residual error in the decompositions. the transformations to U can be saved and applied at the end of the algorithm. If R is ill conditioned. • Since /+ («i — KZ )H is upper Hessenberg. • The algorithm is quite economical involving no manipulations of A. the Q in the factorization in statement 1 can be computed as the product of k plane rotations. Algorithm 2. • The algorithm is not unconditionally stable. Then This is a Krylov decomposition. As one gets better approximations to an eigenvalue Vol II: Eigensystems . which can be reduced to an Arnoldi decomposition by Algorithm 1.K2H).SEC.5 implements this procedure. THE RESTARTED ARNOLDI METHOD The rational Krylov method Let us begin with an orthonormal Arnoldi decompositioi 343 Then Henc where To reduce this relation to a Krylov decomposition.3. In particular. Here are some comments. this structure makes it easy to preserve deflated vectors in the initial part of the decomposition. At first sight it might appear that the rational Krylov transformation is a good way of getting something for nothing. let be a QR decomposition of / + (KI .

Equivalent block methods for several eigenvalues exist. is was Saad [230. The former problem is relatively easy: just restart with the best available approximation to the desired eigenvector.344 Given an Arnold! decomposition CHAPTER 5. and filter polynomials Although the Arnoldi method is indeed due to Arnoldi [5]. Consequently. To put it another way. NOTES AND REFERENCES Arnoldi's method. pp. the poorer the new starting vector. §FV. 2. Nor does the method save the trouble of computing a factorization.2]. Once the eigenvector has been found it can be removed from the problem by various deflating techniques. 1980] who first presented it as an effective algorithm for computing eigenpairs of large nonHermitian matrices. See [45] and [93]. provided the number of eigenvalues is less than the block size. the closer the new shift is to an eigenvalue. this algorithm computes an Arnoldi decomposition Algorithm 2. one can use them as shifts and get a sequence of Krylov decompositions with increasingly effective shift-and-invert enhancers.5. The flaw in this argument is that the rational Krylov transformation does not change the Krylov subspace. 584-602] and [233. restarting. .5: The rational Krvlov transformation of A. KRYLOV SEQUENCE METHODS of order k and a shift ^2. the accuracy of eigenvector approximations is not affected. the argument goes. If one wants to continue the sequence. one must be able to apply (A . which have been surveyed in [300.K2/)-1 to a vector. It is important to distinguish between restarting the Arnoldi process to compute a single eigenpair and to compute several eigenpairs simultaneously.

One solution is for the caller to specify a subroutine to perform the operation in the calling sequence. but in 1990 [233. In an elegant paper Morgan [180. which can only be done with messy global variable structures. THE RESTARTED ARNOLDI METHOD 345 The problem is far different when it is desired to restart the Arnoldi process while several eigenvectors are converging. Lehoucq. Sorensen. Reverse communication All routines for solving large eigenvalue problems require that some operation be performed involving the matrix in question and a vector." In it he described the implicitly restarted Arnoldi algorithm as presented here. (For other convergence results see [160]. p. a well-coded and well-documented package that bids fair to become the standard for large non-Hermitian eigenproblems. a more complicated operation may be required — e. including exact and inexact shifts. 2. A happy exception is the implicitly restarted Arnoldi algorithm.) ARPACK New algorithms for large eigenproblems are usually implemented in MATLAB for testing but seldom end up as quality mathematical software. building on a design of Phuong Vu. The idea of following a sequence of Arnoldi steps with the application of a filter polynomial to purify the restart vector is due to Saad [232. An alternative — called reverse communication—is for the general-purpose routine to return to the calling program with a flag specifying the desired operation. Exchanging eigenvalues Stewart [256] produced the first program for exchanging eigenvalues in a real Schur form.1984]. the "implicit" in implicit restarting does not refer to its use of implicit QR steps but to the fact that the starting vector is chosen implicitly by the shifts in contraction procedure. performs the operation and returns the results via another call to the general-purpose routine. The problem with this procedure is that once again one must choose a vector to purify. have produced ARPACK [162].g.1996] showed why. 180] he pointed out that a good combination is hard to find. However. once this subroutine is called. The simplest example is the computation of Ax. Saad [230] proposed restarting with a linear combination of approximate eigenvectors from the Krylov subspace. however. Vol II: Eigensystems . The breakthrough came with Sorensen's 1992 paper [246] entitled "Implicit Application of Polynomial Filters in a fc-Step Arnoldi Method. To restart with a single approximate eigenvector would destroy the information accumulated about the others. In 1982.SEC. The method is to iterate QR steps with an exact shift until the process converges. (By the way.) He also gives convergence results for the case of a fixed filter polynomial and the case of exact shifts for a symmetric polynomial. The problem for a general-purpose routine is how to get the caller to perform the operations in question. it must get data from the original program.. for shift-and-invert enhancement. The calling program. and Yang. which knows its own data.

KRYLOV SEQUENCE METHODS The method is stable but can fail to converge when the separation of the eigenvalues in the blocks is poor. Sleijpen. To deal with the problems it poses for the iteratively restarted Arnoldi method. In fact. an unwanted Ritz pair can fail to be purged from the decomposition. unwanted Ritz pair. Fokkema. which shows that they can never yield a decomposition that is not based on a Krylov sequence. it is awkward to deflate converged Ritz pairs. But it is easily proved. From the properties of the Gram-Schmidt algorithm with reorthogonalization [51. Also see [181]. Lehoucq and Sorensen [158] propose a procedure for modifying the Krylov subspace so that it remains orthogonal to the left eigenvector of a converged. which is then truncated to become an Arnoldi decomposition. Stathopoulos. it is called the Krylov-Schur method. The algorithm described here is due to Bai and Demmel [11]. a special case of our Arnoldi-Schur decomposition. Second. and Wu [248] point out that since the unpreconditioned Jacobi-Davidson algorithm is equivalent to the Arnoldi algorithm. and van der Vorst [74] explicitly use Schur vectors to restart the Jacobi-Davidson algorithm. for symmetric matrices Wu and Simon [306] exhibit what might be called a Lanczos-spectral decomposition. Ch. It has been implemented in the LAPACK routine xTREXC. Stability Theorem 2. but the failures do not seem to occur in practice. both resulting from its commitment to maintaining a strict Arnoldi decomposition. It too can fail. What distinguishes the Krylov-Schur approach is the explicit introduction and systematic manipulation of Krylov decompositions to derive and analyze new algorithms. it is easy to show . Regarding the use of Schur vectors.5 has not appeared in the literature. Lehoucq [personal communication] has written code that uses Schur vectors in deflation. Closer to home. Although the method is new. The Krylov-^Schur method The iteratively restarted Arnoldi procedure has two defects. Saad. Since the method is centered around a Schur decomposition of the Rayleigh quotient. First.346 CHAPTER 5. This lead to attempts to formulate noniterative algorithms. The manipulations are justified by Theorem 1. solves both these problems by expanding the class of transformations that can be performed on the underlying decomposition.8. precursors of various aspects can be found in the literature. 3]. Forward instability The forward instability of the QR step has been investigated by Parlett and Le [207] for symmetric tridiagonal matrices and by Watkins [289] for Hessenberg matrices.117] and the general rounding-error analysis of unitary transformations [300. Schur vectors can also be used to restart the latter. The introduction of general orthonormal Krylov decompositions. the contraction step of the iteratively restarted Arnoldi method produces an orthonormal Krylov decomposition. due to Stewart [267].

THE LANCZOS ALGORITHM We have seen that if A is symmetric and is an Arnoldi decomposition. Convergence Converge criteria for the Arnoldi processes are discussed in [161]. THE LANCZOS ALGORITHM 347 that the residual AU ~ UB — w6H of the computed decomposition is of the order of the rounding error. Chapter 4. 3. Deflation Deflating converged Ritz values from an Arnoldi decomposition is a complicated affair [158]. and it is not clear that the procedure can be extended to arbitrary vectors. Shift-and-invert enhancement and the Rayleigh quotient Nour-Omid.2. The analysis of convergence criteria for shift-and-invert enhancement is new. this is the problem that lead me to orthonormal Krylov decompositions. The rational Krylov transformation The rational Krylov method is due to Ruhe [221].8. The deflation technique for Arnoldi-Schur decompositions is closely related to Saad's method of implicit deflation [233. such as refined and harmonic Ritz vectors.SEC. who described it for the generalized eigenvalue problem.3]. From this one constructs (7. and that the departure of U from orthonormality is of the same order. absorbs its error in the residual. 3. and throws the error back on A as in Theorem 2. Parlett. and Jensen [184] discuss the effect of accurate shifts on the Rayleigh quotient and conclude that the graded structure of the Rayleigh quotient makes it a nonproblem. then the Rayleigh quotient 1 k is a tndiagonal matrix of the form Vol II: Eigensystems . §VI. In fact. Ericsson. The fact that it may be impossible to deflate a group of several individually deflatable eigenvectors appears to be known but has not been analyzed before.

There are two problems associated with . In §3.348 CHAPTER 5. The alternative is to save the Lanczos vectors in a backing store and move them into main memory to reorthogonalize the new Lanczos vectors—presumably only when absolutely necessary. we will assume throughout this sftrtinn that: However. The purpose of this section is to describe the practical implementations of this method. and the chief problem in implementing a Lanczos method is to maintain a satisfactory degree of orthogonality among them. after which orthogonalization is performed. KRYLOV SEQUENCE METHODS This implies that the vectors Uk can be generated by a three-term recurrence of the form where When we combine this recursion with the Rayleigh-Ritz method. we get the Lanczos method. we reserve the right to insert a factor of \\A\\ in our bounds whenever it seems appropriate. however. A key to deriving Lanczos-based algorithms is knowing when quantities can be neglected. There are two solutions. Mathematically. when they are of the order of the rounding unit compared with A. they can lose orthogonality. a Lanczos decomposition. As in the Arnoldi algorithm.3 we will consider semiorthogonal algorithms. In both cases the result of the orthogonalization is something that is not. in which the orthogonality of the Lanczos basis is allowed to deteriorate up to a point. REORTHOGONALIZATION AND THE LANCZOS ALGORITHM Lanczos with reorthogonalization In finite precision arithmetic the vectors Uk produced by the Lanczos process will deviate from orthogonality. strictly speaking.e.. This fact is one of the main complications in dealing with the Lanczos algorithm in inexact arithmetic. To avoid cluttering our expressions with factors of \\A\\. i. the Lanczos vectors Uj computed by the Lanczos algorithm must be orthogonal. We will then treat the restarted Lanczos algorithm in which orthogonalization is performed at each step. In practice. 3.1. since the reorthogonalization causes the Rayleigh quotient to become upper Hessenberg instead of tridiagonal. The first is to reorthogonalize the vectors at each step and restart when it becomes impossible to store the Lanczos vectors in main memory. We will begin by describing a general form of the algorithm that encompasses most of the variants that have appeared in the literature. we can reorthogonalize a straying Lanczos vector Uk against its predecessors.

to restart or to deflate converged Ritz vectors. it obviously will have to stop at some point—e. the time taken to transfer Lanczos vectors from the backing store will dominate the calculation. Algorithm 3. The following comments should be read carefully. 1. • Depending on how the decision is made to reorthogonalize and the on details of the reorthogonalization process.1: Lanczos procedure with orthogonalization this procedure. since we have to save them anyway to compute Ritz vectors. we can make two compromises. we can agree to tolerate a prescribed degree of nonorthogonality so that we do not have to reorthogonalize at every step. • Although we have written a nonterminating loop. 3.. Second. 2. Full reorthogonalization: v is always reorthogonalized against all its predecessors. the preceding vectors must be stored and then brought back into the fast memory of the machine. Algorithm 3. To avoid this problem. we can reorthogonalize Uk only against those vectors to which it is actually nonorthogonal. They are a useful guide to what follows.g. THE LANCZOS ALGORITHM 349 Given a starting vector HI this algorithm computes a sequence of suitably orthogonalized Lanczos vectors. First.1 sketches a general procedure that encompasses most of the variants of the Lanczos algorithm. this code encompasses five common variants of the Lanczos method. No reorthogonalization: v is accepted as is at each iteration. But if a reorthogonalization is performed at each Lanczos step. First. VolII: Eigensystems .SEC. The actual storing of these vectors is not a problem.

Partial reorthogonalization: v is reorthogonalized against a subset of its predecessors when it fails a test for orthogonality. it does not affect the coefficients significantly: the loss of orthogonality corresponds to errors in ak and /^-i that are below the rounding threshold. Chapter 4. • Statements 9-11 attempt to enforce global orthogonality—that is. there are effective work-arounds for this problem. In what follows we will tacitly assume that this is done whenever necessary. See §3. The third problem is that reorthogonalization contaminates the Lanczos decomposition. We do this by keeping a running estimate of the inner products U^Uk (see Algorithm 3. 5.7. If eigenpairs outside this eigenspace are desired. the Krylov sequence must be restarted. This is not even an Arnoldi decomposition. which is done in statements 7-8. since with periodic. Nonetheless. Note that the coefficients o^ and (3k-i are not updated during this process. Selective reorthogonalization: v is reorthogonalized against a subspace of the span of its predecessors determined by the current Ritz vectors. partial. In the next two subsections we will treat full and periodic reorthogonalization.Thus we use the term "reorthogonalize" in statement 10 in a general sense—to be specified in the context of the variant in question.3. KRYLOV SEQUENCE METHODS 3.2). • It may happen that /3k becomes so small that it can be neglected. The second problem is how to reorthogonalize. and selective reorthogonalization the matrix Uk+i is not required to be orthogonal to working accuracy. the orthogonality of v against Uk-i • There are three problems here. so that instead of a Lanczos decomposition we have the relation where Hk is upper Hessenberg. Specifically. 4. In that case Uk effectively spans an eigenspace of A. but only when it fails a test for orthogonality. It will be useful to fix our nomenclature and notation at this point. Statements 7-8 attempt to enforce local orthogonality of v against Uk and u^-i. we simply have to to reorthogonalize Uk against Uk-i • Other variants require that we also reorthogonalize Uk-i against Uk-2. To distinguish .350 CHAPTER 5. Periodic reorthogonalization: v is reorthogonalized against all its predecessors. • The loss of orthogonality in the Lanczos procedure is of two types: local and global. the Rayleigh quotient becomes upper Hessenberg. For the other procedures see the notes and references. If we are performing full reorthogonalization. The matrix Hk is a Rayleigh quotient in the general sense of Definition 2. The cure is to reorthogonalize. The first problem is to determine when global reorthogonalization is needed.It turns out that the vector v in statement 6 can fail to be orthogonal to Uk and Uk-i to working accuracy. This is because in spite of the fact that reorthogonalization restores orthogonality.

Thus the main difference between the Lanczos algorithm with full orthogonalization and the Arnoldi algorithm is that the Rayleigh quotient for the former is symmetric (to working accuracy). Note that it is truly a Rayleigh quotient in the sense of Definition 2. Since the elements of Hk below the first subdiagonal are zero. then Hk = VTAUk. . one applies m—k implicit QR steps with shifts «t.2. Given the Lanczos decomposition and m—k shifts KI . We will denote this deviation by SL. This implies that the elements above the superdiagonal in the Rayleigh quotient are negligible and can be ignored.2). Implicit restarting In its larger details implicitly restarting a Lanczos decomposition is the same as restarting an Arnoldi decomposition (see §2. In this subsection we will briefly sketch the resulting simplifications. In what follows the deviation of Hk from the tridiagonal matrix (3. so that 3. Hence Hk is a small perturbation of a symmetric matrix. since if ((V v) T ) is a left inverse of (Uk Wfc+i). the elements above the first superdiagonal must be small.13) in Theorem 2. To see this.1) generated by Algorithm 3. THE LANCZOS ALGORITHM 351 it from the ordinary Rayleigh quotient Tjt we will call it the adjusted Rayleigh quotient. 3.1).SEC. If we set then the restarted decomposition is VolII: Eigensystems .1 will be of paramount importance. . Chapter 4. Ac m _A:.to give the transformed decomposition where Q is the matrix of the accumulated QR transformations. postmultiply the relation (2. FULL REORTHOGONALIZATION AND RESTARTING When the Lanczos algorithm is run in the presence of rounding error with full reorthogonalization the results are stable in the usual sense. The reader is assumed to be familiar with the contents of §2. .5 by Uk to get where Bk is now the matrix Hk in (3. .7.

For large n this makes no difference in efficiency. The chief difference between the Krylov-Schur and the Krylov-spectral methods is in the treatment of the Rayleigh quotient.3. We cannot. if this component satisfies the criterion for deflation. The first half of this subsection is devoted to the definition and consequences of semiorthogonality. Consequently. A particularly satisfying aspect of the Krylov-spectral decomposition is that the deflation of Ritz pairs is uncoupled. The reduction to diagonal form is done by the symmetric tridiagonal QR algorithm. But because the spectrum of A is real. This number can be used as in §2.. .. eigenvalues can be moved around by simple permutations—i. KRYLOV SEQUENCE METHODS The major difference between implicit restarts for the Lanczos and the Arnoldi decompositions is that implicit symmetric tridiagonal QR steps are used in the former (see Algorithm 1.3 to determine the convergence of the ith Ritz pair. we will call this analogue the Krylov-spectral method. For more see the notes and references. Contrast this with the discussion on page 339.e. there is no need for special routines to exchange eigenvalues or eigenblocks. SEMIORTHOGONAL METHODS We turn now to methods that allow the orthogonality of the Lanczos method to deteriorate. of course. Since the Schur decomposition of a symmetric matrix is its spectral decomposition. The fact that one Ritz pair has been deflated does not affect the later deflation of other Ritz pairs. Krylov-spectral restarting The comments on the forward instability of the QR algorithm for Hessenberg matrices (p. . m).. Both general and exact shifts can be used in restarting the Lanczos algorithm. (i = 1..3. However. Chapter 3). it can be set to zero and the decomposition deflated. when exact shifts are to be used in restarting. If this pair is moved to the beginning of the Rayleigh quotient.352 CHAPTER 5. Consequently. its component in 6 follows it. 325) apply to the tridiagonal Rayleigh quotient of the Lanczos algorithm. Convergence and deflation If is the current Krylov-spectral decomposition. The second shows how to maintain semiorthogonality by timely reorthogonalizations. Since the Rayleigh quotient is diagonal. A method that performs reorthogonalization only when there is danger of the loss becoming too great is called a semiorthogonal method. one can be more sophisticated about choosing filter polynomials than with the Arnoldi method. allow orthogonality to be completely lost. it will pay to use the analogue of the Krylov-Schur method. Consequently the residual norm for the ith Ritz vector is the magnitude of the ith component /% of 6. if we suitably restrict the loss we can compute accurate Ritz values and partially accurate Ritz vectors. 3. then the primitive Ritz vectors are the unit vectors e.

will limit the attainable accuracy of our results. Specifically. then Vol II: Eigensystems . 3. Then If we write R = I+ W. Owing to rounding error. the members of these bases will not have norm exactly one. 353 Definition 3. THE LANCZOS ALGORITHM Semiorthogonal bases We begin with a definition. The price we pay is that the errors that were formerly at the level of the rounding unit are now of the order 7max» and this number. we will tacitly assume that all our bases are normalized. Then U is SEMIORTHOGONAL if Three comments on this definition. we would say that U is orthogonal to working accuracy. • Our definition of semiorthogonality is meant to be applied to the bases generated by Algorithm 3. not the rounding unit. from the elementary properties of the 2-norm.C\\2 were of the order of the rounding unit. and let C = UTU.1. Since this small deviation from normality makes no difference in what follows. we have If 11/ . where W is strictly upper triangular. However. However. Thus semiorthogonality requires orthogonality to only half working accuracy. because they are normalized explicitly (statements 12 and 13). they will be normal to working accuracy. • Since the diagonal elements of C are one. there is nothing to keep us from using a larger value of Tmax — saY»to reduce the number of reorthogonalizations.SEC. Let U eRnxh have columns of 2-norm one.1. The QR factorization of a semiorthogonal basis Let U be semiorthogonal and let be the QR factorization U. the definition says that U is semiorthogonal if the elements of / — C are within roughly the square root of the rounding unit. • Choosing 7max to be about the square root of the rounding unit has the convenience that certain quantities in the Lanczos process will have negligible errors on the order of the rounding unit.

KRYLOV SEQUENCE METHODS where the approximation on the right is to working accuracy. Let us suppose that we must reorthogonalize at step k.19)] we can expect it to be of the same order of magnitude. note that 7 is the norm or the projection of Uk+i onto the column space of Uk and hence measures how w^+i deviates from orthogonality to 7£(£/&). We will now show that if our basis is semiorthogonal. Now 7 must be larger than 7max> smce a loss of semiorthogonality is just what triggers a reorthogonalization.354 CHAPTER 5. Then the vector d of reorthogonalization coefficients is Since \\QTq\\2 = 1 and RT £ /. In the Lanczos process it is rare to encounter an extremely small /?£. Consequently. Let where q£lZ(Q) and q_i_£'R(Q)'L have norm one. . Since the elements of the difference Sk of the adjusted Rayleigh quotient Hk and Tk depend on the components of d [see (3. destroying its tridiagonal structure. we have To get some notion of what this approximation means. It follows that up to terms of 0(6 M ) the matrix W is the strictly upper triangular part of C and hence that In particular We will use these facts in the sequel. these elements may be too large to ignore. these elements are insignificant and can be ignored. For full reorthogonalization. Let Uk be semiorthogonal and let Uk = QRbe its QR factorization.2)] that reorthogonalization in the Lanczos algorithm introduces elements in the upper triangular part of the Rayleigh quotient. we can expect the elements of d to be on the order of ^/CM . Let Pk^k+i — Auk — akUk — fik-\uk-\ be the relation obtained after the Lanczos step but before the reorthogonalization. The size of the reorthogonalization coefficients We have observed [see (3.

Theorem 3. Since by (3.5) we have At first glance this result appears to be discouraging. Perhaps more important is the effect of semiorthogonality on the residual norms. there are mitigating factors.2.1. Its proof is beyond the scope of this volume. 3.SEC. Let be the decomposition computed by Algorithm 3. w) is a primitive Ritz pair. Moreover. However.Then The theorem says that up to rounding error. so that most of its elements are much less than one. The effects on the Ritz vectors Although the effects of semiorthogonality on the Rayleigh quotient and its Ritz values are benign. THE LANCZOS ALGORITHM The effects on the Rayleigh quotient 355 We have observed [see (3. Nonetheless. Tk is a Rayleigh quotient associated with the space spanned by Uk. since it says that the errors in the Ritz vector can be as large as y^.2)] that the Rayleigh quotient Hk = Tk + Sk is not the same as the tridiagonal matrix Tk computed by Algorithm 3. Although w has 2-norm one.Thus we can expect the Ritz values from Tk to be as accurate as those we would obtain by full reorthogonalization. we see from (3. as the following remarkable theorem shows. only some of the elements ofR-I are of order y^. Instead we must compute the vector UkW. The problem is that although a primitive Ritz vector w will be accurately determined by Tk (provided it is well conditioned) we cannot compute the corresponding Ritz vector QkW. it can be used to compute accurate Ritz values.1 and assume that Uk is semiorthogonal. its elements decrease as the vector converges.7) that Vol II: Eigensystems . they have the potential to alter the corresponding Ritz vectors. Thus the product of (R — I)w may be considerably less than y7^. since it is these numbers that we use to decide on convergence. Let Uk — QkRk be the QR factorization ofUk. Now if (//.

which are the inner products 7^ = ujuj. Also write . We will now show how to estimate the loss of orthogonality as the Lanczos algorithm progresses. Although it is not expensive to compute and store Sk (or equivalently Hk). since it is the product SkW that determines the the inflation. Hence the residual norm is Thus the semilarge values in 5^ can inflate the residual norm. since it may be smaller than the right-hand side of (3.356 CHAPTER 5. we will leave this derivation to the end of the section and assume that Sk is available whenever we need it. Let Uk be the Lanczos basis computed by Algorithm 3. Be that as it may. However. since Hk is tridiagonal with a few spikes above the diagonal in the columns corresponding to reorthogonalization steps. the above analysis shows clearly that we cannot rely on the usual estimate \ftk^k\ to decide on convergence.8). The residual does not have to stagnate once it reaches -)/€M.1 and let Then the size of the off-diagonal elements of Ck.and HJ. we can use the inverse power method on Hk = Tk + Sk to correct the vector.7) to derive a recurrence for the 7^-.7) in the form where hj is a quantity of order || A||eM that accounts for the effects of rounding error in the relation. measure the departure from orthogonality of i/. it is almost as expensive to compute the inner products ujuk+i as to perform a full reorthogonalization. Unfortunately. the comments about eigenvectors apply here also. Suppose that we have computed Ck and we want to compute Ck+i • Write the jth column of (3. It turns out that we can use the relation (3.+i violates one of the semiorthogonality conditions at which point Uk+i is reorthogonalized. it has a QR factorization QR with R = I. For the sake of continuity. the formulas for the columns of Sk are not easy to derive. Estimating the loss of orthogonality Algorithms based on semiorthogonality allow the orthogonality to deteriorate until a vector Uf. Another advantage of saving this matrix is that if a particular Ritz pair stops converging.. KRYLOV SEQUENCE METHODS Since Uk+i is semiorthogonal. Instead one must carry along the matrix Sk and use it in computing residual norms. Such a process is cheap.

j. Algorithm 3. Consequently.^ themselves.. Because we maintain local orthogonality.j. and Vol H: Eigensystems .2: Estimation of uT . k—2 we can compute all but the first and last elements of row k+l of Ck.. we can absorb them into the corrections for qjhj and q^hk.j = u£+lv. THE LANCZOS ALGORITHM 357 Given the output of Algorithm 3.The quantities u^UjSj and uJUkSk are in principle computable. rp • By taking absolute values in (3. It requires an estimate a of \\A\\2. we can replace qjhj and q^hj. But these bounds grow too quickly to be useful: they will trigger reorthogonalizations before they are needed. subtracting.SEC. the quantity 7^+1. with aeM. But because we are maintaining semiorthogonality. Consequently.The first element can be computed by leaving out the term /^-^j-i in (3. where a is a suitable estimate of ||-A||2.10) by uj. What we need are estimates of the 7. this algorithm computes estimates of the inner products 7fc+i.11).11) we could obtain upper bounds on the loss of orthogonality. 3. experience indicates that \\ht\\2 < \\A\\2€M.. q£hj. Two comments. u k+l"J Multiplying (3. they are also on the order of aeM.UkSj.1.2 implements this scheme. Algorithm 3. and simplifying. u].* = uk+i uk IS °n the order of the rounding unit. Unfortunately. However.9) by u^ and (3. .We adjust the sign of this replacement so that it always increases 7^+1. In exact arithmetic with accurate values of qjhk. the quantities qjhj and q^hk are unknown. we obtain Letting j = 2.
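The recurrence for the orthogonality estimates discussed above can be sketched in a few lines. The following is a hedged illustration of the standard recurrence used in partial and periodic reorthogonalization schemes, not a transcription of Algorithm 3.2: the correction term 2*a*eps and its sign convention are one common choice, and the 1-based index layout is an assumption made for readability.

```python
import numpy as np

def update_orthogonality_estimates(alpha, beta, omega_prev, omega_curr, k, a, eps):
    """Advance the estimates omega[k, j] ~ u_k^T u_j of the Lanczos basis by one step.
    alpha[1..k], beta[1..k] are the Lanczos coefficients (index 0 unused);
    omega_prev is row k-1 and omega_curr is row k (length k+1, omega_curr[k] = 1).
    Unknown rounding terms are replaced by 2*a*eps (a ~ ||A||_2), signed so that the
    estimates only increase.  Returns row k+1 (length k+2, with entry k+1 equal to 1)."""
    omega_next = np.zeros(k + 2)
    for j in range(1, k):
        t = (beta[j] * omega_curr[j + 1]
             + (alpha[j] - alpha[k]) * omega_curr[j]
             + (beta[j - 1] * omega_curr[j - 1] if j > 1 else 0.0)
             - beta[k - 1] * omega_prev[j])
        t += np.sign(t) * 2.0 * a * eps      # pessimistic correction for rounding
        omega_next[j] = t / beta[k]
    omega_next[k] = a * eps / beta[k]        # local orthogonality is at rounding level
    omega_next[k + 1] = 1.0
    return omega_next
```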

\\2 = 1The orthogonalized vector is u = (I — UU^)u. the row k+l of C depends only on rows kth and k—1.qL£ 1l(Uk}L. the process is more complicated than a full reorthogonalization step. The fact that we have to reorthogonalize both Uk and v follows from the three-term recurrence If either Uk or Uk+i fails to be orthogonal to Uk-i. We must reorthogonalize both uk and v against Uk-i • 2. Moreover. driven upward by the choice of sign of r. Hence the norm of the component of . The reason seems to be that as the loss of orthogonality grows. we only need the row k+lto decide when to reorthogonalize uk+i.1 we do not test the vector Uk+i but its unnormalized counterpart v. To see what is going on. The fact that we may have to repeat the reorthogonalization of v is a consequence of the fact that we are using a semiorthogonal basis to perform the reorthogonalization. Periodic reorthogonalization The ability to estimate the deviation of the Lanczos vectors from orthogonality enables us to be precise about the test in statement 9 in Algorithm 3. The substitution of upper bounds on these quantities turns the 7^ into an estimate. This means we must replace j3k in the algorithm by \\v\\2. There are three differences. The process of reorthogonalizing v and Uk against t^-i may affect the orthogonality of these two vectors.j-i in the recursion (3. let Uk = QR be the QR factorization of Uk • Recall that we can write R = I+W. at any time we need only store two rows—the current row and the preceding row. 1. the lack of orthogonality will propagate to tifc+2 and hence to the subsequent Lanczos vectors.358 CHAPTER 5. • In Algorithm 3. This problem is also reflected by the presence of jk. We must reorthogonalize v against Uk3.11) for the orthogonality estimates. We may have to repeat the orthogonalization of v against Uk • We will treat these items in turn. where q£n(Uk). where to working accuracy W is the strictly upper triangular part of Ck = U^Uk [see (3. we must perform a reorthogonalization step.\\q\\2 = \\q±.j has a value greater than 7max. the 7.4)]. • In the basic recurrence.j would be exact. Suppose we are trying to orthogonalize a vector of the form u = iq + rqt.1. the terms of order a£M has little effect on the recursion. Nonetheless. if for some j the estimate jk+i. Consequently. and \\u\\2 . Hence the lost orthogonality must be restored. experiments show that the 7^ calculated by this algorithm track the true values rather well. KRYLOV SEQUENCE METHODS uJUkSk. Specifically. Because we have allowed our basis to deteriorate.

it is possible for one reorthogonalization to fail to purge the components along H(Uk). we have ||ar||2 = ||QTv||. THE LANCZOS ALGORITHM u lying in 7l(Uk) is the norm of 359 Now ||QTw||2 is the size of the component of it in TZ(Uk)i and (7! is the size of the corresponding component of u. It follows that the reorthogonalization step will tend to reduce the component by a factor of ^/e^. But we only perform a reorthogonalization if 7 is near y^ or larger. and. Since reorthogonalization leaves us with a Uk and Uk+i that are orthogonal to the columns of Uk-i to working accuracy. where the vectors are orthogonal to working accuracy. Now the vector v of reorthogonalization coefficients is x = U^v — RQTUk+i. Owing to limitations of the page size. To see this. Hence if ||QT^||2 > y^. Reorthogonalization also solves the problem of how to restart the recursion for the jij. our reasons for doing so are different.3 and 3. • Note that the algorithm does not renormalize Uk. the danger is that cancellation will magnify otherwise negligible rounding errors. But they are essentially one algorithm. the basis vectors are not orthogonal. and since R is near the identity matrix.||2 to decide whether to reorthogonalize again. The reason is that the reorthogonalization does not significantly change its norm. we have been forced to separate the outer Lanczos loop from the reorthogonalization. as the above analysis shows. Thus. In the semiorthogonal Lanczos method. 3.j should be some multiple of the rounding unit— say -\/n€M.SEC. they have a limited capacity to reduce unwanted components—even when there is no rounding error involved. it may require another orthogonalization. But since v is not semiorthogonal. after the process the elements 7^ and 7jt+i. the first reorthogonalization step may have failed to eliminate the component.4 pull things together. since in this case by semiorthogonality 7 is well below y^. Algorithms 3. The details will depend on the size of the problem and the choice of backing store. which is done by the modified Gram-Schmidt algorithm. It is worth noting that although we perform reorthogonalizations in the Arnoldi method. • The reorthogonalization. In the Arnoldi method. assumes that the previous Lanczos vectors are available. Thus we can use x/||?.This is not a problem for Uk. Here are some comments. When should we reorthogonalize? The size of the component of v in 1^(11k) is T ||<2 -y||2. and the cure is to repeat the orthogonalization. let [in the notation of VolII: Eigensystems .

This algorithm performs Lanczos steps with periodic reorthogonalization. It is assumed that the last two rows of the matrix C_k estimating U_{k+1}^T U_{k+1} are available. The algorithm also updates the Rayleigh quotient H_k = T_k + S_k [see (3.7)]. The reorthogonalization is done in Algorithm 3.4.

Algorithm 3.3: Periodic orthogonalization I: Outer loop

This algorithm replaces statements 12-14 in Algorithm 3.3.

Algorithm 3.4: Periodic reorthogonalization II: Orthogonalization step

14) and (3. It is not at all clear that this is worthwhile. we can obtain Sk by subtracting the a. By semiorthogonality.362 CHAPTER 5. Since \\Uk\\2 = 1. and /?. since the storage is no more than that required for a set of primitive Ritz vectors. The results of the reorthogonalization replace Uk with and v with Solving (3. 72 < €M.12)] Uk = 7#-f T<7j_.16) and define Hk by . we get Also If we substitute (3. we have 72 + r2 = 1.Since by definition Sk = Hk — Tk. Since the reorthogonalized vector is rg_L. • We have chosen to compute the Rayleigh quotient Hk instead of Sk.13).and subdiagonals of Hk• Since reorthogonalization is an exceptional event.15) for Uk and v and substituting the results in (3. it has norm one to working accuracy. Before the reorthogonalization we have the relation which holds to working accuracy. KRYLOV SEQUENCE METHODS (3.. most columns of Hk will be zero above the first superdiagonal. one can store only the nonzero elements in a packed data structure.18) into (3. Hence r = 1 to working accuracy. which must be computed to test for convergence. Updating Hk We must now derive the formulas for updating Hk after a reorthogonalization.from the diagonal and the super. Instead of allocating storage to retain the zero elements.17) and (3.

SEC. 3. Specifically. Hence the difference between the adjusted Rayleigh quotient Hk and theLanczos matrix T* is also 0( y^). the corrections in (3. Specifically.19) consist of coefficients from the reorthogonalization step. If we do not require very accurate eigenvectors. Consequently. Computing residual norms A knowledge of Hk makes the computation of residual norms easy. Vol II: Eigensystems . but the following analysis does not require it.4. Ukw) is an approximate eigenpair in the space spanned by Uk. But for smaller residuals. it does not matter if we use Tk or Hk to estimate its norm. we should use Hk. Thus a knowledge of the adjusted Rayleigh quotient enables us to compute residual norms without having to multiply by A.We make no assumptions about how this pair was obtained—in most cases (ju. w) will be an eigenpair of Tk. which by semiorthogonality are bounded by 0(^l/€^). until the residual falls below y7^. These adjustments have been implemented in statements 21-23 of Algorithm 3. suppose that (/z. THE LANCZOS ALGORITHM then it is straightforward to verify that 363 which is a Krylov decomposition involving the new vectors iik and v. we can avoid accumulating H^. From the relation we have thai where It follows that the residual norm p = \\AUkW — ^UkW\\2 satisfies where bv semiorthoeonalitv Thus estimates p to at least half the number of significant figures carried in the calculation.
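The residual estimate described above can be realized as in the following sketch, which is my own illustration rather than the text's code. Given the adjusted Rayleigh quotient H_k = T_k + S_k, the coefficient β_k, and a primitive pair (μ, w) (typically an eigenpair of T_k), it returns an estimate of the residual norm without multiplying by A; under semiorthogonality this agrees with the true residual norm to roughly half the working precision.

```python
import numpy as np

def estimated_residual_norm(H, beta_k, mu, w):
    """Estimate ||A U_k w - mu U_k w||_2 from the adjusted Rayleigh quotient
    H = T_k + S_k, using
        A U_k w - mu U_k w  ~  U_k (H - mu I) w + beta_k w_k u_{k+1}
    and treating the columns of (U_k, u_{k+1}) as orthonormal to half precision."""
    z = H @ w - mu * w
    last = beta_k * w[-1]        # contribution along the new Lanczos vector
    return np.sqrt(np.linalg.norm(z)**2 + last**2)
```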

.2.28.20). Various statistics are summarized in the following table.8e-16 8. c5: The residual norm estimated by (3.20) is a reliable estimate of the residual norm.364 An example CHAPTER 5.5e-07 1. its eigenvalues converge to eigenvalues of A.0e-15 c2 3. In this example A is a diagonal matrix of order n = 500 whose first eigenvalue \\ = 1 and whose other eigenvalues are given by the recurrence The periodically reorthogonalized Lanczos algorithm was run on this matrix with a random starting vector and 7max = -^/W~l6/n.4e-12 1.QTAQ\\2.4e-12 1.5e-07 1. Ukw^ '). where the pair (^5 .9e-19 c4 1.5e-07 2.9e-15 c3 1.9e-02 2. cl: ||Tjb . as noted above. The third and fourth columns show that the classic estimate for the residual norm deviates from the true residual norm. KRYLOV SEQUENCE METHODS We conclude this subsection with an example that illustrates the behavior of periodic reorthogonalization in the Lanczos algorithm. whereas the former continues to decrease.4e-12 The columns of the table contain the following quantities. In particular the latter stagnates at about 10~~12.5e-15 7. This means that in spite of the contamination ofTk by the reorthogonalization process. The fifth column shows that the approximate residual norm (3. The second column shows that the corrections in Algorithm 3.9e-02 2.4 produce a Krylov decomposition whose residual is of the order of the rounding unit.9e-15 3.9e-15 3. The tridiagonalLanczos matrix Tk is to working accuracy the Rayleigh quotient formed from the Q-factor ofUk. Example 33. c4: The true residual norm.33. Orthogonalizations took place for k = 11. The numbers in the first column illustrate the results of Theorem 3.9e-02 2. Wg ') is the fifth eigenpairofthe Lanczos matrix Tk.17.3e-15 6.22. But the two values diverge only after the true residual norm falls below y^. k 10 20 30 40 cl 3. c2: The global residual norm c3: The classical estimate \(3ke£w$ '\ for the residual norm. where Q is the Q-factor of Uk.9e-15 3. in which we are concerned with the approximate eigenpairs (/4 .38.9e-15 5.4e-12 c5 1.

there is no cure except to compute Ritz vectors from the adjusted Rayleigh quotient Hk. Selective reorthogonalization is due to Parlett and Scott [209]. 197.1959]. But for some time the method was chiefly regarded as an exact method for tridiagonalizing a symmetric matrix. Lanczos knew that his method was in some sense iterative. 198. periodic reorthogonalization to Grcar [97]. For more on this topic. Special mention must be made of an elegant paper by Simon [237]. For an error analysis of full reorthogonalization. Chapter 4.4. Wu and Simon [306] introduce it impromptu under the name of thick restarting. and partial reorthogonalization to Simon [238]. the simple form of Hk makes it easy to apply a step of the inverse power method to the eigenvectors of Tk to obtain the primitive Ritz vectors of Hk. it is so viewed in an early paper of Wilkinson [296. For example. who in his thesis [196. Only when Sorensen showed how to restart the Arnoldi process (see §2) did the fully reorthogonalized method become a reasonable option. For example. When this happens. which unifies these methods. where the eigenvalues in question lie on the real line.4. see [6] and [7]. 1966] and especially Paige. The acceptance of the Lanczos method as an iterative method is due to Kaniel [145. However. In the Lanczos method. which. and a good deal of research has been devoted to finding a way around this problem. Semiorthogonal methods The semiorthogonal variants of Lanczos's algorithm all appeared between 1979 and 1984. NOTES AND REFERENCES The Lanczos algorithm As was noted in §3.Fortunately. Krylov-spectral restarting follows from our general treatment of Krylov decompositions. THELANCZOS ALGORITHM 365 Perhaps the most important point of this example is that Ritz vectors computed from the matrix T^ can fail to converge. 3. was the first to come to grips with the need for reorthogonalization. since his vectors could be combined to produce good approximations to eigenvectors before the algorithm ran to completion. Vol II: Eigensystems . Filter polynomials For Arnoldi's method exact shifts are generally used in restarting because the alternatives are cumbersome. Another possibility is to use Leja points or their approximations. inexact shifts are a real possibility. one could use the zeros of a Chebyshev polynomial to suppress the spectrum on a given interval. see [195]. 3. incidentally.SEC.1971] and other papers [195. 199] laid the foundation for much of the work that was to follow. Full reorthogonalization The need for full reorthogonalization has been generally cited as a defect of the Lanczos method.

A package LANZ implementing partial reorthogonalization is available from Netlib. What happens is that after a loss of orthogonality previously converged Ritz values may reappear and spurious Ritz values may be generated. 238].. The singular value decomposition The singular values and vectors (0-.2 is due to Simon [237]. KRYLOV SEQUENCE METHODS Periodic reorthogonalization is the most straightforward of the methods. Cullum and Willoughby [46.Although the idea is simple. selective reorthogonalization consists of saving converged and nearly converged Ritz vectors and orthogonalizing the Lanczos vectors against them.48] have shown how to detect the spurious Ritz values. the implementation is complicated.2 actually track the loss of orthogonality.366 CHAPTER 5. its explicit formulation and computational use are due to Simon [237. who also gives empirical grounds for believing that the quantities calculated in Algorithm 3. who shows that it applies to any basis generated by a semiorthogonal method. . The fact that semiorthogonality can cause the convergence of Ritz vectors to stagnate does not appear to have been noticed before (although in the context of linear systems the possible untoward effects are hinted at in [237] and [205]). u^ Vi) of a matrix X can be calculated by applying the Lanczos algorithm to the cross-product matrix X^X. But as is usual with Lanczos methods. but as the analysis in §3. Partial reorthogonalization is based on the observation that one only needs to reorthogonalize v against those vectors U{ for which the bounds computed by Algorithm 3. \(3k^Wj \ is the residual norm for the Ritz vector [remember that for Ritz vectors r in (3. however. and the reader should look for his presence in the above references.2.2 are above 7max. Broadly speaking.11) is implicit in Paige's thesis [196]. Chapter 3. Theorem 3.. Then there are constants 7^ of order of the rounding unit such that Now by (3. The recurrence (3. shows it may fail to . Selective reorthogonalization is based on a celebrated result of Paige [196]. The formulas (3. This works well for the dominant singular values.20) is zero]. the devil is in the details.19) are new. k) be the Ritz vectors computed at step k of the Lanczos process and let Wj be the corresponding primitive Ritz vectors..20). and the reader is referred to the treatment of Simon [238]. Let zj (j = 1. For another recurrence see [143]. A package LANSO implementing periodic reorthogonalization is available from Netlib. No reorthogonalization It is possible to use the Lanczos algorithm to good effect with no reorthogonalization. A package LANCZOS is available from Netlib.. It follows that Uk+i can have large components in 11(1}k) along only Ritz vectors that have almost converged.

we pointed out that a Krylov sequence cannot see more than one copy of a multiple eigenvalue and indicated that one way around the problem is to work with block Krylov sequences. The idea was to generate two Krylov sequences. since they are interior eigenvalues. which is particularly strong on implementation issues. one in A and the other in A1. biorthogonality). more serious ways. For the use of shift-and-invert enhancement in a block Lanczos algorithm see the paper of Grimes. The basic idea is to transform the generalized eiVol II: Eigensystems . But it can fail in other. 105.3. proposed by Parlett. Gutknecht [103. and Simon [98]. If we denote bases for these sequences by Uk and Vk and require that they be biorthognal in the sense that V^Uk = /. For more see [21. Lewis.104] gives a full treatment of the theory of the bi-Lanczos method. this reduces to the usual Lanczos method. Taylor. An alternative is to apply the method to the matrix whose eigenpairs are However. Such algorithms have been proposed by Cullum and Donath [45]. It turns out that if one orthogonalizes a block Krylov sequence one gets a block tridiagonal Rayleigh quotient. THE GENERALIZED EIGENVALUE PROBLEM 367 give sufficiently accurate results for the smaller singular values and their vectors. and Ruhe [219]. The method shares with the Lanzcos method the problem of loss of orthogonality (or. and Liu [210].SEC. convergence to the smaller eigenvalues can be expected to be slow. 4. 47. Software is available in SVDPACK at Netlib. The cure is a look-ahead Lanczos recurrence. The resulting generalization of the Lanczos algorithm is called a block Lanczos algorithm. then the Rayleigh quotient Tk = VJ^AVk is tridiagonal. that gets around the failure by allowing the Rayleigh quotient to deviate locally from tridiagonality. 89]. strictly speaking. For more see [79. Chapter 4. 4. Golub and Underwood [92. 275].280]. THE GENERALIZED EIGENVALUE PROBLEM In this section we will consider the use of Krylov methods to solve the generalized eigenvalue problem Ax = \Bx. and it follows that the columns of Uk and 14 satisfy three-term recurrence relations. When A is symmetric and u\ — v\. Block Lanczos algorithms In §3. The bi-Lanczos algorithm The original Lanczos algorithm [ 157] was actually a method for general non-Hermitian matrices.

genvalue problem into an ordinary eigenvalue problem and then apply some variant of the Arnoldi or Lanczos method.

An important issue here is the nature of the transformation. Unfortunately, all transformations require that a linear system be solved to compute the products Cu required by Krylov methods. (The alternative is to use an iterative method, but that is another story.) To solve the linear system, we need to factor a matrix, B in the above example. When the matrix in question is easy to factor, this presents no problems. But when factorizations are expensive, we need to make each one count. As it turns out, if we define C properly, we can transform to an ordinary eigenproblem and get the benefits of shift-and-invert enhancement for the cost of a single factorization. A second issue is how to determine convergence, since the usual residual criterion concerns the transformed problem, not the original. In §4.1 we will discuss these issues.

In many applications the matrix B is positive definite. In §4.2 we will show how we can replace orthogonality in the Arnoldi and Lanczos algorithms with B-orthogonality to obtain new algorithms that are specially tailored to this form of the generalized eigenproblem.

4.1. TRANSFORMATION AND CONVERGENCE

Transformation to an ordinary eigenproblem

The first step in applying Krylov sequence methods to a generalized eigenproblem Ax = λBx is to find an equivalent eigenproblem Cx = μx with the same right eigenvectors. For example, if B is nonsingular we can take C = B^{-1}A.

To see how this is done, let κ be a shift, and write the problem in the form

    (A − κB)x = (λ − κ)Bx.

Then

    (A − κB)^{-1}Bx = (λ − κ)^{-1}x.

Hence if we set

    C = (A − κB)^{-1}B   and   μ = (λ − κ)^{-1},

then

    Cx = μx.

Because the transformation from λ to μ is the same as for the ordinary shift-and-invert method, we will call C the generalized shift-and-invert transform.

The products v = Cu can then be computed as follows. If the matrix A − κB can be factored, the product v = Cu can be efficiently computed by the following algorithm.

    1. w = Bu
    2. Solve the system (A − κB)v = w

Thus the generalized shift-and-invert transformation requires only a single factorization. Moreover, the eigenvalues of the pencil (A, B) that are near κ become large in the transformed problem, so that they will tend to be found first by Krylov methods.

Residuals and backward error

We turn now to the problem of convergence criteria. We will proceed in two stages. First we will establish a backward perturbation theorem for the generalized eigenproblem Ax = λBx in terms of the residual r = Ax − λBx. We will then show how to use the residual of the transformed problem, Cx − μx, to bound the norm of r.

The backward error theorem is an analogue of Theorem 1.3, Chapter 2.

Theorem 4.1. Let (λ, x) be an approximate eigenpair of the pencil A − λB and let

    r = Ax − λBx.

Let ‖·‖ denote the Frobenius or 2-norm, and let

    E = −(‖A‖/(‖A‖ + |λ|‖B‖)) r x^H/‖x‖_2^2,   F = (λ̄/|λ|)(‖B‖/(‖A‖ + |λ|‖B‖)) r x^H/‖x‖_2^2,

where

    η = ‖r‖ / [(‖A‖ + |λ|‖B‖)‖x‖_2].

Then

    (A + E)x = λ(B + F)x

and

    ‖E‖ = η‖A‖,   ‖F‖ = η‖B‖.

Proof. Direct verification.

• The bounds are optimal in the sense that there are no matrices E and F satisfying (A + E)x = λ(B + F)x with ‖E‖ ≤ η_A‖A‖ and ‖F‖ ≤ η_B‖B‖, η_A, η_B ≤ η, for which at least one of the inequalities is strict.

Since the residual is in general computable, we can use this theorem to determine convergence. Namely, stop when the backward error is sufficiently small, say some multiple of the rounding unit. For more on the use of backward error to determine convergence, see the comments following Theorem 1.3, Chapter 2.
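The two-step computation above is straightforward to realize with a sparse direct solver. The following sketch is my own illustration, not code from the text; it assumes SciPy's sparse LU factorization is available, and the class name is made up.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

class ShiftInvertOperator:
    """Applies C = (A - kappa*B)^{-1} B, factoring A - kappa*B once."""
    def __init__(self, A, B, kappa):
        self.B = B
        self.lu = spla.splu(sp.csc_matrix(A - kappa * B))   # single factorization

    def matvec(self, u):
        w = self.B @ u               # step 1: w = B*u
        return self.lu.solve(w)      # step 2: solve (A - kappa*B) v = w
```

In a Krylov code, matvec would be the routine handed to the Arnoldi or Lanczos driver; the factorization is reused for every product.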

. ar) converged. we get Hence It follows from Theorem 4. = (X — «)-1. KRYLOV SEQUENCE METHODS When we use. we may take as an approximate bound on the backward error.KB)~1R^. we will have \\A\\ + \K\\\B\\ = \\A\\ + |A|||5||.B)~1B.2. Fortunately.B)~1B is no longer symmetric and it appears we are forced to use the Arnoldi method rather than the more efficient Lanczos method to solve it. We will then introduce the Arnoldi and Lanczos algorithms with respect to the B inner product. What we really want is a bound on the norm of the residual r = Ax . Finally. we will indicate some of the things that can happen when B is singular or ill conditioned. B) is that the matrix C = (A — K. and simplifying. However.1 that (A. where the normwise relative error in A and B is bounded bv Consequently. where fj. this procedure requires two factorizations—one of B and the other of A — K. Hence. compute a Cholesky factorization R^R of B and apply the Lanczos algorithm to the symmetric matrix R(A . Consequently.fix. We could. We will begin with a brief treatment of the B innerproduct.B. if p is less than some tolerance. To get such a bound multiply s by A — KB to get Dividing by /z. POSITIVE DEFINITE B A difficulty with applying the generalized shift-and-invert transformation to the S/PD pencil (A.XBx. in practice we will be looking for eigenvalues near the shift*. which is similar to C. This criterion has the drawback that we need to know the norms of A and B. 4. B). the residuals we calculate are of the form s = Cx . of course.370 Bounding the residual CHAPTER 5. we can declare the pair (A. x) is anexacteigenpairof pair(A. Unfortunately. the Arnold! method with a generalized shift-and-in vert transformation C = (A . say. we can restore symmetry of a sort by working with a different inner product generated by B.K. which may not be available.
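As a concrete illustration of this convergence criterion (my own sketch, not the text's code), the following test recovers λ = κ + 1/μ from a Ritz pair (μ, x) of C, forms the residual of the original pencil, and compares the normwise backward error of Theorem 4.1 with a tolerance. The text's cheaper route bounds ‖r‖ from the transformed residual s without forming r; here r is formed explicitly for clarity, and Frobenius norms stand in for the norms of A and B.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def pencil_converged(A, B, kappa, mu, x, tol):
    """Accept a Ritz pair (mu, x) of C = (A - kappa*B)^{-1} B when the backward
    error ||Ax - lam*Bx|| / ((||A|| + |lam|*||B||) * ||x||) is at most tol."""
    lam = kappa + 1.0 / mu
    r = A @ x - lam * (B @ x)
    norm = lambda M: spla.norm(M) if sp.issparse(M) else np.linalg.norm(M)
    eta = np.linalg.norm(r) / ((norm(A) + abs(lam) * norm(B)) * np.linalg.norm(x))
    return eta <= tol, eta
```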

The B inner product

We begin with some definitions.

Definition 4.2. Let B be a positive definite matrix. Then the B INNER PRODUCT is defined by

    (x, y)_B = y^H B x.                                   (4.2)

The B-NORM is

    ‖x‖_B = sqrt((x, x)_B).

A matrix C is B-SYMMETRIC if

    (Cx, y)_B = (x, Cy)_B   for all x and y.

Here are three comments on this definition.

• The reader should prove that the B inner product is actually an inner product, i.e., that it is a symmetric, definite, bilinear function, and that the B-norm is actually a norm. The latter proof is essentially the same as for the 2-norm and proceeds via the Cauchy inequality.

• There is a curious inversion of the order of x and y as one passes from the left- to the right-hand side of (4.2). This makes no difference when we are working over the field of real numbers. In the complex case, however, the order matters. Specifically, with the definition (4.2) we have (μx, y)_B = μ(x, y)_B and (x, μy)_B = μ̄(x, y)_B, which is how functional analysts expect their inner products to behave. The alternate definition, (x, y)_B = x^H B y, would give (μx, y)_B = μ̄(x, y)_B, which is not standard.

• It is easily verified that:

    The matrix C is B-symmetric if and only if BC is symmetric.

In particular, if A is symmetric, then the generalized shift-and-invert transform (A − κB)^{-1}B is B-symmetric.

5. If U is square it is Borthozonal. The following easily proved result will be important later. KRYLOV SEQUENCE METHODS It is natural to say that two vectors u and v are B-orthogonal if (w. consider the B variant of the classical Gram-Schmidt algorithm. hJ = (uj. This can be done by a variant of the Gram-Schmidt algorithm in which the B inner product replaces the ordinary inner product. 2.372 Orthogonality CHAPTER 5.v)B 3. for j = 1 to k Wj = B*UJ hjj — wj*v j v — v — hj*Uj end for j This program requires k multiplications by B. We will say that U is B-orthonormal if U^BU — I. Unless we are careful. All of the usual properties and algorithms associated with orthogonality carry over to Borthogonality. forj = 1 to k 2. On the other hand.. 3. y)s conceals a multiplication of a vector by B. To see this. 1. Uk of j5-orthonormal vectors. end for j 4. However. for j = 1 to k 5. Thus postmultiplication by an orthogonal matrix (but not a ^-orthogonal matrix) preserves 5-orthonormality. 4. for j — 1 to k hj = (ujJv)B v = v — hj*Uj end for j When we recast this algorithm in matrix terms we get the following. 1. 3. V)B = 0.. 1. B-Orthogonalization In what follows we will need to J9-orthogonalize a vector v against a set u\. consider the following B variant of the modified Gram-Schmidt algorithm. we can perform unnecessary multiplications. notice that the the notation (z. 2. . 4. v = v — hj*Uj 6.. end for j .

we can use the classical Gram-Schmidt algorithm. as suggested in (4. Ritz pairs may be computed from the Rayleigh quotient Hk = U^BCUk. where z = UkW. to save multiplications by B. then from (4. There are two ways around this difficulty. The procedure amounts to substituting f?-orthogonalizationfor ordinary orthogonalization. It follows on premultiplication by U^B that EkM = isw. THE GENERALIZED EIGENVALUE PROBLEM In matrix terms this algorithm becomes 373 This version requires only one multiplication by B. • Orthogonalization. Because reorthogonalization is always used in the Arnoldi process. if Hw = vw> then with z — Uw. Hence it is desirable to use the classical Gram-Schmidt algorithm whenever possible.6) where u?k is the last component of w. we have from (4. • Ritz pairs and convergence.Specifically. The B-Arnoldi method The J?-Arnoldi method generates an Arnoldi decomposition of the form where C is the generalized shift-and-invert transform and Uk is #-orthonormal.SEC. Now it is well known that the modified Gram-Schmidt algorithm is numerically superior to the classical Gram-Schmidt algorithm. if Cz = vz. Here are some details. 4.5). so that we can find the primitive Ritz vector v as an eigenvector of HkOn the other hand.6) Vol Hi Eigensystems . In large problems multiplications by B may dominate the computation.

#-Arnoldi method.7) contains || wjt+i ||2. which is a rather large matrix.) Thus we are free to perform orthogonal similarities on the Rayleigh quotient H^. suppose we have transformed our decomposition into the form Then we can deflate the pair (*/. Let Q be orthogonal. Deflation is somewhat more complicated than restarting. However. (In our terminology the transformed decomposition is a #-orthonormal Krylov decomposition. however. the main difference being the need to compute the matrix-vector products of the form Bv. so that all we need to calculate is ||Uk+i \\2> To summarize. Ukw] is Thus.3). .374 CHAPTER 5. u\) by setting #21 and 7^+1. and ||wjt+i||2 will generally be different from one. • Deflation. as usual.1 to zero. we can determine convergence of Ritz pairs directly from the decomposition and the primitive Ritz vector without having to compute the residual vector. The error introduced into the decomposition by this deflation is The problem is that we must estimate ||(£/2 Wjt+i)||2. the J?-Arnoldi method for the generalized eigenproblem with shiftand-invert enhancement is essentially a minor modification of the standard Arnoldi method. By (4. It should be noted that when we are deflating Ritz values the vector #21 is zero. KRYLOV SEQUENCE METHODS Hence the residual 2-norm of the Ritz pair (*/. in the restarted Arnoldi method this matrix will be in main memory and we can use the bound to compute a cheap bound. Note that (4. • Restarting.0ueT to while preserving the 5-orthonormality of UQ. To see what is involved. this quantity is one. we can transform the J?-Arnoldi decomposition CU = UH . This means that both implicit and Krylov-Schur restarting can be applied to the B-Arnoldi method without change.In the ordinary Arnoldi method. ||WAT+I||B is one. In the .

K. (This decomposition of z into x + y is not an orthogonal decomposition. The result now follows from the fact that for any matrix C of order n. Thus we cannot implement the classical Gram-Schmidt algorithm in the B -inner product. y) whenever z G N(B) — and this fact presents complications for our algorithms. In this case the B -inner product is said to be a semi-inner product.nB)~lBx = 0. we need a little geometry. it must be tridiagonal. Just as the restarted B-Arnoldi method is a slightly modified version of the usual restarted Arnoldi method. It follows that the Rayleigh quotient H = U^BCU is symmetric. We will now make the simplifying assumption that so that any vector z can be uniquely written in the form z — x+y. On the other hand if the block size is large or multiplications by B are cheap. so the restarted J9-Lanczos method is a modification of the ordinary restarted Lanczos method (see §3. rank(C) + null(C) = n. i. although they now estimate deviation from B -orthogonality. THE GENERALIZED EIGENVALUE PROBLEM The restarted 5-Lanczos method 375 We have already noted that if A is symmetric then C = (A—KB) ~ 1 B is B-symmetric. and the J9-norm a seminorm. we get Bx — 0.3 we described the Lanczos method with periodic reorthogonalization. it may be more efficient to process each block by the classical GramSchmidt algorithm as in (4. To see this. y) = (ar.. The B -Lanczos method with periodic reorthogonalization In §3. Semidefinite B Since the generalized shift-and-invert transform C = (A — K. When it is necessary to reorthogonalize. Once again the B version of the method can be obtained by obvious modifications of the original method. Thus the 5-Arnoldi method becomes a 5-Lanczos method. where x € T^(C) and y € J\f(B).B. the best we can hope for is a block modified Gram-Schmidt algorithm. We first observe that the null space of B and C are the same.2). If the block size is small. and rank(C) = n . On the other hand if a: is a null vector of C. (x + z. so that x 6 N(B). note that if z is a null vector of B then # is a null vector of C. Multiplying by A . For if there is an x such that Vol II: Eigensystems . to avoid forming matrix-vector products involving B.2 are unchanged. The B semi-inner products is blind to vectors lying in the null space of B — that is. it makes sense to store the columns of BUk. To see what is going on.B)~l B does not involve the inverse of B.SEC.null(B). BC is symmetric. and since H is Hessenberg. 4.4). The estimates of the deterioration of orthogonality in Algorithm 3.) The assumption is always true when C is nondefective. then (A . the columns of Uk must be brought in from backing store in blocks that fit in main memory. as suggested in (4.e. the algorithms described above can be formally applied when B is positive semidefinite.5).

Specifically. from (4. . Even if a primitive Ritz vector w is calculated accurately.11) that Hk is essentially the Rayleigh quotient with respect to a set of vectors uncontaminated by contributions from M(B). Now the eigenvectors of C. this can be done as follows. Uk and Cuk. be. . Consequently. other than the null vectors of B. Mathematically. Lett. lie in 7£((7). In practice. something that can happen only if C has a nontrivial Jordan form. 2. at each stage rounding error will introduce small errors lying in N(B). a random vector and let HI = Cv/\\Cv\\B. so that the primitive Ritz vectors are calculated inaccurately. . the null-space error may contaminate the Ritz vector UkW.. Since Uk+i is a linear combination of ^i. if we are going to use a Krylov method to calculate eigenvectors of B we must insure that the Krylov vectors lie in 7£(C). it must also lie in 7£(<7). « * lieinT^(C). our Krylov subspaces lie in 7£(C). Now suppose that wi. Write where the columns of Uk are in T^(C) and the columns of Vk are in M(B). .10) we have It then follows from (4. . Then it follows that and The contamination of the Rayleigh quotient is not a problem.Thent^ £ 7£(C). let us suppose that we have the J?-Arnoldi decomposition where Fk is an error matrix of order of the rounding unit that accounts for rounding errors made during the Arnoldi process. and such magnifications are empirically observed to happen... 1. Thus if we start with a vector in 7£(C). then C2x — 0. say.376 CHAPTER 5. There are two potential dangers here. The null-space error in Uk may somehow effect the Rayleigh quotient Hk. To analyze these possibilities. KRYLOV SEQUENCE METHODS Cx is nonzero and lies in J\f(B). . Since the B -inner product cannot see vectors in N(B\ it is possible for the Lanczos or Arnoldi process to magnify these small errors.

Hence the null-space error in UkW is small in proportion as the cmantitv is small. where r is a random vector. LetDk = diag(l.8) and the following discussion]. In it P denotes the projection ontoJ\f(B) corresponding to the decomposition R n = T^(C} + Af(B) [see (4. Ukw). as the Ritz pair converges.95. .3. But |/3^^|||w||2 is the residual norm of the Ritz pair (i/.SEC. 4. Hence. Each multiplication by C was followed by the addition of a random perturbation on the order of the rounding unit. Uk is the last component of w. Example 4.. w) be a primitive Ritz pair. Form > n/2. let and In this example we apply the B-Arnold! method to C with n — 500 and m = 400. the null-space error will go to zero. THE GENERALIZED EIGENVALUE PROBLEM 377 It turns out that the errors Vk do not contaminate the Ritz vectors either—at least asymptotically. as usual.. Vol II: Eigensystems . The starting vector u\ is of the form Cr/\\Cr\\s. .. From (4.95k~l). The following example illustrates this phenomenon.9) it follows that where. . Now the left-hand side of this relation has no components in M(B). The following table summarizes the results. To see this let (V.

H k-i = RQ[l:k. = 0. note that the null-space error (column 2) ultimately increases far above one.1 :k—IJ is an unreduced Hessenberg matrix. we can add to it any matrix V in N(B) such thatVfh. Hence if we multiply equation (4. where Hk is unreduced. Qk is unreduced Hessenberg and Rk is nonsingular.14) by g[l:Ar. Initially the component in the null space increases. So far all is according to theory. Define Uk — Uk+iQ. Theorem 4. the first column of Uk is a multiple ofCui. We will return to the remaining columns later.15) is an unreduced Arnoldi decomposition.weget Since R is nonsingular and Q [1:fc. Hence by the uniqueness of the unreduced Arnoldi decomposition. • Although we have proved this theorem for the ordinary Arnoldi process. Instead. It is based on the following characterization of the effects of a step of implicit restarting with zero shift. which is seen to grow steadily as k increases. The third column contains the norm of the component in N(B) of the Ritz vector corresponding to the fifth eigenvalue ofC. The fourth column contains the product \ftk^k\> which along with \\Uk+i\\2 estimates the residual norm.378 CHAPTER 5. If Hk — QR is the QR factorization of Hk. since in finite precision arithmetic the null space error will ultimately overwhelm any useful information in the Krylov vectors. then (up to column scaling) Uk = UkQ. This puts a limit on how far we can extend an Arnoldi or Lanczos sequence. Thus (4. . Let AUk = Uk+\ Hk be an Amoldi decomposition. l:k—l] is also unreduced Hessenberg. Other examples of rapid growth of null-space error may be found in the literature.14) the first column of Uk satisfies Since m ^ 0.l:A?-l].4. Then By definition Uk-i — UkQ[l'k. The chief difference is that Uk+i is no longer unique. LetAUk-i = UkHk be the decomposition resulting from one step of zero-shift implicit restarting. Moreover. Proof. (4. from (4. l:k—l]. However. it also holds for the #-Arnoldi process. The following suggests a way of purging the unwanted components from the entire sequence.15) is the decomposition that results from one step of zero-shifted implicit restarting. KRYLOV SEQUENCE METHODS The second column contains the norm of the component ofUk+i in N(B). Because Hk is unreduced Hessenberg. even when B is semidefinite. but as the residual norm decreases so does the component.

one of B and one of A — KB (however. so that the the initial factorization of B can be prorated against the several factorizations of A — KB). From (4. Scott [236] applied Parlett's algorithm to the pencil [B(A. Ericsson and Ruhe [72] proposed applying the Lanczos algorithm to the matrix R(A — KB)~1R^." The transform (A-KB)~1B appears implicitly in a book by Parlett [206. however. citing Vol II: Eigensystems . where B = R^R is the Cholesky factorization of B. They called their transformation "the spectral transformation. 319].ft"1 not be large. Since in practice the Arnoldi method must be used with restarts. The last two columns in (4.e. He also gives a version of the Lanczos algorithm for B~l A.9) we have 379 Since the left-hand side of this equation is in 1Z(C). p.3. which is needed to know when to purge. showing that it required no factorization of B. The cure is to perform periodic purging in the same style as periodic reorthogonalization. This approach has the difficulty that it requires two factorizations." Here we have preferred the more informative "shift-and-invert transformation. It helps that we can tell when the null-space errors are out of hand by the sudden growth of the Arnoldi vectors. (He also says that the algorithm is not new. THE GENERALIZED EIGENVALUE PROBLEM The application of this theorem is simple. B]. For this to happen it is essential that . NOTES AND REFERENCES The generalized shift-and-invert transformation and the Lanczos algorithm The idea of using a shift-and-invert strategy for large generalized eigenvalue problems arose in connection with the Lanczos process. 4.SEC. The fifth column is the null-space error in Uk+iQ. With the Lanczos algorithms. although some experimentation may be required to determine the longest safe restarting interval. thereafter it grows. The reason is that the null-space error in Uk (column 2) has grown so large that it is destroying information about the component of Uk in 7l(C}. which. 4. Unfortunately. The chief problem is that the null-space error in the Arnoldi vectors can increase until it overwhelms the column-space information. the problem of purification is analogous to that of reorthogonalization—it requires access to the entire set of Lanczos vectors.. in which the Lanczos vectors are B -orthogonal. However. For the first 50 iterations it remains satisfactorily small. so is the right-hand side—i. since in the presence of rounding error it will magnify the residual error Fk in the decomposition. at present we lack a reliable recursion for the null-space error. including a zero shift among the restart shifts may suffice to keep the errors under control.KB)~1B. the authors assume that one will use several shifts. one step of zero-shifted implicit restarting purges the B-Arnoldi vectors of null-space error.13) illustrate the effectiveness of this method of purging null-space error. The #-Arnoldi and Lanczos methods with semidefinite B are imperfectly understood at this time. who rejects it. requires a factorization of B.

Semidefinite B The basic paper for the semidefinite . He also considers the case of semidefinite B. They point out that the null-space error in the Lanczos vectors can grow and give an example in which it increases 19 orders of magnitude in 40 iterations.) When reduced to a standard eigenvalue problem by multiplying by B~l. Meerbergen and Spence [176] treat the nonsymmetric case—i. this pencil becomes (C..8) that 7l(C) and Af(B) have only a trivial intersection is violated in practical problems.e. His solution is a double purging of the Ritz vectors based on the formula (4. and Jensen [184]. but for a different purpose. (The same formula was used earlier by Ericsson and Ruhe [72]. Parlett. in which C is what we have been calling the generalized shift-and-invert transform. Ericsson. the authors give a rounding-error analysis of their procedures. Ericsson [71] considers these problems in detail for the Lanczos algorithm. Finally.) The fact that the purging becomes unnecessary as a Ritz vector converges appears to have been overlooked.12) can be used to purge a Ritz vector of null-space error. /).12). the semidefinite B-Arnoldi algorithm. They also point out that the formula (4.380 CHAPTER 5. Their example of the need for purification increases the null-space error by 13 orders of magnitude in 11 iterations.282]. .1 follows from generalizations of a theorem of Rigal and Caches [216] for the backward error of linear systems and is due to Fraysse and Toumazou [77] and Higham and Higham [112]. The hypothesis (4. to handle the problem of overlapping null and column spaces.#-Lanczos algorithm is by Nour-Omid. Residuals and backward error Theorem 4. KRYLOV SEQUENCE METHODS two technical reports [183. They introduce the elegant method of using zero-shift implicit restarting to purge the null-space error. He credits Parlett with recognizing that the algorithm can be regarded as the Lanczos algorithm in the B-irmer product. which they combine with the purification of Nour-Omid et al.

6
ALTERNATIVES

Although the most widely used algorithms for large eigenproblems are based on Krylov sequences, there are other approaches. In this chapter we will consider two alternatives: subspace iteration and methods based on Newton's method, particularly the Jacobi-Davidson method.

Subspace iteration is a block generalization of the power method. It is an older method and is potentially slow. But it is very reliable, its implementation is comparatively simple, and it is easy to use. Moreover, combined with shift-and-invert enhancement or Chebyshev acceleration, it sometimes wins the race.

We next turn to methods based on Newton's method for solving systems of nonlinear equations. Since Newton's method requires the solution of a linear system, these methods appear to be mostly suitable for dense problems. But with some care they can be adapted to large sparse eigenproblems. When we combine Newton's method with Rayleigh-Ritz refinement, we get the Jacobi-Davidson algorithm, with which this chapter and volume conclude.

1. SUBSPACE ITERATION

In motivating the power method with starting vector u_0 (§1, Chapter 2) we assumed that A was diagonalizable and expanded u_0 in terms of the eigenvectors x_j of A:

    u_0 = γ_1 x_1 + γ_2 x_2 + ··· + γ_n x_n.

If

    |λ_1| > |λ_2| ≥ ··· ≥ |λ_n|

and γ_1 ≠ 0, then the first term in the expansion dominates, and A^k u_0, suitably scaled, converges to x_1. When |λ_2| is near |λ_1|, the convergence of the power method will be slow. However, if we perform the power method on a second vector u_0' to get the expansion

whose union is A(A). In §1. If this ratio is less than |A 2 /Ai |. THEORY As suggested in the introduction to this section. one considers the powers AkUo. the theory of subspace iteration divides into three parts: convergences of the subpaces. Finally. The eigenvalues of these matrices. since we are working with orthonormal vectors. the effect is to speed up the convergence. ALTERNATIVES we can eliminate the effect of A2 by forming the linear combination whose convergence depends of the ratio |As/Ai|. 1. L^ is of order m—l. We wish to show that as k increases the space K(AkUo) contains increasingly accurate approximation to the dominant eigenspaces of A and also to determine the rates of convergence. it is more natural to use a variant of the Rayleigh-Ritz method that computes the dominant Schur vectors of A. First.3 we will treat the symmetric case and in particular show how to use Chebyshev polynomials to accelerate convergence. However. However. In §1. We will treat each of these in turn. since they all tend toward the dominant eigenspaces of A.1. We will suppose that A has the spectral representation [see (1. are assumed to be ordered so that . The second problem is how to extract approximations to eigenvectors from the space.382 CHAPTER 6. the relation to the QR algorithm. It turns out that power subspace 7£( A*'£/o) generally contains a more accurate approximation to x\ than could be obtained by the power method. Hence we can cure the problem of dependency by maintaining an orthonormal basis for 7£( AkUo). of course. But in fact we are only interested in the space spanned by AkUo. Moreover. there are two problems that must be addressed before we can derive an algorithm based on the above observations. and the extraction of Schur vectors. the power subspace contains approximations to other dominant eigenspaces. Convergence Let UQ be an nxm matrix.7). This cure leads to an interesting and useful connection between subspace iteration and the QR algorithm. Subspace iteration is a generalization of this procedure. the columns of AkUo will tend toward linear dependence. and L$ is of order n—m. use a Rayleigh-Ritz procedure. Beginning with an nxm matrix UQ. in § 1.2 we will show how to implement subspace interation. Chapter 4] where LI is of order I.1 we will develop the basic theory of the algorithm. We could.

L% is of order n-m.1). we get The bottommost expression in this equation is analogous to (1. then By construction K(V} ') C K(AkUQ). Multiplying (1.9. so that the column space of AkUo will approach the eigenspace TZ(Xi).3) by Ah and recalling that AXi = XiLi.2) suggests that the first term in the expression will dominate. We summarize these results in the following theorem. Moreover. assume that C\2 = (Cj1 C^)H is nonsingular and let where D\ has t columns. To make this more precise. Moreover.SEC. (1. in which we recapitulate a bit. L^ is oforder m-1. Theorem 1. and their eigenvalues are ordered so that Vol II: Eigensystems . we have that for any e > 0 Thus we see that the column space of AkUo contains an approximation to the eigenspace IZ(Xi) that converges essentially as |A m +i/A/| fe . Let the matrix A of order n have the spectral representation where LI is of order I. 1. Then If we partition Vk = (V^ V2(A:)). SUBSPACE ITERATION 383 where d = Y^UQ. Chapter 1. by Theorem 2.1.

we can replace it with AkUoS. • The theorem contains no hypothesis that A is nondefective. the convergence of these approximations deteriorates as t increases. Chapter 4]. Moreover. we are not interested in the matrix AkUo—only the subspace spanned by it. This suggests that subspace iteration can deal with defectiveness.1 then applies.9.j-i/A^|. implies that we can take e = 0 in (1. which is in general smaller than | A 2 /A!|. If the matrix (C^ C^)H is nonsingular. so that the theorem is valid for the original UQ.Let Ok be the largest canonical angle between K(AkUo) and K(Xi). where S is chosen to restore independence. since the convergence ratio is essentially |Am. however. Consequently. since we will not know the eigenvalues of A when we choose m. A natural choice is to make AkUoR orthogonal. However. If . when the columns of AkUo begin to deteriorate. then Theorem 2. Forif^ = 1. we can reduce the size of UQ to the point where the hypothesis is satisfied (at the very worst we might end up with m = €). if |Ai| > \\2\. adding back the missing columns of UQ can only improve the accuracy of the approximation to 7£(XL). For example. Chapter 1. Fortunately. which Krylov sequence methods do not handle well [see the discussion following (3. This observation is important in applications. If. the eigenvalues A^ and A m+ i are nondefective.4).14). then the convergence depends essentially on the ratio |A m +i/Ai|. There are four comments to be made about this theorem. Theorem 1. ALTERNATIVES where d = Y) H E/b. The QR connection One difficulty with the naive version of subspace iteration is that the columns of Ak UQ tend to become increasingly dependent. the columns space of AkUo will contain approximations to each of the eigenspaces corresponding to the first £ eigenvalues of A. the magnitudes of the first ra eigenvalues of A are distinct. • We are free to position t at any point in the spectrum of A where there is a gap in the magnitudes of the eigenvalues. then for any e > 0.384 Let UQ G C n x m be given and write CHAPTER 6. If. • The theorem shows that subspace iteration is potentially much faster than the power method. for example. the columns will all tend to lie along the dominant eigenvector of A. • The theorem would appear to depend on choosing m so that | Am | is strictly greater than | A m+ i |. But if that condition fails.

the corresponding columns of Qk will not settle down. where there is a gap in the magnitudes of the eigenvalues. so that some eigenvalues have equal magnitudes.2.9). Vol II: Eigensystems . Chapter 1]. When the hypothesis (1. SUBSPACE ITERATION 385 we orthogonalize by computing a QR factorization after each multiplication by A. QkRk is me QR factorization of AkUo. Let U0eCnxm be orthonormal and let where Suppose that X 1U0 has anLU decomposition LT.2. and let X\ = QR be the QR factorization ofXi. then there are diagonal matrices Dk with \Dk\ = I such that UkDk —» Q.1. IfAkUo has the QR factorization UkRk.and R-factors from the unshifted QR algorithm.6) is not satisfied. An easy induction shows that if Rk = RI • • • Rk that is. In fact. Theorem 1. the proof of Theorem 2. The importance of this theorem is that the matrices Uk generated by (1. Partition X = (Xi X^}. Now recall (Theorem 2.SEC. the space spanned by the corresponding columns of Qk converge to the corresponding dominant eigenspace. Chapter 2) that if Ak = QkRk is the QR decomposition of Ak then Qk and Rk are the accumulated Q. we get the following algorithm. But we know that H(Qk) must contain an approximation to Xj that is converging as \Xj/Xm+i\k.5) converge to the Schur vectors spanning the dominant eigenspaces [see the comments at (1. Chapter 2. We now turn to the problem of how to extract an approximation that converges at this faster rate. the convergence of the jth column of Qk is governed by the largest of the ratios \Xj/Xj_i \ and \Xj+i/Xj\ (but only the latter when j — 1). This suggests that subspace iteration is in some way connected with the QR algorithm.2. Chapter 2. As in Theorem 2. 1. can be adapted to establish the following result. The diagonal elements of R converge to the dominant eigenvalues of A. Nonetheless. where U is unit lower trapezoidal and T is upper triangular.
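In code, the orthogonalized iteration sketched above, multiply by A and then restore orthonormality with a QR factorization at every step, might look like the following illustration (mine, not the book's):

```python
import numpy as np

def orthogonal_subspace_iteration(A_mul, U0, nsteps):
    """Subspace iteration with a QR factorization after every multiplication,
    U_{k+1} R_{k+1} = A U_k.  A_mul(U) must return A @ U; U0 is an n x m matrix
    with orthonormal columns.  Returns the final orthonormal basis."""
    U = U0
    for _ in range(nsteps):
        V = A_mul(U)                 # power step
        U, _ = np.linalg.qr(V)       # restore orthonormality
    return U

# Example: dominant invariant subspace of a random symmetric matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 100)); A = (A + A.T) / 2
U0, _ = np.linalg.qr(rng.standard_normal((100, 4)))
U = orthogonal_subspace_iteration(lambda X: A @ X, U0, 200)
```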

Schur-Rayleigh-Ritz refinement

Given the sequence Q_k, we can use (1.5) to generate approximations to eigenvectors of A by the standard Rayleigh-Ritz method. What we really want to do, however, is to rearrange the orthogonal matrix Q_k so that its columns reflect the fast convergence of the dominant subspaces. The following procedure, which is called the Schur-Rayleigh-Ritz (SRR) method, accomplishes this.

    1. V = A*Q_k
    2. B = Q_k^H*V
    3. Compute the Schur decomposition B = W T W^H, where the eigenvalues of B are arranged in descending order of magnitude
    4. V = V*W
    5. Compute the QR factorization V = Q_{k+1} R_{k+1}

The motivation for this procedure is the following. The matrix B is the Rayleigh quotient B = Q_k^H A Q_k. Now if we wish to use the Rayleigh-Ritz method to compute an orthonormal approximation to the dominant eigenspace X_j, we would compute an orthonormal basis W_j for the corresponding invariant subspace of B and form Q_k W_j. But the columns of W_j are just the first j Schur vectors of B. Thus the matrix Q_k W contains Ritz bases for all the dominant eigenspaces X_j. Since A Q_k W = V W, we can obtain the iterate following the SRR refinement by orthogonalizing V W.

The full analysis of this method is complicated, and we will not give it here. The basic result is that if |λ_j| > |λ_{j+1}|, so that the dominant eigenspace X_j is well defined, the approximations R[Q_k(:, 1:j)] converge to X_j essentially as |λ_{m+1}/λ_j|^k. Thus the SRR procedure gives properly converging approximations to all the dominant eigenspaces at once. For further details, see the notes and references.
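For symmetric A the Schur decomposition of B is just its spectral decomposition, so the SRR step collapses to an eigendecomposition of the Rayleigh quotient. The sketch below illustrates that special case only (it is my own, not the text's); the general nonsymmetric case would require an ordered Schur decomposition in step 3, which this sketch does not attempt.

```python
import numpy as np

def srr_step_symmetric(A_mul, Q):
    """One Schur-Rayleigh-Ritz step for symmetric A.  A_mul(Q) returns A @ Q;
    Q is an n x m orthonormal basis.  Returns the refined basis, with leading
    columns approximating the dominant eigenvectors, and the Ritz values."""
    V = A_mul(Q)                        # 1. power step
    B = Q.T @ V                         # 2. Rayleigh quotient
    theta, W = np.linalg.eigh(B)        # 3. spectral (= Schur) decomposition
    order = np.argsort(-np.abs(theta))  #    descending order of magnitude
    theta, W = theta[order], W[:, order]
    V = V @ W                           # 4. rotate the iterate
    Q_new, _ = np.linalg.qr(V)          # 5. reorthonormalize
    return Q_new, theta
```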

3. SUBSPACE ITERATION 387 This algorithm is an outline of a general subspace iteration for the first nsv dominant Schur vectors of a general matrix A.1. end while 8. end while Algorithm 1. Test for convergence and deflate 10. where m > nv. the jth column of U satisfies Vol II: Eigensystems . 1. Convergence Convergence is tested immediately after an SRR step. end while 6. suppose that U is composed of the dominant Schur vectors of A. 1. To motivate the test. while (convergence is not expected) 3. Thus we may be able to perform several power steps before we need to Orthogonalize U.1: Subspace iteration is a danger of a nontrivial loss of orthogonality. 2. while (the columns of U are sufficiently independent) 4. In particular. so that where T is upper triangular with its eigenvalues arranged in descending order of magnitude. As usual we will test a residual. the algorithm leaves several important questions unanswered: 1. U = A*U 5. Perform an SRR step 9. These observations are summarized in Algorithm 1. while (the number of converged vectors is less than nsv) 2.SEC. Orthogonalize the columns of U 1. The input to the program is an n x m orthonormal starting matrix U. Similarly. How do we test for convergence of Schur vectors? How do we deflate converged Schur vectors? How can we tell when to perform an SRR step? How can we tell when to Orthogonalize? We will treat each of these problems in turn. Because it is so general. the SRR step is needed only when we expect some of the columns of U to have converged. 4.

Let where T( consists of the first I columns of T. to perform an SRR step.g.. we do not multiply U\ by A.. and stop with the first one to fail the test. Then when we multiply by A we form the matrix ({/i. there is a matrix E with \\E\\2 = \\R\\z such that (A + E)U = UTi.388 CHAPTER 6. Thus we declare that the jth column of U has converged if where tol is a convergence criterion (e. and a better choice might be 7 = \TJJ\ —i. (2. However.21). and SRR steps. This criterion is backward stable. if A has been obtained by shift-andinvert enhancement. Chapter 4. eM) and 7 is a scaling factor. Deflation Because the columns of U converge in order. Suppose that I vectors have converged. Chapter 5]. we should test convergence of the Schur vectors in the order j = 1. then such an estimate might be too large. For this reason.. Specifically.e. we calculate the Rayleigh quotient Here T\\ is the triangular part of the Schur decomposition that corresponds to U\. In other words. deflation simply amounts to freezing the converged columns of U. and partition U — (U\ #2). suppose the first i columns of U have converged. orthogonalization.. ALTERNATIVES where tj is the jth column of T.2. the matrix W of Schur vectors of B has the form and the new U has the form . A natural choice of 7 is an estimate of some norm of A.e. the first I columns of U are exact Schur vectors of A + E. Then by Theorem 2. This freezing results in significant savings in the power. Finally. At each orthogonalization step we orthogonalize the current U% first against U\ and then against itself. the magnitude of the jth eigenvalue in the Schur decomposition [cf. Our converge theory says that the Schur vectors corresponding to the larger eigenvalues will converge more swiftly than those corresponding to smaller eigenvalues... where U\ has i columns. AU%)—i.8. Because of the triangularity of TH.

we wish to determine / so that Hence It follows from (1. the columns of the matrices U. The cure is to orthogonalize the columns of A1U for some /. become increasingly dependent. where E is comparable to the error in rounding X on a machine whose rounding unit is eM. is an orthonormal basis for 7£(X) 1 . By Theorem 2..A2U. where \\E\\% < \\X\\eM—i. so that an unnecessary SRR step is performed just before convergence. especially in the early stages when linear convergence has not set in. so that we can estimate a by We can now use cr to predict when the residual norm will converge. is has a residual norm pk. if A has nearly equimodular eigenvalues.7) that The formula (1. When to orthogonalize As we have mentioned. Let us focus on a particular Schur vector. and suppose at step j it has a residual norm pj and at step k > j. 1.AU. then pk = vk~Jpj.e. One must also take care that the formula does not slightly underpredict /. Precautions must be taken against it wildly overpredicting the value of /. SUBSPACE ITERATION 389 When to perform an SRR step The theory of the last section suggests that the residuals for the approximate Schur vectors will converge linearly to zero. let X = A1U. We can use this fact to make a crude prediction of when we expect a Schur vector to converge.. so that one SRR step detects the convergence of the entire cluster.. For more see the notes and references. To see what that risk is. and consider the effects on H(X) of the perturbation X + E. VolII: Eigensystems . If we take / too small. Finally. if we take / too large we risk losing information about the subspace spanned by A1 U. then HQj^lh is the sine of the largest canonical angle 9 between H ( X ) and 1l(X + E). Chapter 4.2. it will pay to extrapolate the average of their residual norms.SEC.8) is a crude extrapolation and must be used with some care. On the other hand. then we perform unnecessary work. if Q is an orthonormal basis for H(X + E) and Q j.. If the residual norms are converging linearly with constant cr. Specifically.

10) will be zero. note that once the process starts converging. ALTERNATIVES be the QR factorization of X. the sensitivity of A1 U is proportional to K. which by hypothesis is large. we will have where T is the Schur form from the last SRR step. where Amax(T) and Xmin(T) denote the largest and smallest eigenvalues of T in magnitude. we will almost certainly have K. Then has orthonormal columns. Hence. Thus if we decide that no deterioration greater than 10* is to be tolerated.(Tl) < K(T)1.(R). Hence if Q±_ is any orthonormal basis for 7£(X). let CHAPTER 6.390 To find an orthonormal basis for 1Z(X + E).(T) > | Ai/A2|. the more all the columns . The common sense of this is that the larger the value of |A!/A 2 |. then Thus Hence and the degradation in the subspace of X is proportional to K. Hence as convergence sets in. Since £/is orthonormal. where / satisfies An implication of this analysis is that if |Ai| > |A2| —as it might be when we use shift-and-invert enhancement with a very accurate shift—then we will have to reorthogonalize at each step. we should orthogonalize after at most / steps. The reason is that for any matrix T we have «(T) < IAmax(r)/Amin(T)|. To apply this result to subspace iteration. Thus / in (1.

SUBSPACE ITERATION 391 of AU will be contaminated by components of a?i. not the trailing principal submatrix of T associated with the remaining subspaces. so that we can use the more efficient symmetric eigenvalue routines to perform the SRR step. The only problem is the ordering of the blocks of the real Schur form. Economization of storage In economizing storage it is important to appreciate the tradeoffs. so that there is no need of a postprocessing step to compute the eigenvectors. Moreover. For example. First.2. Real matrices If A is real. we must continue to work with the full T. Chapter 2). the Rayleigh quotient is symmetric. we can use an auxiliary matrix V and proceed as follows. Most of the above discussion goes through mutatis mutandis. which we must eliminate by orthogonalization. AI still makes its contribution to nonorthogonality. More important.1 we wrote U = AU. l.SEC. In statement 4 of Algorithm 1.11) — significantly less if n is large. The tradeoff is that we must perform m separate multiplications by the matrix Vol II: Eigensystems . we will want to work with the real Schur form of the Rayleigh quotient (Theorem 3. once it settles down it is likely to be pessimistic. indicating that the product A U is to overwrite the contents of U. whether or not its eigenvector has been deflated. in practice the technique has shown itself to be effective. Alternatively. Chapter 5. This can be done in two ways.12) obviously uses less storage than (1. There are two economies. For example.1. It should be stressed that after deflation. SYMMETRIC MATRICES When the matrix A is symmetric. however. we can obtain some obvious economies. However.10) with care.3. we can speed up convergence by replacing the power step with a Chebyshev iteration. we can introduce an auxiliary vector v of order n and write Algorithm (1. which has been treated in §2. 1. that are less obvious. The reason is that in the powering step. the Schur vectors are now eigenvectors. First we can make do with a single array U instead of the two arrays required by the nonsymmetric algorithm. since we have used several norm inequalities in deriving it. Once again we must treat the crude extrapolation (1.

But in calculating the Rayleigh quotient B = U^AU. at the cost of twice the storage. In this case. on the other hand. and a price must be paid for accessing that structure. In the algorithm (1. then so is B. of course. we have effectively scaled A by s. but we will perform many orthogonalization steps. instead of computing AkU we will compute pk(A)U. Chebyshev acceleration We have noted that if the eigenvalues of A are well separated in magnitude. To see why. we can use the power steps to accelerate convergence. but we will need few orthogonalization steps. . it will generally be represented by some data structure. A natural choice is the scaled Chebyshev polynomial where |A m+ i| < s < |A m |. both algorithms have the option of economizing storage. which brings the tradeoffs full circle. For the purposes of argument. we have Thus Pk(A)U will contain approximations to the dominant eigenvector that converge at least as fast as e~k cosh ' Al I converges to zero. then convergence will be fast. From (3. then convergence will be slow. if the eigenvalues of A are clustered in magnitude. so that we may assume that |A m +i| = 1 and hence Ic^A^-i-i)! = 1.392 CHAPTER 6. however. ALTERNATIVES A. we need only access A once to compute the entire product A U—but. This can be done without an auxiliary array as follows. we have no choice when A is non-Hermitian: we must use an extra array V.11).12) that price must be paid ra times. The analogous result is true for the other eigenvalues. Conversely. note that when we evaluate*^ (A/s). How that choice will be made must depend on the representation of A and the properties of the machine in question. So far as computing the product AU is concerned. and we need only compute its upper half. Specifically. Chapter 4.8). But if A is Hermitian. Thus there is a true choice in the non-Hermitian case. where pk is a polynomial of degree k. In the algorithm (1. assume that we have chosen s = |Am+1|. chosen to enhance the gap between | AI | and | Am+11. If the matrix A is sparse.

. What value should we use for the scaling factor 3? 2. Chapter 4) The following algorithm implements this recurrence. suppose that AI = 1. so that this drawback is of minimal importance. the tradeoff is that we must call the routine to multiply A into an array more often. m) be the Ritz values computed in the SRR step arranged in descending order of magnitude.01 = 0.01. Eigensystems . z/m —> A m . The choice has the drawback that as the iteration converges. But in subspace iteration we generally choose m somewhat greater than the number of eigenpairs we desire. 1. In implementing Chebyshev acceleration there are three points to consider. we can use the three-term recurrence (see item 1 in Theorem 3. For example. it is natural to take 5 = \vm\ for the scaling factor.87—a great improvement. Chapter 1) we know that) A m+1 | < |i/TO| < | A m |. How do we compute Ck(A/s)Ul 3. SUBSPACE ITERATION When | AI | is near one.7.. if we use Chebyshev acceleration. . How does the acceleration affect our procedure for predicting when to perform orthogonalization steps? Let V{ (i — 1.99.. To evaluate Ck(A/s)U. we can make do with two extra n-vectors. On the other hand. 1. VolIP. Note that we have used two extra n x m arrays — U and V—to implement the recurrence. As above.SEC. so that we do not get the best convergence ratio.6. Since by the interlacing theorem (Theorem 3. so that ordinarily we could expect a convergence ratio of 1/1. we get a convergence ratio of 0. If we generate the final U column by column. the convergence ratio 393 can be much smaller than the usual ratio | AI | k obtained by powering.

394 CHAPTER 6. which bears the same relation to Rutishauser'sLR algorithm [224. Although the hypothesis |A m | > |A m+ i| is not necessary for the spaces T^(Uk) to contain good approximations to the Schur vectors.1957]. Clint and Jennings [42. [16. 1976] introduced and analyzed the Schur-Rayleigh-Ritz method for approximating Schur vectors. For symmetric matrices Jennings [130. First. where it is argued that the phenomenon is unlikely to be of importance in practice.13) by 1.1971] compute Ritz vectors to get good approximations to eigenvectors from the Uk.4. For nonsymmetric matrices.Stewart [257. Second. NOTES AND REFERENCES Subspace iteration Subspace iteration originated in Bauer's Treppeniteration. and Clint and Jennings [41. Hence we must replace the formula (1. since A is symmetric. Stewart [249.9). The full Rayleigh-Ritz acceleration was introduced independently by Rutishauser [227. The latter paper also introduces the technique of extrapolating residual norms to predict convergence. The idea of predicting when to orthogonalize is due to Rutishauser [228] for the symmetric case and was generalized to the nonsymmetric case by Stewart [257]. A counterexample is given in [257]. Chapter 4.1955] as the orthogonal version presented here does to the QR algorithm. we must modify the formula for the interval between orthogonalizations in two ways. ALTERNATIVES Turning now to orthogonalization. it is necessary to prove that the SRR procedure retrieves them. Thus we can avoid the invocation of a condition estimator and compute ft(T) as the ratio of vi/vm of the Ritz values.1970].1967] introduced an approximate Rayleigh-Ritz step using first-order approximations to the primitive Ritz vectors. . so that where by (3. 1965] introduced the orthogonal variant. we will be performing Chebyshev iteration. if / is greater than one. In The Algebraic Eigenvalue Problem Wilkinson [300.1969]. the Schur T of the Rayleigh quotient is diagonal.1969].

NEWTON-BASED METHODS One of the most widely used methods for improving an approximation to a root of the equation f ( x ) = 0 is the Newton-Raphson formula It is a useful fact that specific instances of this formula can be derived without differentiation by using first-order perturbation expansions. For details and further references see [68]. suppose we wish to compute ^/a for some positive a. Software Rutishauser [228] published a program for the symmetric eigenproblem.271] is designed for real nonsymmetric matrices and uses Rayleigh-Ritz refinement to compute eigenvectors. s]. All these references contain detailed discussions of implementation issues and repay close study. 2. The program LOPS I of Stewart and Jennings [270. For example. 2.SEC. This amounts to solving the equation f ( x ) = x2 — a = 0. expand the square. we get the approximate linear equation from which we find that VolII: Eigensy stems . and drop terms in d2. s] rather than on [-s. When A is positive definite it is more natural to base the Chebyshev polynomials on the interval [0. The program SRRIT of Bai and Stewart [12] uses SRR steps to compute Schur vectors. Chebyshev acceleration can also be performed on nonsymmetric matrices and can be effective when eigenpairs with the largest real parts are desired. NEWTON-BASED METHODS Chebyshev acceleration 395 The use of Chebyshev polynomials to accelerate subspace iteration for symmetric matrices is due to Rutishauser [228]. as is done in the text. The Newton correction is On the other hand if we write where d is presumed small. Duff and Scott [68] describe code in the Harwell Library that includes Chebyshev acceleration for the symmetric and nonsymmetric cases.

it becomes the Jacobi-Davidson method. In this section we are going to explore the consequences of applying this approach to the eigenvalue problem.396 CHAPTER 6. However. we add the condition where w is a suitably chosen vector. in which case the fcth component of z + v remains unchanged. For example. z + v) be an exact eigenpair of A. To determine the system completely. Then If we discard products of small terms. It turns out that it yields some old friends like the Rayleighquotient method and the QR algorithm. where E is small. When combined with the Rayleigh-Ritz method. APPROXIMATE NEWTON METHODS This section concerns the derivation of Newton-based methods and the analysis of their orthogonal variants. let (// + r).2 we will treat the Jacobi-Davidson algorithm. z) of A and wish to determine r\ and v so that (//+77. this equation is underdetermined: there are only n relations to determine the n components of v and the scalar 77. A general approximate Newton method Suppose we are given an approximate eigenpair (/i. ALTERNATIVES Thus the Newton correction d and the approximation (2.1. In §2. we will assume that we have an approximation to A. Later we will consider the consequences of choosing w — z. we could choose w — e^.?+v) is an improved approximation.1) to the true correction d are identical. In the next section we will introduce approximate Newton methods and a class of variants that we will call orthogonal correction methods. we obtain the following approximate equation in 77 and v: We therefore define approximate corrections to be the solutions of the equation Unfortunately.. For reasons that will become clear later. For purposes of derivation. 2. . it also yields some new methods and suggests ways of implementing them for large sparse eigenproblems. We will begin with the general method and then specialize.

Vol II: Eigensy stems . P^-z = 0 and P/r = r. Let where r j_ is any vector that is orthogonal to r for which r^z ^ 0.4). To derive it.4) the approximate Newton correction formula. Then follows that We will call (2. P^~ is an oblique projector onto the orthogonal complement of r j_ along z.SEC. In the language of projectors. as the following theorem shows. write the first equation in (2.5) the approximate Newton correction equation. we will simply call it the Newton correction formula. 2. that is of theoretical and practical use. It actually characterizes the correction vector v. If A — A. The correction equation There is an alternative to the correction formula (2. It follows that nOW LET be the projector onto the orthogonal complement of w along u.4) in the form The first order of business is to get rid of 77. NEWTON-BASED METHODS Now in matrix form our equations become 397 By multiplying the first row of this relation by WH( A — /z/) the second row. involving only v. we obtain the triangular system l and subtracting it from Hence We will call (2. In particular.

to require the correction v to be orthogonal to the approximate eigenvector z.5). let A . Proof. .5).5).4) then v satisfies (2. then a simple picture shows that the minimum distance between z and span(£ + v) is approximately \\v\\z. Oblique projectors can be arbitrarily large. if we restrict p appropriately. In other words. something that is easily done. For if v J. v is close to the smallest possible correction.1. Then v satisfies (2. We have already shown that if v satisfies (2. where y is an arbitrary vector. However. In the above notation.398 CHAPTER 6. which exhibits the correction explicitly. We will return to this point later.4) if and only ifv satisfies (2.and P^ by a single orthogonal projection. Orthogonal corrections The presence of oblique projectors in the correction equation is disturbing. which may be too expensive to compute for large sparse problems. and large projections can cause numerical cancellation in expressions that contain them.pi be nonsingular. and their implementation requires a factorization of the matrix A — pi. The correction equation says that the correction v to z satisfies a simple equatior involving the projected operator Although the equation is pretty. we can replace the projections P^. we need only compute matrix-vector products of the form A± y. if we use an iterative method based on Krylov sequences to solve the correction equation. Then Thus v satisfies (2. the correction formula is cast in terms of inverses. On the other hand. ALTERNATIVES Theorem 2. We first observe that it is natural to take w = z—that is.5). Suppose now that v satisfies (2. is does not seem to offer much computational advantage over the correction formula. However. z and v is small compared with z.

But since we will be concerned exclusively with orthogonal corrections in the sequel.4. and (2. if we choose fj. Let (^. Theorem 2. • We call the vector v of Theorem 2. Then the Newton-based correction v that is orthogonal to z is given by the correction formula Equivalently. However. to be the Rayleigh quotient then r is indeed orthogonal to z. in this case the correction equation becomes and this equation is consistent only if r J_ z. correction formula. Now by Theorem 1.6) and (2. Chapter 2.7) the orthogonal correction formula and the orthogonal correction equation.2 the orthogonal (approximate Newton) correction. We should also like P/.to be the same. NEWTON-BASED METHODS If we take w = z.SEC.2. 2. z) be an approximate normalized eigenpair of A where and lei Let and assume that A is nonsingular. then we may take 399 which is an orthogonal projection. v is the unique solution of the correction equation Three comments on this theorem. and correction equation. Thus we have established the following result. we will usually drop the qualifier "orthogonal" and speak of the approximate Newton correction. Vol II: Eigensystems . dropping the approximate when A = A.

fj.7) is a projection of A suggests that for the purposes of analysis we should rotate the problem into a coordinate system in which the projector projects along the coordinate axes. The obvious sense is that we require v to be orthogonal to z.2)]. is in the higher-order terms. however. The correction equation can now be written in the form .I is projected onto the orthogonal complement of z in the correction equation. Specifically. The reason is that we use the normalized vector (z + v)/\\z + v\\2 and its Rayleigh quotient in place of z + v and /i + 77. even when A = A. • We have used the term "orthogonal" it more than one sense. Analysis The fact that the operator in the projection equation (2. But equally important is that the operator A . The distinction. we have the following correspondences: in which p is to be determined.400 CHAPTER 6. In the Z coordinate system. ALTERNATIVES • We have used the expression "approximate Newton method" rather than simply "Newton method" because both the correction formula and the correction equation contain the approximation A. the iterated method is not strictly speaking Newton's method. However. let Z = (z Z_\_) be a unitary matrix and partition We will also write where [see (2.

By definition it uses the Rayleigh quotient ft corresponding to z. We now turn to the iterative application of the approximate Newton method. There are three ways to simplify the analysis. work with the normalized vector . The easiest choice is v — fi -f /iHp. Vol II: Eigensystems . since v _L z. NEWTON-BASED METHODS Hence we have 401 This expression corresponds to the correction formula. Thus in the Z coordinate system the correction equation and the correction formula are so simply related that there is no need to prove a theorem to establish their equivalence. and our only problem is to find a reasonably small residual.Consequently. we may choose v at our convenience. Chapter 2. For in this case the residual for the corrected vector is where. Hence Inserting iteration subscripts. A natural quantity to measure the quality of the current approximation z is the residual norm \\r\\2.\\v\\2. the problem is trivial in the Z coordinate system. which makes the first component of the residual equal to zero. Second. in principle.4.§/p||2. Third. Thus any residual norm will serve to bound ||r||2.8) the first two terms in this expression vanish. we have the following theorem. First. we have \\z\\2 = \\z\\2 + \\v\\2 — 1 -f.Thus our problem is to develop a bound for the residual norm corresponding to the corrected vector z = z + v in terms of the original residual norm ||r||2. it is not necessary to compute f.SEC. if v is small—which we may assume it to be in a convergence analysis — z is essentially normalized. although we should. as pointed out in the last paragraph. and hence by Theorem 1. f has the smallest possible residual. The remaining component is By the correction formula (2. 2.

k and Mk are converging to A and M.1. then the convergence of the method is linear with convergence ratios not greater than ae. .HkZk. Let (^. Chapter 4. convergence to a poorly separated eigenvalue requires a correspondingly accurate approximation. the eigenpairs (nk. When € = 0. Chapter 4. zjt) (k — 0. It can then be shown that Hk and Mjt (suitably defined) approach fixed limits A.t II2 converge monotonically to zero. and define and Then Local convergence The recurrence (2. the bound ae < I is essentially What this says is that if we are going to use an approximation A instead of A in our iteration. z^ converge to an eigenpair (A. ALTERNATIVES Theorem 2.9) can be made the basis of a local convergence proof for the orthogonal Newton-based method. z) of A.3. In other words.) be a sequence of normalized approximate eigenpairs produced by the orthogonal approximate Newton method applied to the matrix A = A + E..10. so that by Theorem 2. hk = <%. Since (j. if ||ro||2 is sufficiently small. When A is Hermitian. the quantity a^k + &kilk\\Tk\\2) & uniformly bounded below one. that Hence for £ not too large.. Let Tk = Az/j .402 CHAPTER 6.11. we have by Theorem 2. so that rjk = \\Tk\\2. the error in the approximation must be reasonably less than the separation of A from its neighbors in the sense of the function sep. If e > 0.Hence the convergence is cubic.Further let Z = (zk Z^) be unitary. We will not present the proof here. If then for all sufficiently small H^H. then the residual norms ||f. M. However.. In particular. the convergence is quadratic. the idea is to obtain uniform upper bounds a on (7k and e on e^ (the quantities r/k are uniformly bounded by ||A||2).

This diagonal approximation is associated with the name of Jacobi. In this case the approximate Newton's method consists of inner iterations to solve the correction equations and outer iterations consisting of Newton steps.rl. See the notes and references. Here we give three situations in which it may be desirable to take A ^ A. As with eigenvalue problems. In some applications. Chapter 2. where r is the center of a focal region containing the eigenvalues of interest. where D is the diagonal of A and E = D — A. Vol II: Eigensystems . the rate of convergence depends on the accuracy of the approximation A = A + E. • Constant shift approximations.12). In many instances it is sufficient to replace A — \i^I by (A — r/). Since //^ changes with each iteration. This bound is isomorphic to the bound (2./^/ = A . For example. where Ek = (fJtk ~ r K> then A . (Note that r is sometimes illogically called a target in the literature.9) simplifies to where in the above notation a^ — ||(M* .7). In the pure Newton method. we might split A = D — E. Moreover. for the local convergence of the QR algorithm. the method will converge rapidly. This variant of the approximate Newton method is closely related to the RQ variant of the QR algorithm. Such approximations arise when A can be split into a matrix A that is easy to invert (in the sense of being able to solve systems involving A — [ikI) and a small remainder E. we would take A = A.SEC. But this is not always possible. • Inexact solutions. so that the method converges quadratically. Ideally. In this case the approximate Newton method can be implemented solving only diagonal systems. It is natural to ask what effect a residual-based convergence criterion has on the convergence of the approximate Newton method. It turns out the analysis given above covers this case. • Natural approximations. our iteration bound (2. it is impossible to solve the equation A — Hkl directly. this may require a matrix factorization at each step. 2. then we must use an iterative method to solve the correction equation (2. we solve equations involving the matrix A — fikl. In this case. See the notes and references. if A is diagonally dominant with small off-diagonal elements. we generally terminate an inner iteration when the residual is sufficiently small. If we lack a natural approximation. if E is small enough.rl) l\\2. NEWTON-BASED METHODS 403 Three approximations In the approximate Newton method. This is no coincidence.) If we write A = A + Ek. This is an example of an approximation that changes from iteration to iteration.

it produces information that later speeds convergence to other eigenpairs in the focal area. However. there is no real control over what eigenvalue the method will converge to.9) can guide our choice of a convergence tolerance for the inner iteration. Chapter 2.This fact along with the iteration bound (2. THE JACOBI-DAVIDSON METHOD The Jacobi-Davidson method is a combination of the approximate Newton iteration and Rayleigh-Ritz approximation. it is reasonable to assume that the iterative method in question produces an approximate solution v that satisfies (/ — zz^)v = v. The method is new and less than well understood.11). and it is technically interesting— and that is reason enough to include it. Moreover. it has been used successfully. In particular. Moreover. If we do not start the method close enough to an eigenpair. it followsfrom(2. as we shall see. it sometimes happens that while the method is converging to one eigenpair. ALTERNATIVES such that Now since (/ . Moreover.404 From Theorem 1. It has the advantage that the Rayleigh-Ritz procedure can focus the Newton iteration on a region containing the eigenvalues we are interested in. it can be combined with an effective deflation technique that prevents it from reconverging to pairs that have already been found. . there is a matrix CHAPTER 6.ZZE)S = s.zzH)r = r. Nontheless. This implies that we may move E inside the brackets in (2. so that Thus the effect of terminating the inner iteration with a residual 5 is the same as performing the outer iteration with an approximation in which ||£||2 = ||s||2.3. 2. it is promising. and its presence in a work whose concern is mostly the tried and true is something of an anomaly. The basic Jacobi-Davidson method The approximate Newton method is suitable for improving a known approximation to an eigenpair. it has two drawbacks.2.10) that (/ . it may converge to eigenpairs whose eigenvalues lie entirely outside the region we are interested in. We will begin with a general formulation of the Jacobi-Davidson algorithm and then introduce the refinements necessary to implement it. 1.

In particular (up to normalization). Yet this is only one vector in the correction space spanned by z0. 4. The correction is then folded into the correction subspace and the process repeated. we are free to choose one near a focal point. The quantity // is the Rayliegh quotient corresponding to z. 6. It is greedy. z)'.ZZH)V = -r.. 1. Algorithm 2. 10. because we can choose from among k+1 Ritz pairs. //. 1. T is the center of a focal region in which we wish to find eigenpairs. •> vk and use a Rayleigh-Ritz procedure to generate 2^+1.kmax Using V" compute a normalized Ritz pair (/^.zzu)(A.. Vk. 11. Although the Rayleigh-Ritz method is not infallible. and K. v J_ z Orthonormalize w with respect to V V = (V v) end for k Signal nonconvergence Algorithm 2. fi Choose a shift K near r or fj. the new approximation will generally be better than (2. Solve the system (/. . 2.SEC.K!)(! . 9. and it is possible that another vector in this space will provide a better approximate eigenvector. 3. this algorithm attempts to compute an eigenpair whose eigenvalue is near a focal point r. The matrix A is a suitable approximation of A... V =v for & = !. Moreover. Specifically. the correction vector v^ is applied once to the current approximate z^ and discarded—much as past iterates in the power method are thrown away.. This suggests that we save the vectors ZQ.1 implements the basic Jacobi-Davidson algorithm. z] such that /z is near r r — Az — fj..z 5. . we mean that the method attempts to do as well as possible at each step without any attempt at global optimization. If it is expensive to vary K in solving Vol II: Eigensystems . As stated in the prologue. The algorithm contains three shift parameters: r. By greedy..1: The basic Jacobi-Davidson algorithm o 2.12). . NEWTON-BASED METHODS 405 Given a normalized starting vector z. It consists of a basic Rayleigh-Ritz step to generate a starting vector for an approximate Newton step. so that we can mitigate the effects of wild jumps in the Newton iterates.. if (r is sufficiently small) return (^. 2. 8.VQ. VQ.

95".67.406 CHAPTER 6. where e is a vector of random normal deviates. The table in Figure 2.. Example 2.24 to 0..95. The following example shows that the Rayleigh-Ritz refinement can improve the convergence of the approximate Newton method.28 to 0. if it is inexpensive to vary K and A — A. To slow convergence so that the process has time to evolve.953.. ALTERNATIVES Figure 2. we begin with a diagonal matrix A with eigenvalues 1.1...041. Chapter4..1. which is atypical..1: Convergence of the Jacobi-Davidson method o the correction equation. However. As in Example 3. (All this excludes the first step. The Jacobi-Davidson algorithm was used to approximate the eigenvector ei. in Case 2. the second.1 exhibits the results.. The convergence ratios for the first case decrease from about 0. The first column shows the convergence of the residual norm in Case 1. K was set to // + 0. Two cases were considered: Case 1 in which the Rayleigh-Ritz step was included and Case 2 in which it was not. then by the analysis of the last section the choice K = fj. while for the second case they increase from a value of 0.) .952. will give quadratic convergence—provided something does not go wrong with the Rayleigh-Ritz procedure.4. It is seen that the JacobiDavidson method is converging faster than the ordinary approximate Newton method. A starting vector u was obtained by normalizing a vector of the form ei + 0. it will generally be chosen equal to r.05e.

up}.SEC. Specifically. If we have found a set ui. NEWTON-BASED METHODS 407 The third column shows the tangents of the angles between the correction space and the eigenvector ei. we will begin with a very accurate starting vector.. let (U t/j_) be unitary. we cannot say just how the iteration should be restricted without knowing the unknown eigenvectors. and let (z Z±) be unitary. we must take care that the iteration does not reconverge to a vector that has already been found. It follows that Vol II: Eigensystems ... The example actually shows more. we know that the other Schur vectors lie in the orthogonal complement of span(wi. the restriction can be cast in terms of orthogonal projections involving known vectors. we must restrict the iteration to proceed outside the space spanned by the previous eigenvectors. 2. U satisfies the relation for some upper triangular matrix T = U^AU. An algorithm to do this would allow us to build up ever larger partial Schur decompositions a step at a time.e. We begin by observing that we can extend (2. They track the residual norms rather closely and show that it is the Rayleigh-Ritz approximation that is driving the convergence.. Let us begin by assuming that we have an orthonormal matrix U of Schur vectors of A.13) by mimicking the reduction to Schur form.. A cure for this problem is to work with Schur vectors.If we decide at the last iteration to declare the iteration has produced a satisfactory approximation to ei and start looking for approximations to e2. and write Now suppose (/i. i.. Algorithm 2. . Then is block upper triangular.1 is designed to find a single eigenpair.. z] is a normalized eigenpair of B. Unfortunately. The first issue is the matter of deflation. When more than one eigenpair is to be computed. Thus. . that is. The last column exhibits the tangents of the angles between the correction subspace and the subdominant eigenvector 62. Extending Schur decompositions We now turn to implementation issues. We wish to extend this partial Schur decomposition to a partial Schur decomposition of order one greater. up of Schur vectors.

Note that because V is orthogonal to H(U).z. an orthonormal matrix V of orthogonalized corrections. We will assume that V and v are orthogonal to 7£(J7). z).15). Note that a knowledge of the eigenvector y is all that is necessary to extend the partial Schur decomposition. Let be the orthogonal projector onto the column space of U±. Then it is easily verified that Now suppose that (p. CHAPTER 6. We now compute the residual . ALTERNATIVES is a partial Schur decomposition of A of order one higher than (2. we will use the Jacobi-Davidson procedure. We next compute the Rayleigh quotient B = VTA±V and a Ritz pair (/i.15). This typically dense matrix process is not much use in the large sparse problems that are our present concern. it follows on multiplying by U± that Bz = fj.408 Hence. The ingredients for a step of the procedure are the matrix U of Schur vectors.z. However.13). Since py = A±y = U±Bz. z) is an eigenpair of B. y) satisfies Conversely. For it follows from (2. Then y = U±zt where z — U±y. and let y = U±. In other words there is a one-one correspondence to eigenvectors of A± lying in 7£( t/j_) and eigenvectors ofB. the following observation allows us to recast the algorithm in terms of the original matrix A. We begin by orthogonalizing v with respect to V and incorporating it into V. and a new correction vector v. suppose y satisfies (2. the Rayleigh quotient can also be computed from the original matrix A in the form VTAV.14) that the extended partial Schur decomposition is where To compute a vector y satisfying (2.. Then (ji.

To assess the size of s.g. we solve the correction equation where v _L U. What we really want is to get a satisfactorily small residual when we substitute z for y in (2. There are several comments to be made. z) can be accepted as our eigenpair. to get another correction to add to V. • Again excluding operations with A.17)—that is.\iz and orthogonalize it against U. z. multiplying s by UH. we get It follows to that we may use r in a convergence test. if there is cancellation when v is orthogonalized against V. • The routine EPSD (Extend Partial Schur Decomposition) is designed to be used by a higher-level program that decides such things as the number of Schur vectors desired. the mainfloating-pointoperations come from matrix-vector products such as Vp. On the other hand.2 implements this scheme. However. we compute the size of its components in U and U±. and how to restart when the arrays V and W are about to overflow storage. we can expect the operation count to be a multiple of n(m—£)2 flam. Algorithm 2.. since by definition U^v = 0. the focal point r (which may vary). Finally. 2. if we multiply s by Pj_ = U±U±. Thus if U is not too large and the algorithm returns at k = m < m. NEWTON-BASED METHODS 409 and test to see if it is small enough that (//. • We do not compute the residual r directly. First. we want to be sufficiently small. Thus v should be orthogonalized against both U and V—with reorthogonalization if necessary. This is done in statement 5. There is a subtle point here. we get Hence Ps = 0. Rather we compute Az . in statements 4 and 15). the results will no longer be orthogonal to U to working accuracy.SEC. we do not have to orthogonalize v against U in statement 3. • Excluding operations with A (e. it is not necessary to recompute the entire Rayleigh quotient C — only its southeast boundary. the storage requirements are up to 2nm floating-point numbers. Vol II: Eigensystems . • As V grows. • In principle.16) and (2.

2: Extending a partial Schur decomposition o . EPSD returns the next correction vector v. an additional correction vector v. EPSD extends it to a partial Schur decomposition of the form The inputs. W = AV.410 CHAPTER 6.C/f/ H . If convergence does not occur before that happens. Pj_ = / . ALTERNATIVES Given a partial Schur decomposition AU = t/T. It is assumed that the arrays V and W cannot be expanded to have more than m columns. are V. and an approximation A = A + E. In the following code. in addition to /7. an nx (•£—!) matrix of orthonormalized correction vectors.. C — WHV\ a focal point r. Algorithm 2. and AX = PL APj.

If such a vector is accepted as a Ritz vector. Otherwise put. we noted that an interior eigenvalue could be approximated by a Rayleigh quotient of a vector that has nothing to do with the eigenvalue in question. Ultimately. 2. Recall (Definition 4. we may proceed as follows. of course. and a radius />. Specifically. Unlike Krylov sequence methods. we replace V with a basis for the primitive Ritz subspace corresponding to the Ritz values within p of r. the natural choice of K is either r or /z. V will become so large that it exceeds the storage capacity of the machine in question. it may make the correction equation harder to solve—requiring. It is possible. however. a focal point r. Restarting Like Krylov sequence methods. The basis V contains hard-won information about the vectors near the focal point. An alternative is to use the harmonic Rayleigh-Ritz procedure.SEC. which discourages imposters. Here we use the notation of Algorithm 2.r!)U. then Vol II: Eigensystems . information that should not be discarded in the process of restarting. the Jacobi-Davidson algorithm builds up a basis V from which eigenvectors can be extracted by the Rayleigh-Ritz method. if WR is the QR factorization of (A . But it is better to have a good vector in hand than to know that a vector is bad. the Rayleigh-Ritz method may fail. Chapter 4.4. to abuse that freedom. At that point we must restart with a reduced basis. a matrix factorization at each step. but since \JL varies with fc. We can. Vp) is a harmonic Ritz pair with orthonormal basis V and shift r if Moreover. detect these imposters by the fact that they have large residuals. Harmonic Ritz vectors In §4.11. The later choice will give asymptotically swifter convergence. Chapter 4) that (r + £. given V".2. the Jacobi-Davidson method does not have to preserve any special relation in the new basis — we are free to restart in any way we desire. NEWTON-BASED METHODS 411 • As mentioned above. for example. Fortunately we can extract that information by using a Schur-Rayleigh-Ritz process.

The principle change in the harmonic extension (Algorithm 2.3). ALTERNATIVES so that we can calculate 6 and p from this simpler and numerically more stable generalized eigenproblem.412 CHAPTER 6. the Rayleigh quotient C is Hermitian. Having computed a primitive harmonic Ritz pair (p.2. Hermitian matrices When A is Hermitian some obvious simplifications occur in Algorithm 2. is in how the harmonic Ritz vectors are . so that EPSD need only return the converged Ritz pair (statement 13). In §4. Although it is more complicated than Algorithm 2.2 (EPSD). Instead we must use the ordinary Rayleigh quotient p^VAVp.2 owing to the need to keep the columns of W orthogonal. we have the following algorithm. we cannot use the harmonic eigenvalue in computing the residual. Note that the matrix B — W^ U can be updated as we bring in new correction vectors v.3 implements the harmonic version of partial Schur extension. For example.18). since that value will not make the residual orthogonal to V. Note the indirect way of updating the QR factorization of the restarted matrix It is faster than simply forming WRQ[:. it has comparable storage requirements and operation counts. The Schur form is diagonal. Similarly economies can be achieved in the restarting procedure (2. we showed that where z — Vv is the harmonic Ritz vector. A little algebra then shows that Algorithm 2. other than the natural economies listed in the last paragraph. The algorithm for restarting the harmonic extension algorithm is more complicated than Algorithm 2. Chapter 4.4. Specifically. r + 6). so that only the upper half of C need be retained. 1:1] and computing its QR factorization.

If convergence does not occur before that happens. and an approximation A = A + E. 2. HEPSD extends it to a partial Schur decomposition of the form using harmonic Ritz vectors. Algorithm 23: Harmonic extension of a partial Schur decomposition o Vol II: Eigensystems . an additional correction vector v\ the QR factorization WR of (A — r!)V. The inputs. It is assumed that the arrays V and W cannot be expanded to have more than m columns.L = I . the current harmonic pair (#.SEC. an nx(l—1) matrix of orthonormalized correction vectors.UUH. NEWTON-BASED METHODS 413 Given a partial Schur decomposition AU — UT. are V. R). In the following code P. in addition to U. and Aj_ = P^AP_L. a focal point r. HEPSD returns the next correction vector v.

the nonsymmetric eigenproblems (4.22). . where a is a solution of (2. Chapter 4.20). we will assume that a solution x exists.414 CHAPTER 6. then the bulk of the work in the algorithm will be concentrated in the formation of Q = M~1Q. it can be used whenever another solution involving M and Q is required. As noted in §4. then x satisfies (2. Solving projected systems We now turn to the problem of solving the correction equation It will pay to consider this problem in greater generality than just the correction equation.4 implements this scheme. • Statement 1 anticipates the use of this algorithm to solve several systems.23). If C has been precomputed. Chapter 4.21). If it is expensive to solve systems involving the matrix M.4. Since x _L Q. ALTERNATIVES computed. We should therefore use the Hermitian eigenproblem to compute harmonic Ritz pairs. a linear system from which we can compute a. we have or Hence.22) and (4. Algorithm 2. we will consider the general system in which we will assume M is nonsingular. It is easy to verify that if x is defined by (2. both for residuals for the correction equation and for restarting. To derive the solution of this system. can give complex harmonic Ritz pairs. Therefore. Here are three comments. when A is Hermitian.

Even when there is no cancellation. it should be one that can be stably updated without pivoting—e. then one can solve the system Mz z and update C in the form In both cases there is a great savings in work. But if a factorization is to be updated. For more see the discussion on economization of storage on page 391. In the first place. if Q is augmented by a single vector z. 2. a QR decomposition. we can set Q = (U z) and use Algorithm 2. while the vector x can be very large. we can expect cancellation in statement 5. NEWTON-BASED METHODS This algorithm solves the projected equation 415 where Q is orthononnal. • If Mis ill conditioned but (I-QQH)M(I-QQH) is not. we must allocate additional storage to contain Q — at least when M is nonsymmetric.4 to solve the correction equation. the algorithm requires that we solve systems of the form Vol II: Eigensystems . the savings will be insignificant. if n is very large. The mechanics of the algorithm argue strongly against changing the shift K frequently. and M is nonsingulai Algorithm 2. x is likely to have inaccuracies that will propagate to the solution x.g.4: Solution of nroiftcted ennations Likewise. then the vector x will not be large.SEC. However.. In this case. However. The correction equation: Exact solution We now return to the solution of the correction equation Since z _L U. • We could go further and precompute or update a factorization of C by which we could calculate the solution of the system in statement 4.

we can precompute the matrix C. Thus our equivalent of (2.The preconditioner M is determined from the matrix A — AV/. Second. First. it is easy to use Krylov sequence iterations with it. we have more . the matrix K is. Thus.2 and 2. But that is not the whole story. (/ — QQH)(A — K!)(! — QQ^). Since the Jacobi-Davidson algorithm already requires that we be able to form matrix-vector products in A. The idea is to determine a matrix M — usually based on special characteristics of the problem in question—such that M~1K is in some sense near the identity matrix. it could retard convergence to other eigenpairs. if we can find an effective preconditioner that does not depend on K. In the Jacobi-Davidson method. The subject of iterative methods is a large one—far beyond the scope of this volume.23) is the following algorithm. by a process called preconditioning. not the shifted matrix. Let us consider a general linear system Ku — b. where Q = (U z). The most successful of the various iterative methods for solving this system are based on Krylov sequences. The system in statement 2 can be solved by Algorithm 2.4 suggest that we should avoid ill conditioning in A — K!. This requires a factorization of A.QQH)( A — «/)(/ QQH). and update it by adding the vector z as described above. This means we should avoid a focal point r that accurately approximates an eigenvalue A of A. we can form the matrix-vector product z = M~lKw by the following algorithm.4. as usual. which has to be recomputed whenever K changes. In application to the Jacobi-Davidson algorithm. If it is cheap to solve systems in M. Second. the matrix is (/. ALTERNATIVES (A — K!)X = y.3 as the shift K. to describe how iterative solutions are used in the Jacobi-Davidson method. Fortunately.416 CHAPTER 6. This means that we only need to be able to form the matrix-vector product Kw for an arbitrary vector w. we can vary K — at least within the limits for which the preconditioner is effective. All this suggests that we use the focal point r in Algorithms 2. it itself must be projected. we need only know the operations we are required to perform with the matrix of our system. The correction equation: Iterative solution An alternative to the exact solution of the correction equation is to use an iterative method to provide an approximate solution. The use of a preconditioned solver gives us more freedom to choose a shift K. the system to be solved involves the preconditioner. The convergence of most Krylov sequence methods can be improved. if K does not change. Even though such a focal point will speed up convergence to A. however. since the preconditioner must approximate the projected system. often dramatically. Our comments on the accuracy of Algorithm 2.

4 (which the reader should review). The table in Figure 2.4. The convergence is satisfactory.01 is introduced in the correction equation. Example 2. since we work only with the projected operator (/ — QQ^)(A — Kl)(I — QQ^} and never have to explicitly solve systems involving A -AC/. there is very little hard analysis to guide us in this matter. The only difference is that at each stage a norm wise relative error of 0. The following example illustrates this phenomenon.SEC. However. If we decrease the error in the solution to the correction equation in the above exVol II: Eigensystems . the approximations to 62 in the correction space stagnate after an initial decline.2: Convergence of the Jacobi-Davidson method with errors in the correction equations freedom to place K near an eigenvalue. 2. though not as fast as in Example 2.2 shows the progress of the iteration (for a different random starting vector). NEWTON-BASED METHODS 417 Figure 2. this lack of accuracy can exact a price when we come to compute other eigenpairs. The data is the same as in Example 2. Unfortunately. and there is no reason to solve the correction equation more accurately than to that number of figures. There is a tension between the local convergence driven by the approximate Newton iteration and the broader convergence driven by the correction subspace.5. we can choose to solve the correction equation more or less accurately. However. only a certain number of figures of v actually contribute to the improvement of z. From a local point of view. Since we are free to stop an iterative method at any time.

Newton. partitioning the R. Many people have applied Newton's method to the eigenvalue problem. Specifically. the convergence to ei is marginally faster. we must have (here we are ignoring second-order terms in the normalization of the first row and column of the Q-factor). working with a polynomial equation p(x) = 0. The scheme involved explicitly computing the polynomials p. he seems to have been the originator of the idea expanding and linearizing to compute corrections to zeros of functions —just the approach we use here. suggesting that the approximations to vectors other than the current vector will stagnate at about the level of the error in the solution of the correction equations.and Q-factors. should stagnate at the level of \\E\\. — obviously a computationally expensive procedure. They justify the choice w — z by observing that if we require z and z + w to be normalized. in which case the extra work in computing the polynomials is justified. but the convergence of approximations to 62 stagnates at 5-10~3 and 5-10~4. 2. etc. More generally.) Simpson showed how to effect the same calculations by working only with p. so that approximations to eigenvectors of A. The formulation in terms of derivatives is due to Simpson. linearizing. The fact that asymptotically Newton's method and the Rayleigh quotient method produce essentially the same iterates can be established by a theorem of Denis and Schnabel [64] to the effect that any two superlinearly converging methods launched from the same point must up to higher-order terms produce the Newton direction. The relation of Newton's method to the RQ variant of the QR algorithm is best seen by considering the RQ factorization of A—K! in the Z coordinate system. NOTES AND REFERENCES Newton's method and the eigenvalue problem According to Kollerstrom [150] and Ypma [307] neither Newton nor Raphson used the calculus in the derivation of their methods. q. r. For some references. with w — z is due to Peters and Wilkinson [211]. and solving for e. then up to second-order terms in w we must have WTZ = 0. any convergence of the correction subspace should be toward eigenvectors of A = A + E. Newton was actually interested in computing series expansions for x. He then continued by expanding q(d -f e) = r(e). Peters and Wilkinson [211] explore in detail the relation of Newton's method to the inverse power method. (However. It then follows that .3). see [4]. The formula (2.418 CHAPTER 6. when the approximate Newton method is used. It is ironic that although Newton cannot be credited with the formula that bears his (and Raphson's) name.3. other than the one currently being found. ALTERNATIVES ample to 10~3 and 10~4. expanded p(x + d) = q(d) and ignored higher terms in d to get a linear equation for d.

5: Davidson's method o so that up to second-order terms the correction from the Q-factor and the Newton correction are the same. it differs from the approximate Newton method given here in that we are working with a special constrained problem and the approximation is not to the Jacobian itself. which is essentially an approximate inverse power method. where D is the diagonal of A and an orthonormal matrix V of correction vectors and returns a Ritz pair (^u. Pi is the. from which it computes a Ritz vector and a new correction that is folded back into the subspace. the Jacobi-Davidson method might better be called the Newton-Rayleigh-Ritz method. they rediscovered Vol II: Eigensystems . 2. and Steihaug [55] introduced and analyzed inexact Newton methods. To honor these distinctions we use the term "approximate" rather than "inexact" Newton method. The chief difference is in the correction equation. The Jacobi-Davidson method As it has been derived here. In 1990. NEWTON-BASED METHODS 419 This algorithm takes a symmetric diagonally dominant matrix A = D + E. less favorable problems it tends to stagnate. 1975]. Eisenstat. It is very close to Algorithm 2. so that the asymptotic relation between the inverse power method and Newton's method insures that Davidson's method will work well. Olsen [186] and his coauthors observed that the lack of orthogonality of v to z was a drawback. However. which is sketched in Algorithm 2. nroiftction on to 7?. The story starts with Davidson's algorithm [53. in which the Newton equation is solved inexactly.1 — it builds up a correction subspace. In some sense these methods are related to the approximate Newton methods introduced here. For diagonally dominant matrices the approximation is quite good. Dembo. since any residual in the approximate solution of the Newton equation can be thrown back onto the Jacobian and conversely any perturbation of the Jacobian in the Newton equation corresponds to an approximate solution with a nonzero residual. Algorithm 2.SEC. and it is instructive to see how the current name came about. Inexact Newton methods In 1982. Using first-order perturbation expansions.5. z) approximating the fcth smallest eigenvalue of A and an augmented correction matrix R Here. But in other. (V"!-1-.

In 1996, Sleijpen and van der Vorst [240] derived essentially the same algorithm by considering an algorithm of Jacobi [128, 1846] for approximating eigenvectors of diagonally dominant matrices. Specifically, Jacobi partitioned a dominant symmetric matrix A in the form [display omitted]. Assuming an approximate eigenvector in the form e1, he computes a refined eigenvector of the form [display omitted]. This is easily seen to be one step of the approximate Newton method with a diagonal (Jacobi) approximation. Sleijpen and van der Vorst extended this example to general matrices, thus deriving the orthogonal Newton step. From the point of view taken here, their major contribution was the introduction of the correction equation, which expresses the correction in terms of a projection of the original matrix. It is the key to making the method effective for large sparse matrices. In particular, they went on to show how to deflate the problem and to incorporate iterative solutions with preconditioning.

In later papers [74, 239, 245] Sleijpen, van der Vorst, and their colleagues also adapted the algorithm to harmonic Ritz vectors and the generalized eigenvalue problem. Templates may be found in [241, 242, 243, 244].

Hermitian matrices

The implementation for harmonic Ritz vectors suggested here differs somewhat from the implementation in [243]. There the authors accumulate R^-1 in the basis V. We have preferred to leave V orthogonal and work with the harmonic Rayleigh-Ritz equation (2.19).
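The diagonal (Jacobi) approximation mentioned above is easy to write down explicitly. The following numpy function, an illustration of my own rather than the book's code, computes an Olsen-style approximate correction t from an approximate eigenpair (theta, z) using M = diag(A) - theta I; the scalar eps is chosen so that t is exactly orthogonal to z, which is what distinguishes it from the plain Davidson correction.

    import numpy as np

    def olsen_correction(A, z, theta):
        # Approximate Jacobi-Davidson (Olsen-style) correction with the
        # diagonal approximation M = diag(A) - theta I.
        r = A @ z - theta * z              # residual of the Ritz pair (theta, z)
        m = np.diag(A) - theta             # diagonal approximation to A - theta I
        Minv_r = r / m
        Minv_z = z / m
        eps = (z @ Minv_r) / (z @ Minv_z)
        return -Minv_r + eps * Minv_z      # satisfies z^T t = 0; expand the basis with t

In a full method the returned t would be orthonormalized against the current basis before being appended, exactly as in the Davidson sketch shown earlier.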

APPENDIX: BACKGROUND

The reader of this volume is assumed to know the basics of linear algebra and analysis as well as the behavior of matrix algorithms on digital computers. This background has been presented in full detail in Volume I of this work. Here we confine ourselves to a review of some of the more important notation and facts.

SCALARS, VECTORS, AND MATRICES

1. Scalars are real or complex numbers. All scalars are denoted by lowercase Latin or Greek letters. The set of real numbers will be denoted by R, the set of complex numbers by C.

2. Figure A.1 gives the Greek alphabet and its corresponding Latin letters (some of the correspondences are quite artificial).

3. An n-vector is a collection of n scalars arranged in a column. The set of real n-vectors is denoted by R^n, the set of complex n-vectors by C^n.

4. Vectors are denoted by lowercase Latin letters. The scalars that form a vector are called the components of the vector. If a vector is denoted by a letter, its components will be denoted by a corresponding lowercase Latin or Greek letter. Thus the vector a may be written a = (α1, α2, ..., αn)^T.

5. An m x n matrix is a rectangular array of mn scalars having m rows and n columns. The set of real m x n matrices is denoted by R^(m x n), the set of complex m x n matrices by C^(m x n).

6. Matrices are denoted by uppercase Latin and Greek letters. The scalars in a matrix are called the elements of the matrix. If a matrix is denoted by a letter, its elements will be denoted by a corresponding lowercase Latin or Greek letter. Thus the matrix A may be written A = (α_ij).

Figure A.1: The Greek alphabet and Latin equivalents

7. An n x n matrix is said to be square of order n.

8. The matrix sum and matrix product are written as usual as A + B and AB.

9. The conjugate of a matrix A is written A-bar. The transpose of A is written A^T, and the conjugate transpose as A^H. If A is nonsingular, its inverse is written A^-1. We also write A^-T = (A^-1)^T and A^-H = (A^-1)^H.

10. The trace of a square matrix is the sum of its diagonal elements. It is written trace(A).

11. The determinant of A is written det(A). It has the following useful properties.
    1. det(AB) = det(A) det(B).
    2. det(A^H) is the complex conjugate of det(A).
    3. det(A^-1) = det(A)^-1.
    4. If A is of order n, then det(μA) = μ^n det(A).
    5. If A is block triangular with diagonal blocks A11, A22, ..., Akk, then det(A) = det(A11) det(A22) ··· det(Akk).
    6. det(A) is the product of the eigenvalues of A.
    7. |det(A)| is the product of the singular values of A.
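These identities are easy to spot-check numerically. The snippet below (illustrative only) verifies three of the determinant properties and the definition of the trace for a random real matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    mu = 2.5

    # det(AB) = det(A) det(B)
    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
    # det(A^-1) = det(A)^-1
    assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))
    # det(mu A) = mu^n det(A)
    assert np.isclose(np.linalg.det(mu * A), mu**n * np.linalg.det(A))
    # trace(A) is the sum of the diagonal elements
    assert np.isclose(np.trace(A), np.sum(np.diag(A)))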

12. If A and B are matrices of the same dimension, we write A <= B to mean that α_ij <= β_ij. Similarly for the relations A < B, A >= B, and A > B.

13. The absolute value of a matrix A is the matrix whose elements are the absolute values of the elements of A. It is written |A|.

14. A matrix A is symmetric if A^T = A. It is Hermitian if A^H = A.

15. A real matrix U is orthogonal if U^T U = I. A matrix U is unitary if U^H U = I. A matrix with orthonormal rows or columns is said to be orthonormal.

16. Any vector whose components are all zero is called a zero vector and is written 0, or 0_n when it is necessary to specify the dimension.

17. Any matrix whose components are all zero is called a zero matrix and is written 0, or 0_(m x n) when it is necessary to specify the dimension.

18. The vector whose components are all one is written e.

19. The vector whose ith component is one and whose other components are zero is called the ith unit vector and is written e_i.

20. The matrix I_n whose diagonal elements are one and whose off-diagonal elements are zero is called the identity matrix of order n. When the context makes the order clear, an identity matrix is written simply I.

21. A square matrix whose off-diagonal elements are zero is said to be diagonal. A diagonal matrix whose diagonal elements are δ1, ..., δn is written diag(δ1, ..., δn).

22. If i1, ..., in is a permutation of the integers 1, ..., n, the matrix P = (e_i1 ... e_in) is called a permutation matrix. Postmultiplying a matrix by P rearranges its columns in the order i1, ..., in. Premultiplying a matrix by P^T rearranges its rows in the order i1, ..., in.

23. The matrix (e_n ... e_1), that is, the identity with its columns in reverse order, is called the cross operator. Premultiplying a matrix by the cross operator reverses the order of its rows. Postmultiplying a matrix by the cross operator reverses the order of its columns.
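The effects of permutation matrices and of the cross operator are easy to see numerically. The short numpy check below is illustrative only and uses 0-based indices.

    import numpy as np

    n = 4
    A = np.arange(n * n, dtype=float).reshape(n, n)

    perm = [2, 0, 3, 1]                 # the permutation i_1, ..., i_n (0-based here)
    P = np.eye(n)[:, perm]              # permutation matrix (e_{i_1} ... e_{i_n})
    assert np.array_equal(A @ P, A[:, perm])        # columns rearranged by P
    assert np.array_equal(P.T @ A, A[perm, :])      # rows rearranged by P^T

    J = np.eye(n)[:, ::-1]              # the cross operator (reversed identity)
    assert np.array_equal(J @ A, A[::-1, :])        # reverses the rows
    assert np.array_equal(A @ J, A[:, ::-1])        # reverses the columns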

24. A Wilkinson diagram is a device for illustrating the structure of a matrix. The following is a Wilkinson diagram of a diagonal matrix of order four:

        X 0 0 0
        0 X 0 0
        0 0 X 0
        0 0 0 X

Here the symbol "0" stands for a zero element, and the symbol "X" stands for a quantity that may or may not be zero (but usually is not). Letters other than X are often used to denote such elements, but a zero is always 0.

25. Lower and upper triangular matrices have Wilkinson diagrams of the form

        X 0 0 0        X X X X
        X X 0 0        0 X X X
        X X X 0        0 0 X X
        X X X X        0 0 0 X

26. A rectangular matrix whose elements above the main diagonal are zero is lower trapezoidal. A rectangular matrix whose elements below the main diagonal are zero is upper trapezoidal.

27. Triangular or trapezoidal matrices whose main diagonal elements are one are unit triangular or trapezoidal.

LINEAR ALGEBRA

1. Let F be R or C depending on the underlying scalars. With the exception of the sets R, C, R^n, etc., sets of scalars, vectors, or matrices are denoted by calligraphic letters (e.g., X, Y).

2. The subspace spanned by a set of vectors X is written span(X).

3. The dimension of a subspace X is written dim(X). If X and Y are disjoint subspaces, then dim(X + Y) = dim(X) + dim(Y).

4. The column space of an m x n matrix A is the subspace R(A) = {Ax : x in F^n}.

5. The null space of A is N(A) = {x : Ax = 0}.

6. The rank of A is rank(A) = dim(R(A)). The nullity of A is null(A) = dim(N(A)).

7. An n x p matrix X is of full column rank if rank(X) = p. It is of full row rank if rank(X) = n. We say that X is of full rank when the context tells whether we are referring to column or row rank (e.g., when n > p).
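For example, the rank and nullity of a small matrix can be checked numerically as follows (illustrative numpy only; matrix_rank determines the rank from the singular values with a default tolerance).

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [1.0, 0.0, 1.0]])
    n_cols = A.shape[1]
    rank = np.linalg.matrix_rank(A)     # dimension of the column space R(A)
    nullity = n_cols - rank             # dimension of the null space N(A)
    assert rank == 2 and nullity == 1   # the second row is twice the first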

POSITIVE DEFINITE MATRICES

1. Let A be of order n. Then A is positive definite if it is Hermitian and x != 0 implies x^H A x > 0; it is positive semidefinite if it is Hermitian and x^H A x >= 0 for all x. Be warned: in the real case, some people drop the symmetry condition and call any matrix A satisfying x != 0 implies x^T A x > 0 a positive definite matrix.

2. If A is positive definite, then A is positive semidefinite.

3. Any principal submatrix of a positive definite matrix is positive definite.

4. The diagonal elements of a positive definite matrix are positive.

5. A is positive (semi)definite if and only if its eigenvalues are positive (nonnegative).

6. If A is positive definite, its determinant is positive. If A is positive semidefinite, its determinant is nonnegative.

7. If A is positive definite, then A is nonsingular and A^-1 is positive definite.

8. If A is positive (semi)definite, then there is a positive (semi)definite matrix A^(1/2) such that A = (A^(1/2))^2. The matrix A^(1/2) is unique.

9. Let A be positive definite and let X be in C^(n x p). Then X^H A X is positive semidefinite. It is positive definite if and only if X is of full column rank.

THE LU DECOMPOSITION

1. Let A be of order n and assume that the first n-1 leading principal submatrices of A are nonsingular. Then A can be factored uniquely in the form A = LU, where L is unit lower triangular and U is upper triangular. We call the factorization the LU decomposition of A and call the matrices L and U the L-factor and the U-factor.

2. Gaussian elimination (Algorithm 1:3.1) computes the LU decomposition. Pivoting may be necessary for stability (see Algorithm 1:3.3).
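A minimal numpy sketch of the factorization just described, Gaussian elimination without pivoting, is given below. It is a didactic illustration under the stated nonsingularity assumption, not the pivoted algorithm referred to above.

    import numpy as np

    def lu_nopivot(A):
        # Gaussian elimination without pivoting: A = L U with L unit lower
        # triangular and U upper triangular.  Assumes the leading principal
        # submatrices are nonsingular; a practical code would pivot for stability.
        n = A.shape[0]
        L = np.eye(n)
        U = np.array(A, dtype=float)
        for k in range(n - 1):
            L[k+1:, k] = U[k+1:, k] / U[k, k]
            U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
        return L, U

    A = np.array([[4.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 6.0]])
    L, U = lu_nopivot(A)
    assert np.allclose(L @ U, A)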

THE CHOLESKY DECOMPOSITION

1. Any positive definite matrix A can be uniquely factored in the form A = R^H R, where R is an upper triangular matrix with positive diagonal elements. This factorization is called the Cholesky decomposition of A. The matrix R is called the Cholesky factor of A. It is unique.

2. The Cholesky decomposition can be computed by variants of Gaussian elimination (see the algorithms of Volume I).

3. There is a pivoted version of the decomposition. Specifically, there is a permutation matrix P such that P^T A P has a Cholesky decomposition whose factor R satisfies certain inequalities; an algorithm of Volume I computes a pivoted Cholesky decomposition.

THE QR DECOMPOSITION

1. Let X be an n x p matrix with n >= p. Then there is a unitary matrix Q of order n and an upper triangular matrix R of order p with nonnegative diagonal elements such that

        X = Q [ R ]
              [ 0 ].

This factorization is called the QR decomposition of X. The matrix Q is the Q-factor of X, and the matrix R is the R-factor.

2. If we partition Q = (Q_X  Q_perp), where Q_X has p columns, then we can write X = Q_X R, which is called the QR factorization of X.

3. If X is of full rank, the QR factorization is unique.

4. If X is of full rank, then Q_X is an orthonormal basis for the column space of X and Q_perp is an orthonormal basis for the orthogonal complement of the column space of X.

5. There is a pivoted version of the QR decomposition. Specifically, there is a permutation matrix P such that XP has a QR decomposition whose R-factor satisfies certain inequalities; an algorithm of Volume I computes a pivoted QR decomposition.

6. If X is of full rank, A = X^H X is positive definite, and R is the Cholesky factor of A.

7. The QR factorization can be computed by the Gram-Schmidt algorithm, and the QR decomposition by orthogonal triangularization (see the algorithms of Volume I).
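Both decompositions are available directly in numpy. The snippet below (illustrative only) also exhibits the connection between the R-factor of X and the Cholesky factor of X^H X for a real full-rank X.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((6, 3))            # n x p with n >= p, full rank

    A = X.T @ X                                # positive definite cross-product
    R_chol = np.linalg.cholesky(A).T           # numpy returns the lower factor
    assert np.allclose(R_chol.T @ R_chol, A)

    Q_X, R = np.linalg.qr(X)                   # reduced QR: Q_X is n x p, R is p x p
    assert np.allclose(Q_X @ R, X)
    assert np.allclose(Q_X.T @ Q_X, np.eye(3))
    # Note: numpy does not force the diagonal of R to be nonnegative, so R agrees
    # with the Cholesky factor of X^T X only up to the signs of its rows.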

PROJECTORS

1. If P^2 = P, then P is a projector (or projection).

2. Any projector P can be written in the form P = X Y^H, where Y^H X = I. The spaces R(X) and R(Y) uniquely define P, which is said to project onto R(X) along the orthogonal complement of R(Y).

3. Any vector x can be represented uniquely as the sum of a vector Px in R(X) and a vector (I - P)x in the orthogonal complement of R(Y).

4. If P is a projector onto R(X) along the orthogonal complement of R(Y), then I - P is its complementary projector. It projects onto the orthogonal complement of R(Y) along R(X).

5. If P is a Hermitian projector, then P = X X^H, where X is any orthonormal basis for R(P). Such a projector is called an orthogonal projector. Its complementary projector is written P_perp.
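A short numpy illustration (not from the text) of an orthogonal projector built from an orthonormal basis:

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal basis of a subspace

    P = X @ X.T                          # orthogonal projector onto R(X)
    assert np.allclose(P @ P, P)         # idempotent: P^2 = P
    assert np.allclose(P, P.T)           # Hermitian, hence an orthogonal projector

    x = rng.standard_normal(5)
    assert np.allclose(x, P @ x + (np.eye(5) - P) @ x)     # unique splitting of x
    assert np.allclose(X.T @ ((np.eye(5) - P) @ x), 0.0)   # (I - P)x is orthogonal to R(X)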


[10] Z. Note on the quadratic convergence of Kogbetliantz's algorithm for computing the singular value decomposition. Demmel. C. Du Croz. 1995. [7] J. Z. Demmel. Demmel. [9] Z. 57:269-304. Ruhe. The solution of characteristic value-vector problems by Newton's method. Reichel. On Bernoulli's numerical solution of algebraic equations. Baglama. M. Calvetti. Anselone and L.ps. 146:203-226. Reichel. Bai. II: The evaluation of the latent roots and latent vectors of a matrix. J. 186:73-95. S. Demmel. Sorensen. Cited in [29]. Hammarling. C. 9:17-29. McKenney. 37:400-440. C. B. A. Numerische Mathematik. Philadelphia. J. Ostrouchov. Quarterly of Applied Mathematics. E. W. Anderson. [11] Z.1926. Computation of a few small eigenvalues of a large matrix with application to liquid crystal modeling. [2] A. Journal of Computational Physics. and D. A. Rail.1988. Proceedings of the Royal Society of Edinburgh.REFERENCES [1] A. The principle of minimized iterations in the solution of the matrix eigenvalue problem.1951. and L.1996. 429 .1998. D. van der Vorst. Proceedings of the Royal Society of Edinburgh. J. and H. SIAM. Also available online as LAPACK Working Note 8 from http: //www. On swapping diagonal blocks in real Schur form. [8] Z. second edition. 46:289-305. J. SIAM. 2000. International Journal of High Speed Computing. Philadelphia. [4] P. 1:97-112. S. 104:131-140. Bai.1937. [3] E. Bai. 1989. [5] W. D. Studies in practical mathematics. Bai and J. 11:38-45. Aitken.org/lapack/lawns/lawn08 . A.1993. Greenbaum. Linear Algebra and Its Applications. BIT. Dongarra. Baglama. Amoldi. Aitken. Calvetti. and L. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. [6] J. Bischof.netlib. Dongarra. LAPACK Users' Guide. Iterative methods for the computation of a few eigenvalues of a large symmetric matrix. On a block implementation of Hessenberg QR iteration. Bai and J. J. Linear Algebra and Its Applications.1968.

Bounds for the variation of the roots of a polynomial and the eigenvalues of a matrix. Stewart. and J. and G. [18] R. Error analysis of update methods for the symmetric eigenvalue problem. Ben-Israel. Philadelphia. H. [24] R. Also in [305. Published in the USA by John Wiley. H. Sulle funzioni bilineari. 1970. Bavely and G.1971. [25] A. Harlow. 14:598-816. SIAM Journal on Matrix Analysis and Applications. Algorithm 776: SRRIT—A Fortran subroutine to calculate the dominant invariant subspace of a nonsymmetric matrix. Das Verfahren der Treppeniteration und verwandte Verfahren zur Losung algebraischer Eigenwertprobleme. S. Department of Computer Science. Bhatia. [17] C. 23:494-513. W. New York. Barth. L. 16:359-367. 8:820-826. Wilkinson. W. Bellman. ACM Transactions on Mathematical Software. Pitman Research Notes in Mathematics. Perturbation Bounds for Matrix Eigenvalues. [19] E. Stewart. Bartels and G. 1993. Linear Algebra and Its Applications. [21] M. 1996. 1990. 1979. [22] R. 9:386-393. SIAM Journal on Numerical Analysis. International Journal of Supercomputer Applications. Beltrami. 6:13-49.BX = C.1992. Algorithm 432: The solution of the matrix equation AX . Springer. second edition. Bauer.430 REFERENCES [12] Z. Minneapolis. Stewart. pages 249-256]. 1987. [14] R. Numerische Mathematik. Bhatia. An English translation by D. Bhatia. [15] W. Bjorck. Numerical Methods for Least Squares Problems. Essex. Calculation of the eigenvalues of a symmetric tridiagonal matrix by the method of bisection.1997.1957. 21:51-54. [23] R. SIAM Journal on Applied Mathematics. [20] A.1972. Matrix Analysis. L. 11:98-106. McGraw-Hill. L. Boley is available in Technical Report 90-37. W. Introduction to Matrix Analysis. An algorithm for computing reducing subspaces by block diagonalization. W. 1997. 142:195-209. Eisner.1873. Barlow. University of Minnesota.1990. Martin. Giornale di Matematiche ad Uso degli Studenti Delle Universita. Large scale singular value computations. New York. Berry. .1967. SIAM. Communications oj'the ACM. [13] J. [16] F. Zeitschriftfur angewandte Mathematik undPhysik. Longmann Scientific & Technical. Bai and G. 8:214-235. A note on pencils of Hermitian or symmetric matrices. Berman and A. Krause. R.

Chan. Forsythe. [28] C. R. Department of Mathematics.1980. Sur F equation a 1'aide de laquelle on determine les inegalites seculaires des mouvements des planetes. 2. An improved algorithm for computing the singular value decomposition. A memoir on the theory of matrices. New York. Large-Scale Matrix Problems. 52:270-300. of Linear Algebra and Its Applications. and R. SIAM Journal on Matrix Analysis and Applications. [33] A. F. VolII: Eigensystems . Rank one modification of the symmetric eigenproblem. Cited and reprinted in [33. Thirteen volumes plus index. Lawrence. Bemerkung iiber die beiden vorstehenden Aufsatze. Wiley. and P. 1982. Mathematics of Computation. Golub. Sorensen. Technical report. L. R. Cauchy. 1981. The Collected Mathematical Papers of Arthur Cayley. North-Holland. Journal fur die reine und angewandte Mathematik. H. J.1857. Numerische Mathematik. C. [29] K. Schneider. 21:1202-1228. ACM Transactions on Mathematical Software. Bjorck. [36] S. and H. 53:281—283. 31:31-48. Chatelin. 475-496]. Vangegin. Cay ley. C. 8:84-88. Van Dooren. [27] A. and level 3 performance. On efficient implementations of Kogbetliantz's algorithm for computing the singular value decomposition.1858.1982. Nielsen. Numerical methods for computing angles between linear subspaces. Mathias. F.REFERENCES 431 [26] A. New York. [38] F. v. P. 1999. Cayley. [35] T. W. [34] T. [31] A. 8:72-83. ACM Transactions on Mathematical Software.1978. Borchart. 27:579-594. Numerische Mathematik. [37] J. Arthur Cayley and A. M. In Oeuvres Completes (IIe Serie). [30] J. 1829. Braman.1988. Bjorck and G. Masson. Chan. R. R. [39] F. editors. Valeurs Propres de Matrices. Philosophical Transactions of the Royal Society of London. Chatelin. Byers. 1988. Cambridge University Press. Algorithm 581: An improved algorithm for computing the singular value decomposition. The multi-shift QR-algorithm: Aggressive deflation. Plemmons. Reprinted 1963 by Johnson Reprint Corporation. Chandrasekaran. pp. Expanded translation of [38]. [32] A. Eigenvalues of Matrices. University of Kansas. and D. editors. P. volume 9. 1889-1898. An efficient and stable algorithm for the symmetric-definite generalized eigenvalue problem. A reprint of Volumn 34. maintaining well focused shifts. 1993. 148:17-37.1973.2000. Cambridge. Paris. New York. Bunch. Charlier.

Lanczos Algorithms for Large Symmetric Eigenvalue Computations Volume 1: Theory. K. sparse real symmetric matrices. [42] M. pages 505-509. Linear Algebra and Its Applications. 4:197-215. Jennings. 1971. 1999.W.1986. Cullum and R. Higham. Stuttgart. R. 1985. Journal of the Institutefor Mathematics and Applications. R. Curtis and J. and M. 10:118124. Willoughby. . [47] J.1983. Birkhauser. [51] J. Journal of Computational Physics. [49] J. Willoughby. Cullum and W. R.1984. The nearest definite pair for the Hermitian generalized eigenvalue problem. 12:278-282. Donath.432 REFERENCES [40] S. 36:177-195. Finding a positive definite linear combination of two Hermitian matrices. Pacific Grove. [44] C. Cheng and N. Crawford. The evaluation of eigenvalues and eigenvectors of real symmetric matrices by simultaneous iteration. Englewood Cliffs. Journal of the Institute for Mathematics and Applications. 13:7680. 302-303:6376. 44:329-358. R. Reid. NJ. Moon. Willoughby. A divide and conquer method for the symmetric eigenproblem. E. A block Lanczos algorithm for computing the Q algebraically largest eigenvalues and a corresponding eigenspace of large. Prentice-Hall. [46] J. Cited in [94]. N. ACM Transactions on Mathematical Software. ILPrograms. 51:37-48. A. K.Daniel. Numerical Linear Algebra and Applications. Phoenix. [45] J. SIAM Journal on Scientific and Statistical Computing. Datta. [50] A. In Proceedings of the 1974 IEEE Conference on Decision and Control.1974. Cullum and R. 1983. The Approximation Minimization of Functionals. Computer Journal. S. [48] J. [52] B. A. Algorithm 646 PDFIND: A routine to find a positive definite linear combination of two real symmetric matrices. H. Linear Algebra and Its Applications. Cited in [25]. Cuppen. 8:111-121. CA. Clint and A. Clint and A. J. Brooks/Cole. Crawford and Y. M. A. [43] C. AZ.1981. On the automatic scaling of matrices for Gaussian elimination. Computing eigenvalues of very large symmetric matrices—An implemention of a Lanczos algorithm with no reorthogonalization. J. 1995. A Lanczos algorithm for computing singular values and vectors of large matrices. Cullum. Lake. Jennings. [41] M.1972.1971. 1970. A simultaneous iteration method for the unsymmetric eigenvalue problem. Numerische Mathematik.

SIAM Journal on Scientific and Statistical Computing.1977. Dongarra and D. Kahan. A fully parallel algorithm for the symmetric eigenvalue problem. Veselic. Steihaug. [60] J. Applied Numerical Linear Algebra. J. The rotation of eigenvectors by a perturbation. Demmel. Accurate singular values of bidiagonal matrices. B. 19:46-89. [57] J. Dembo. 17:87-94. Eisenstat. SIAM Journal on Scientific Computing.1997.1990. Philadelphia. E. 13:1204-1245. More. 20:599-610. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Davidson. SIAM Journal on Scientific and Statistical Computing. Eisenstat. and Z. Kagstrom. Computing stable eigendecompositions of matrix pencils. W. Computing the singular value decomposition with high relative accuracy. Dennis and R. Linear Algebra and Its Applications. Linear Algebra and Its Applications. Demmel. [59] J. Dhillon. Sorensen. Journal of Computational Physics. 7:1-46. 8:sl39-sl54.1987. SIAM. III. Linear Algebra and Its Applications. M. I. Veselic. [56] J. Vol II: Eigensystems . Jacobi's method is more accurate than QR. 11:873-912. W. [62] J. 88/89:139-186. Demmel. Drmac.1992.1987.1986. A new O(n2) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. E. Prentice-Hall. SIAM Journal on Numerical Analysis. Inexact Newton methods. Computing stable eigendecompositions of matrices. [63] J. 79:163-193. J. Schnabel. 299:21-80. NJ. C. 1998. Kahan. Demmel and B. 1983. S. Quasi-Newton methods. S. Drmac. 19:400-408.1975.1997. S. and T. Slapnicar. Demmel and W. SIAM Journal on Matrix Analysis and Applications. Computer Science Division. W.1982. Demmel and K. S. Demmel. [66] J. S1AM Journal on Numerical Analysis. [64] J. motivations and theory. [61] J. Implementation of Jacobi rotations for accurate singular value computation in floating point arithmetic. 1997. [58] J. Englewood Cliffs. Dennis and J. C. Davis and W. 1983. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. SIAM Review. Technical Report CSD-97-971. 18:1200-1222. SI AM Journal on Numerical Analysis.1970. University of California at Berkeley. [55] R. The condition number of equivalence transformations that block diagonalize matrix pencils. [54] C.REFERENCES 433 [53] E. K. [65] I. R. Gu. PA. [67] Z.

1960. A. A. Eidgenossische Technische Hochsulle Zurich. Vols. W. [76] J. Ruhe. Numerische Mathematik.1985. Linear Algebra and Its Applications. parts I and n. [75] G.1994.1961. L. Large Scale Eigenvalue Problems. [78] R. Eisner. Transaction of the American Mathematical Society. Chelsea Publishing Company. [77] V. Freund. [74] D. Mathematics of Computation. Freund. E. Linear Algebra and Its Applications. Francis. Duff and J. An optimal bound for the spectral variation of two matrices.1993. 5:1-10. Sleijpen. and N. V. [71] T. 19:137-159. Eisner. Ericsson. [80] F. 1959.1998. 94:1-23. S. ACM Transactions on Mathematical Software. Fernando and B. Numerical Linear Algebra with Applications. The spectral transformation Lanczos method for the numerical solution of large sparse generalized symmetric eigenvalue problems. 35:1251-1268. 1962. Computer Journal. F. van der Vorst. Computing selected eigenvalues of large sparse unsymmetric matrices using subspace iteration. [73] K. Gantmacher. 11:77-80. SIAM Journal on Scientific Computing. G. 47': 127-138. Cullum and R. Toumazou. [79] Roland W.1982. G. Ericsson and A. R. M.1998. 43:135-148. R. New York. M. G. and H. H. 67:191-229. [69] L. Research Report 91-06. A generalized eigenvalue problem and the Lanczos algorithm. . 1991. 332-345. Scott. The cyclic Jacobi method for computing the principle values of a complex matrix. On the variation of the spectra of matrices. [70] L.11. 1992. Fokkema.1980. editors. The QR transformation. The Theory of Matrices. Quasi-kernel polynomials and their use in non-Hermitian matrix iterations. Forsythe and P. Amsterdam. Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils. Henrici. Nachtigal. 4:265-271. In J. Willoughby. N. Gutknecht. [72] T. 1. Accurate singular values and differential qd algorithms.434 REFERENCES [68] I. An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices. 1986. A note on the normwise perturbation theory for the regular generalized eigenproblem Ax = \Bx. Parlett. Journal of Computational and Applied Mathematics. Interdisciplinary Project Center for Supercomputing. pages 95-120. 20:94-125. North-Holland. Fraysse and V.

1996. Calculating the singular values and pseudo-inverse of a matrix. [83] W.1973. Mathematical Software HI. Underwood.1979. H. Academic Press. Oak Ridge. [88] G.1931. Golub and R. Oak Ridge National Laboratory. Uber die AbgrenzungderEigenwerteeiner Matrix. H.1968. Golub. Goldstine.REFERENCES 435 [81] B. Golub. 6:749-754. editor. R. Garbow. New York. and M. F. Luk. Nash. ACM Transactions on Mathematical Software. [93] G. Golub and C. T. Also in [305. H. JournaloftheACM. The block Lanczos method for computing eigenvalues. [85] H. TN. H. In Lecture Notes in Computer Science. 1977. [94] G. Dongarra. singular values. 1977. Singular value decomposition and least squares solution. 14:403-420. [86] G. L. Moler. H. Least squares. Matrix eigensystem routines—EISPACK guide extension. H. AC24:909-913. H. The Jacobi method for real symmetric matrices. Matrix Computations. Stanford. Rice. Givens. H. Johns Hopkins University Press. New York. Boyle. In J. The Lanczos algorithm for the symmetric Ax = XBx problem. P. S. F. [90] G. H. MD.1959. H.1965. and C. Johns Hopkins University Press. Golub. Mathematics and Natural Sciences. NumerischeMathematik. Van Loan. Wilkinson. SIAM Review. Technical Report STAN-CS-72-270. [87] G. 6:176-195. 7:149-169. J. Baltimore. Vol II: Eigensystems . [89] G. A procedure for the diagonalization of normal matrices. Underwood. H. von Neumann. JournaloftheACM. Izvestia AkademiiNauk SSSR. Horowitz. third edition. A.1959. 1972. A block Lanczos method for computing the singular values and corresponding singular vectors of a matrix. [91] G. pages 134151]. 2:205-224. H. [82] S. H. S.1970. and matrix approximations. Gerschgorin. 13:44-51. Murray. [92] G. [95] G. Overton. Van Loan. F. Baltimore. Some modified matrix eigenvalue problems. Matrix Computations. Springer-Verlag. Computer Science. Kahan. Golub and W. 1989. Goldstine and L. M. Technical Report 1574. and J. IEEE Transactions on Automatic Control. Reinsch. J. B. [84] H. pages 364-377. Golub. Aplicace Mathematicky. Golub and C. Hessenberg-Schur method for the problem AX + XB = C. MD. F. Stanford University. SIAM Journal on Numerical Analysis. CA. Van Loan. 6:59-96. Golub. and C. J. Cited in [94]. Golub and C. second edition. J. 1954. 15:318-344. Numerical computation of the characteristic values of a real matrix. and J.1981.

8:741-754. SIAM Journal on Scientific and Statistical Computing. G. 15:15-58. A. Hari and K. Finding well-conditioned similarities to block-diagonalized nonsymmetric matrices is NP-hard. 18:578-619. [101] M. Physics Division. [106] M. A completed theory of the unsymmetric Lanczos process and related algorithms: Part I. Veselic. Numerische Mathematik. H. Lawrence Berkeley Laboratory. SIAM Journal on Matrix Analysis and Applications. [97] J. 60:375-407. 1981. Grcar. [100] M. 21:1051-1078. SIAM Review.1994. PhD thesis. [98] R. 8:101-186. 15:228-272. University of Illinois at UrbanaChampaign. J. [102] M. 1994. A completed theory of the unsymmetric Lanczos process and related algorithms: Part II. Technical Report LBL-35100. Grimes. Stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. Hadamard. Gu and S. Heath.436 REFERENCES [96] G. H. Gutknecht and K. [104] M. Gu. C. SIAM Journal on Matrix Analysis and Applications. J. Gutknecht. H. C. Eisenstat. (4e seriej.2000. T. On sharp quadratic convergence bounds for the serial Jacobi methods.1995. Eisenstat. deMath. A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. Ressel. On Jacobi methods for singular value decompositions. Gu and S. SIAM Journal on Matrix Analysis and Applications.1994. .1987. CA. SIAM Journal on Matrix Analysis and Applications. [109] M. G.1986. SIAM Journal on Matrix Analysis and Applications.1995. Laub.1991. SIAM Journal on Scientific and Statistical Computing.1892. Berkeley. A divide-and-conquer algorithm for the symmetric tridiagonal eigenvalue problem. 16:79-92. [108] V. C. Gu and S. Journ. SIAM Journal on Matrix Analysis and Applications. 15:1266-1276.1994. J. [105] M. SIAM Journal on Matrix Analysis and Applications. H. C. A divide-and-conquer algorithm for the bidiagonal SVD.1992. D. H. Wilkinson. Look-ahead procedures for Lanczos-type product methods based on three-term Lanczos recurrences. Paige. Essai sur 1'etude des fonctions donnees par leur developpement de taylor. C. Gutknecht. Eisenstat. [107] V. and R. Lewis. J. 7:1147-1159. Computing the SVD of a product of two matrices. Hari. Ill-conditioned eigensystems and the computation of the Jordan canonical form. [99] M. Analyses of the Lanczos Algorithm and of the Approximation Problem in Richardson's Method.1976. and H. Golub and J. Simon. 13:594-639. C. 16:11'2-191. Ward. [103] M.

[121] A. J. [Ill] P. On the speed of convergence of cyclic and quasicyclic Jacob! methods for computing eigenvalues of Hermitian matrices. Journal of the Society for Industrial and Applied Mathematics. [122] A. A. New York. The Numerical Treatment of a Single Nonlinear Equation. Originally published by Blaisdell. B. A. inverses. New York.1997. Numerische Mathematik. Computing. 41:335-348. Numerische Mathematik. Henrici. Linear Algebra and Its Applications. [114] N.1962. S. SIAM Journal on Scientific Computing. [113] N. 39:254-291. Topics in Matrix Analysis. pages 151-201. McGraw-Hill. Higham. Johnson. The accuracy of floating point summation. J. Iterative algorithms for Gram-Schmidt orthogonalization. 24:417'-441 and 498-520. F. Cambridge University Press.REFERENCES 437 [110] P. R. Higham. SIAM Journal on Matrix Analysis and Applications. Philadelphia. Horn and C. 1964. Householder and F. R. J. Horn and C. Hoffman. Journal of Educational Psychology. spectral variation and fields of values of nonnormal matrices. L. Accuracy and Stability of Numerical Algorithms.1993.2000. G. Ipsen. Johnson. 1:29-37. Nachr. 1998. Cambridge University Press. [115] N. 1991. Grundziige einer allgemeinen Theorie der linearen Integralgleichungen. [117] W. New York. 1996. [116] D. [120] H. 14:783-799.1989. 1912. In Acta Numerica. [119] R. Computing an eigenvector with inverse iteration. SIAM Review. Bounds for iterates. 1970. Hotelling. 6:144-162. Dover. 4:24-39. [112] D. The Theory of Matrices in Numerical Analysis. [118] R. Matrix Analysis. Higham. [123] A. Leipzig and Berlin. Cambridge University Press. Cambridge. J. Analysis of a complex of statistical variables into principal components. Cambridge. J. Bauer. Householder. S.1998. Householder. C. Hilbert. S. SIAM. Cambridge. VolII: Eigensy stems . [124] I. Teubner. Higham and N.1933. On certain methods for expanding the characteristic polynomial. QR factorization with complete pivoting and accurate computation of the SVD. 309:153-174. Relative perturbation bounds for matrix eigenvalues and singular values. 1985. 1904-1910. 20:493-512. C. Ipsen. [125] I. Higham. A collection of papers that appeared in Gott. F.1958. Henrici.1958. Structured backward error and condition of generalized eigenvalue problems.

1875. Jia and G. Jacobi. [136] C.1874. Stewart. [130] A. A refined subspace iteration algorithm for large sparse eigenproblems. J. College Park.2001. Absolute and relative perturbation bounds for invariant subspaces of matrices. [135] E. Proceedings of the Cambridge Philosophical Society.2000. Kagstrom and A. Jacobi. Journal de Mathematiques Pures et Appliquees. Ueber eine neue Auflosungsart der bei der Methode der kleinsten Quadrate vorkommenden linearen Gleichungen. An analysis of the Rayleigh-Ritz method for approximating eigenspaces.1857.1967. Technical Report TR-3986. Jia. Cited in [167]. Algorithm 560: JNF. G.1846. [139] B. 32:35-52. [127] C. Uber ein leichtes Verfahren die in der Theorie der Sacularstorungen vorkommenden Gleichungen numerisch aufzulosen. A new shift of the QL algorithm for irreducible symmetric tridiagonal matrices. Essai sur la geometric a n dimensions. Astronomische Nachrichten. 63:755-765.1980. Jia and G. and refined Ritz vectors. Jordan. G. Traite des Substitutions et des Equations Algebriques. 1870. [138] C. Stewart. [129] C. posthumous. [134] Z. . Transactions on Mathematical Software.1997. Refined iterative algorithm based on Arnoldi's process for large unsymmetric eigenproblems. Jennings. Jacobi. Ruhe. 22:297-306. [132] Z. Journal fur die reine und angewandte Mathematik. Jia. W. 30:51-94. Deuxieme Serie. Ipsen.438 REFERENCES [126] LC.2000. Memoire sur les formes bilineaires. 1999. Journalfiir die reine und angewandte Mathematik. Linear Algebra and Its Applications. Bulletin de la Societe Mathematique. Uber eine elementare Transformation eines in Buzug jedes von zwei Variablen-Systemen linearen und homogenen Ausdrucks.1845. [133] Z. F. Linear Algebra and Its Applications. 6:437-443. [137] C. Department of Computer Science. J. [128] C. 309:45-56. University of Maryland. G. Ritz vectors. 259:1-23. [131] Z. A direct iteration method of obtaining latent roots and vectors of a symmetric matrix. Mathematics of Computation. 1985. 19:35-54. On the convergence of Ritz values. W. 70:637-647. J. Paris. 65:261-272. Jiang and Z. Zhang. Applied Numerical Mathematics.3:lQ3-174. Linear Algebra and Its Applications. Jordan. an algorithm for numerical computation of the Jordan normal form of a complex matrix.Jordan. 53:265-270.


[140] B. Kagstrom and A. Ruhe. An algorithm for numerical computation of the Jordan normal form of a complex matrix. Transactions on Mathematical Software, 6:398-419,1980. [141] B. Kagstrom and P Wiberg. Extracting partial canonical structure for large scale eigenvalue problems. Numerical Algorithms, 24:195-237,1999. [142] W. Kahan. Inclusion theorems for clusters of eigenvalues of Hermitian matrices. Technical report, Computer Science Department, University of Toronto, 1967. [143] W. Kahan and B. N. Parlett. How far should you go with the Lanczos process? In J. Bunch andD. Rose, editors, Sparse Matrix Computations, pages 131-144. Academic Press, New York, 1976. [144] W. Kahan, B. N. Parlett, and E. Jiang. Residual bounds on approximate eigensystems of nonnormal matrices. SIAM Journal on Numerical Analysis, 19:470484,1982. [145] S. Kaniel. Estimates for some computational techniques in linear algebra. Mathematics of Computation, 20:369-378,1966. [146] T. Kato. Perturbation Theory for Linear Operators. Springer-Verlag, New York, 1966. [147] M. Kline. Mathematical Thought from Ancient to Modern Times. Oxford University Press, New York, 1972. [148] A. V. Knyazev. New estimates for Ritz vectors. Mathematics of Computation, 66:985-995,1997. [149] E. G. Kogbetliantz. Solution of linear systems by diagonalization of coefficients matrix. Quarterly of Applied Mathematics, 13:123-132,1955. [150] N. Kollerstrom. Thomas Simpson and "Newtons method of approximation": An enduring myth. British Journal for the History of Science, 25:347—354, 1992. [151] J. Konig. tiber eine Eigenschaft der Potenzreihen. Mathematische Annalen, 23:447^49,1884. [152] L. Kronecker. Algebraische Reduction der Schaaren bilinearer Formen. Sitzungberichte der Koniglich Preuftischen Akademie der Wissenschaften zu Berlin, pages 1225-1237,1890. [153] A. N. Krylov. O Cislennom resenii uravnenija, kotorym v techniceskih voprasah opredeljajutsja castoy malyh kolebanil materiaFnyh. Izv. Adad. Nauk SSSR old. Mat. Estest., pages 491-539,1931. Cited in [121].


[154] V. N. Kublanovskaya. On some algorithms for the solution of the complete eigenvalue problem. Zhurnal VychislitelnoiMatematiki i Matematicheskoi Fiziki, 1:555-570,1961. In Russian. Translation in USSR Computational Mathematics and Mathematical Physics, 1:637-657,1962. [155] J.-L. Lagrange. Solution de differents problems de calcul integral. Miscellanea Taurinensia, 3,1762-1765. Cited and reprinted in [156, v. 1, pages 472-668]. [156] J.-L. Lagrange. (Euvres de Langrange. Gauthier-Villars, Paris, 1867-1892. [157] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45:255-282,1950. [158] R. Lehoucq and D. C. Sorensen. Deflation techniques for an implicitly restarted Arnoldi iteration. SIAM Journal on Matrix Analysis and Applications, 17:789821, 1996. [159] R. B. Lehoucq. The computation of elementary unitary matrices. ACM Transactions on Mathematical Software, 22:393-400,1996. [160] R. B. Lehoucq. On the convergence of an implicitly restarted Arnoldi method. Technical Report SAND99-1756J, Sandia National Laboratories, Albuquerque, NM, 1999. [161] R. B. Lehoucq and J. A. Scott. An evaluation of software for computing eigenvalues of sparse nonsymmetric matrices. Preprint MCS-P547-1195, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, 1996. [162] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users' Guide: Solution of Large Scale Eigenvalue Problems by Implicitly Restarted Arnoldi Methods. SIAM, Philadelphia, 1998. [163] C.-K. Li and R. Mathias. The definite generalized eigenvalue problem: A reworked perturbation theory. Department of Mathematics, College of William & Mary, 2000. [164] R.-C. Li. Solving secular equations stably and efficiently. Technical Report UCB//CSD-94-851, Computer Science Division, University of California, Berkeley, 1993. [165] R-C. Li. Relative pertubation theory: I. Eigenvalue and singular value variations. SIAM Journal on Matrix Analysis and Applications, 19:956-982,1998. [ 166] R-C. Li. Relative pertubation theory: II. Eigenspace and singular subspace variations. SIAM Journal on Matrix Analysis and Applications, 20:471-492,1998. [167] C. C. Mac Duffee. The Theory of Matrices. Chelsea, New York, 1946.


[168] M. Marcus and H. Mine. A Survey of Matrix Theory and Matrix Inequalities. Allyn and Bacon, Boston, 1964. [169] R. S. Martin, C. Reinsch, and J. H. Wilkinson. Householder tridiagonalization of a real symmetric matrix. Numerische Mathematik, 11:181-195,1968. Also in [305, pages 212-226]. [170] R. S. Martin and J. H. Wilkinson. Reduction of the symmetric eigenproblem Ax — \Bx and related problems to standard form. Numerische Mathematik, 11:99-110,1968. Also in [305, pages 303-314]. [171] W. F. Mascarenhas. On the convergence of the Jacobi method for arbitrary orderings. SI AM Journal on Matrix Analysis and Applications, 16:1197-1209, 1995. [172] R. Mathias. Accurate eigensystem computations by Jacobi methods. SIAM Journal on Matrix Analysis and Applications, 16:977-1003,1995. [173] R. Mathias. Spectral perturbation bounds for positive definite matrices. SIAM Journal on Matrix Analysis and Applications, 18:959-980,1997. [174] R. Mathias. Quadratic residual bounds for the Hermitian eigenvalue problem. SIAM Journal on Matrix Analysis and Applications, 18:541-550,1998. [175] H. I. Medley and R. S. Varga. On smallest isolated Gerschgorin disks for eigenvalues. III. Numerische Mathematik, 11:361-369,1968. [176] K. Meerbergen and A. Spence. Implicitly restarted Arnoldi with purification for the shift-invert transformation. Mathematics of Computation, 66:667-689, 1997. [177] D. S. Mitrinovic. Analytic Inequalities. Springer-Verlag, New York, 1970. [178] C. B. Moler and G. W. Stewart. An algorithm for generalized matrix eigenvalue problems. SIAM Journal on Numerical Analysis, 10:241-256,1973. [179] R. B. Morgan. Computing interior eigenvalues of large matrices. Linear Algebra and Its Applications, 154-156:289-309,1991. [180] R. B. Morgan. On restarting the Arnoldi method for large nonsymmetric eigenvalue problems. Mathematics of Computation, 65:1213-1230,1996. [181] R. B. Morgan. Implicitly restarted GMRES and Arnoldi methods for nonsymmetric systems of equations. SIAM Journal on Matrix Analysis and Applications, 21:1112-1135,2000. [182] C. L. Miintz. Solution direct de 1'equation seculaire et de quelques problemes analogues transcendents. Comptes Rendus de I'Academie des Sciences, Paris, 156:43^6,1913. VolII: Eigensy stems


[183] M. Newman and P. F. Flanagan. Eigenvalue extraction in NASTRAN by the tridiagonal reduction (PEER) method—Real eigenvalue analysis. NASA Contractor Report CR-2731, NASA, Washington DC, 1976. Cited in [236]. [184] B. Nour-Omid, B. N. Parlett, T. Ericsson, and P. S. Jensen. How to implement the spectral transformation. Mathematics of Computation, 48:663-67'3,1897. [185] W. Oettli and W. Prager. Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides. Numerische Mathematik, 6:405-409,1964. [186] J. Olsen, P. J0rgensen, and J. Simons. Passing the one-billion limit in full configuration-interaction (FCI) calculations. Chemical Physics Letters, 169:463^72,1990. [187] E. E. Osborne. On pre-conditioning of matrices. Journalof the ACM, 7:338345,1960. [188] A. M. Ostrowski. On the convergence of the Rayleigh quotient iteration for the computation of the characteristic roots and vectors. I. Archive for Rational Mechanics and Analysis, 1:233-241,1958. [189] A. M. Ostrowski. On the convergence of the Rayleigh quotient iteration for the computation of the characteristic roots and vectors, n. Archive for Rational Mechanics and Analysis, 2:423-428,1958. [190] A. M. Ostrowski. On the convergence of the Rayleigh quotient iteration for the computation of the characteristic roots and vectors. HI (generalized Rayleigh quotient and characteristic roots with linear elementary divisors). Archive for Rational Mechanics and Analysis, 3:325-340,1959. [191] A. M. Ostrowski. On the convergence of the Rayleigh quotient iteration for the computation of the characteristic roots and vectors. IV (generalized Rayleigh quotient for nonlinear elementary divisors). Archive for Rational Mechanics and Analysis, 3:341-347,1959. [192] A. M. Ostrowski. On the convergence of the Rayleigh quotient iteration for the computation of the characteristic roots and vectors. VI (usual Rayleigh quotient for nonlinear elementary divisors). Archive for Rational Mechanics and Analysis, 4:153-165,1959. [193] A. M. Ostrowski. On the convergence of the Rayleigh quotient iteration for the computation of the characteristic roots and vectors. V (usual Rayleigh quotient for non-Hermitian matrices and linear elementary divisors). Archive for Rational Mechanics and Analysis, 3:472-481,1959. [194] A.M. Ostrowski. Solution ofEquations and Systems ofEquations. Academic Press. New York. 1960.


[195] C. C. Paige. Practical use of the symmetric Lanczos process with reorthogonalization. BIT, 10:183-195,1970. [196] C. C. Paige. The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices. PhD thesis, University of London, 1971. [197] C. C. Paige. Computational variants of the Lanczos method for the eigenproblem. Journal of the Institute of Mathematics and Its Applications, 10:373—381, 1972. [198] C. C. Paige. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix. Journal of the Institute of Mathematics and Its Applications, 18:341-349,1976. Cited in [25]. [199] C. C. Paige. Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem. Linear Algebra and Its Applications, 34:235-258,1980. Reprinted in [27]. [200] C. C. Paige. Computing the generalized singular value decomposition. SIAM Journal on Scientific and Statistical Computing, 7:1126-1146,1986. [201] C. C. Paige, B. N. Parlett, and H. van der Vorst. Approximate solutions and eigenvalue bounds from Krylov subspaces. Numerical Linear Algebra with Applications, 2:115-134,1995. [202] C. C. Paige and M. A. Saunders. Toward a generalized singular value decomposition. SIAM Journal on Numerical Analysis, 18:398-405,1981. [203] C. C. Paige and P. Van Dooren. On the quadratic convergence of Kogbetliantz's algorithm for computing the singular value decomposition. Linear Algebra and Its Applications, 77:301-313,1986. [204] B. N. Parlett. The origin and development of methods of the LR type. SIAM Review, 6:275-295,1964. [205] B. N. Parlett. A new look at the Lanczos algorithm for solving symmetric systems and linear equations. Linear Algebra and Its Applications, 29:323-346, 1980. [206] B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs, NJ, 1980. Reissued with revisions by SIAM, Philadelphia, 1998. [207] B. N. Parlett and J. Le. Forward instability of tridiagonal qr. SIAM Journal on Matrix Analysis and Applications, 14:279-316,1993. [208] B. N. Parlett and C. Reinsch. Balancing a matrix for calculation of eigenvalues and eigenvectors. Numerische Mathematik, 13:293-304,1969. Also in [305, pages 315-326].


[209] B. N. Parlett and D. S. Scott. The Lanczos algorithm with selective orthogonalization. Mathematics of Computation, 33:217-238,1979. [210] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Mathematics of Computation, 44:105-124,1985. [211] G. Peters and J. H. Wilkinson. Inverse iteration, ill-conditioned equations, and Newton's method. SIAM Review, 21:339-360,1979. Cited in [94]. [212] D. A. Pope and C. Tompkins. Maximizing functions of rotations: Experiments concerning speed of diagonalization of symmetric matrices using Jacobi's method. JournaloftheACM, 4:459-466,1957. [213] D. Porter and D. S. G. Stirling. Integral Equations. Cambridge University Press, Cambridge, 1990. [214] M. J. D. Powell. A new algorithm for unconstrained optimization. In J. B. Rosen, O. L. Mangasarian, and K. Ritter, editors, Nonlinear Programming, pages 31-65. Academic Press, New York, 1970. Cited in [64, page 196]. [215] Lord Rayleigh (J. W. Strutt). On the calculation of the frequency of vibration of a system in its gravest mode, with an example from hydrodynamics. The Philosophical Magazine, 47:556-572,1899. Cited in [206]. [216] J. L. Rigal and J. Gaches. On the compatibility of a given solution with the data of a linear system. JournaloftheACM, 14:543-548,1967. [217] W. Ritz. liber eine neue Method zur Losung gewisser Variationsprobleme der mathematischen Physik. Journal fiir die reine und angewandte Mathematik, 135:1-61,1909. [218] A. Ruhe. Perturbation bounds for means of eigenvalues and invariant subspaces. BIT, 10:343-354,1970. [219] A. Ruhe. Implementation aspects of band Lanczos algorithms for computation of eigenvalues of large sparse symmetric matrices. Mathematics of Computation, 33:680-687,1979. [220] A. Ruhe. Rational Krylov algorithms for nonsymmetric eigenvalue problems. H. Matrix pairs. Linear Algebra and Its Applications, 197-198:283-295,1994. [221] A. Ruhe. Rational Krylov subspace method. In Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors, Templates for the Solution of Algebraic Eigenvalue Problems, pages 246-249. SIAM, Philadelphia, 2000. [222] H. Rutishauser. Anwendungen Des Quotienten-Differenzen-Algorithmus. Zeitschrift filr angewandte Mathematik und Physik, 5:496-508,1954. [223] H. Rutishauser. Der Quotienten-Differenzen-Algorithmus. Zeitschrift fiir angewandte Mathematik und Physik, 5:233-251,1954.


[224] H. Rutishauser. Une methode pour la determination des valeurs propres d'une matrice. Comptes Rendus de VAcademic des Sciences, Paris, 240:34-36,1955. [225] H. Rutishauser. Solution of eigenvalue problems with the LR-transformation. National Bureau of Standards Applied Mathematics Series, 49:47-81,1958. [226] H. Rutishauser. The Jacobi method for real symmetric matrices. Numerische Mathematik, 9:1-10,1966. Also in [305, pages 202-211]. [227] H. Rutishauser. Computational aspects of F. L. Bauer's simultaneous iteration method. Numerische Mathematik, 13:4-13,1969. [228] H. Rutishauser. Simultaneous iteration method for symmetric matrices. Numerische Mathematik, 16:205-223,1970. Also in [305, pages 284-301]. [229] Y. Saad. On the rates of convergence of the Lanczos and the block Lanczos methods. SIAM Journal on Numerical Analysis, 17:687-706,1980. [230] Y. Saad. Variations of Arnoldi's method for computing eigenelements of large unsymmetric matrices. Linear Algebra and Its Applications, 34:269-295,1980. [231] Y. Saad. Projection methods for solving large sparse eigenvalue problems. In B. Kagstrom and A. Ruhe, editors, Matrix Pencils, pages 121-144. SpringerVerlag, New York, 1983. [232] Y. Saad. Chebyshev acceleration techniques for solving nonsymmetric eigenvalue problems. Mathematics of Computation, 42:567'-588,1984. [233] Y. Saad. Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms. John Wiley, New York, 1992. [234] J. Schur. Uber die charakteristischen Wiirzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen. MathematischeAnnalen, 66:448-510,1909. [235] H. R. Schwarz. Tridiagonalization of a symmetric band matrix. Numerische Mathematik, 12:231-241,1968. Also in [305, pages 273-283]. [236] D. S. Scott. The advantages of inverted operators in Rayleigh-Ritz approximations. SIAM Journal on Scientific and Statistical Computing, 3:68-75,1982. [237] H. D. Simon. Analysis of the symmetric Lanczos algorithm with reorthogonalization methods. Linear Algebra and Its Applications, 61:101-132,1984. [238] H. D. Simon. The Lanczos algorithm with partial reorthogonalization. Mathematics ofComputation, 42:115-142,1984. [239] G. L. G. Sleijpen, A. G. L. Booten, D. R. Fokkema, and Henk van der Vorst. Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems. BIT, 36:595-633,1996. Vol II: Eigensystems


[240] G. L. G. Sleijpen and H. A. van der Vorst. A Jacobi-Davidson iteration method for linear eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 17:401^25,1996. [241] G. L. G. Sleijpen and H. A. van der Vorst. Generalized Hermitian eigenvalue problems: Jacobi-Davidson methods. In Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors, Templates for the Solution of Algebraic Eigenvalue Problems, pages 123-127. SIAM, Philadelphia, 2000. [242] G. L. G. Sleijpen and H. A. van der Vorst. Generalized non-Hermitian eigenvalue problems: Jacobi-Davidson method. In Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors, Templates for the Solution of Algebraic Eigenvalue Problems, pages 238-246. SIAM, Philadelphia, 2000. [243] G. L. G. Sleijpen and H. A. van der Vorst. The Hermitian eigenvalue problem: Jacobi-Davidson methods. In Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors, Templates for the Solution of Algebraic Eigenvalue Problems, pages 88-105. SIAM, Philadelphia, 2000. [244] G. L. G. Sleijpen and H. A. van der Vorst. Non-Hermitian eigenvalue problems: Jacobi-Davidson methods. In Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors, Templates for the Solution of Algebraic Eigenvalue Problems, pages 221-228. SIAM, Philadelphia, 2000. [245] G. L. G. Sleijpen, H. A. van der Vorst, and E. Meijerink. Efficient expansion of subspaces in the Jacobi-Davidson method for standard and generalized eigenproblems. Electronic Transactions on Numerical Analysis, 7:75-89,1998. [246] D. C. Sorensen. Implicit application of polynomial filters in a fc-step Arnoldi method. SIAM Journal on Matrix Analysis and Applications, 13:357-385, 1992. [247] D. C. Sorensen and P. T. P. Tang. On the orthogonality of eigenvectors computed by divide and conquer techniques. SIAM Journal on Numerical Analysis, 28:1752-1775,1991. [248] A. Stathopoulos, Y. Saad, and Kesheng Wu. Dynamic thick restarting of the Davidson and the implicitly restarted Arnoldi algorithms. SIAM Journal on Scientific Computing, 19:227-245,1998. [249] G. W. Stewart. Accelerating the orthogonal iteration for the eigenvalues of a Hermitian matrix. Numerische Mathematik, 13:362-376,1969. [250] G. W. Stewart. Error bounds for approximate invariant subspaces of closed linear operators. SIAM Journal on Numerical Analysis, 8:796-808,1971. [251] G. W. Stewart. On the sensitivity of the eigenvalue problem Ax = \Bx. SIAM Journal on Numerical Analysis, 4:669-686,1972.


[252] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Review, 15:727-764,1973. [253] G. W. Stewart. Introduction to Matrix Computations. Academic Press, New York, 1973. [254] G. W. Stewart. Modifying pivot elements in Gaussian elimination. Mathematics of Computation, 28:537-542,1974. [255] G. W. Stewart. Gerschgorin theory for the generalized eigenvalue problem Ax = XBx. Mathematics of Computation, 29:600-606,1975. [256] G. W. Stewart. Algorithm 407: HQR3 and EXCHNG: FORTRAN programs for calculating the eigenvalues of a real upper Hessenberg matrix in a prescribed order. ACM Transactions on Mathematical Software, 2:275-280,1976. [257] G. W. Stewart. Simultaneous iteration for computing invariant subspaces of a non-Hermitian matrix. Numerische Mathematik, 25:123-136,1976. [258] G. W. Stewart. On the perturbation of pseudo-inverses, projections, and linear least squares problems. SIAM Review, 19:634-662,1977. [259] G. W. Stewart. Computing the CS decomposition of a partitioned orthogonal matrix. Numerische Mathematik, 40:297-306,1982. [260] G. W. Stewart. A method for computing the generalized singular value decomposition. In B. Kagstrom and A. Ruhe, editors, Matrix Pencils, pages 207-220. Springer-Verlag, New York, 1983. [261] G. W. Stewart. Two simple residual bounds for the eigenvalues of Hermitian matrices. SIAM Journal on Matrix Analysis and Applications, 12:205-208, 1991. [262] G. W. Stewart. Error analysis of QR updating with exponential windowing. Mathematics of Computation, 59:135-140,1992. [263] G. W. Stewart. On the early history of the singular value decomposition. SIAM Review, 35:551-566,1993. [264] G. W. Stewart. Afternotes Goes to Graduate School. SIAM, Philadelphia, 1998. [265] G. W. Stewart. Matrix Algorithms I: Basic Decompositions. SIAM, Philadelphia, 1998. [266] G. W. Stewart. The decompositional approach to matrix computations. Computing in Science and Engineering, 2:50-59,2000. [267] G. W. Stewart. A Krylov-Schur algorithm for large eigenproblems. Technical Report TR-4127, Department of Computer Science, University of Maryland, College Park, 2000. To appear in the SIAM Journal on Matrix Analysis and Applications. Vol II: Eigensystems


[268] G. W. Stewart. A generalization of Saad's theorem on Rayleigh-Ritz approximations. Linear Algebra and Its Applications, 327:115-120,2001. [269] G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, New York, 1990. [270] W. J. Stewart and A. Jennings. Algorithm 570: LOPSI a simultaneous iteration method for real matrices. ACM Transactions on Mathematical Software, 7:230232,1981. [271] W. J. Stewart and A. Jennings. A simultaneous iteration algorithm for real matrices. ACM Transactions on Mathematical Software, 7:184-198,1981. [272] J. J. Sylvester. On a theory of the syzygetic relations etc. Phil. Trans., pages 481-84,1853. Cited in [28]. [273] J. J. Sylvester. Sur 1'equation en matrices px = xq. Comptes Rendus de I'Academie des Sciences, pages 67-71 and 115-116,1884. [274] O. Taussky. A recurring theorem on determinants. American Mathematical Monthly, 46:672-675,1949. [275] D. R. Taylor. Analysis of the Look Ahead Lanczos Algorithm. PhD thesis, Department of Mathematics, University of California, Berkeley, 1982. Cited in [79]. [276] L. N. Trefethen. Computation of pseudospectra. In ActaNumerica, pages 247295. Cambridge University Press, Cambridge, 1999. [277] L. N. Trefethen. Spectral Methods in MATLAB. SIAM, Philadelphia, 2000. [278] L. N. Trefethen and D. Bau, HI. Numerical Linear Algebra. SIAM, Philadelphia, 1997. [279] L. N. Trefethern. Pseudospectra of linear operators. SIAM Review, 39:383^06, 1997. [280] R. Underwood. An iterative block Lanczos method for the solution of large sparse symmetric eigenproblems. Technical Report STAN-CS-75-496, Computer Science, Stanford University, Stanford, CA, 1975. PhD thesis. [281] P. Van Dooren. The computation of Kronecker's canonical form of a singular pencil. Linear Algebra and Its Applications, 27:103-140,1979. [282] J. M. Van Kats and H. A. van der Vorst. Automatic monitoring of Lanczos schemes for symmetric or skew-symmetric generalized eigenvalue problems. Technical Report TR 7, Academisc Computer Centre Utrecht, Budapestlaan 6, De Vithof-Utrecht, the Netherlands, 1977. Cited in [236]. [283] C. F. Van Loan. Generalizing the singular value decomposition. SIAM Journal on Numerical Analysis, 13:76-83,1976.

1996. The calculation of eigenvectors by the method of Lanczos. 9:184-191. 46:479-491.1955. S. [285] R. [293] K. Monatshefte Akadamie Wissenshaften Berlin. Linear Algebra and Its Applications. S. [296] J.1959. F. Ward. [294] H. Convergence of algorithms of decomposition type for the eigenvalue problem. 2000. 241-243:877-896. 12:835-853. [291] D. Watkins. 6:366-376. On smallest isolated Gerschgorin disks for eigenvalues. SIAM Journal on Matrix Analysis and Applications. 22:364-375. John Wiley & Sons. Watkins. 4:296-300. Computer Journal. Watkins. Computing the CS and the generalized singular value decompositions.1868. S. Germany. Numerische Mathematik. pages 310-338. S.1995. Van Loan. H. Bericht B 44/J/37. [292] D. 1991. Wilkinson.1960. 143:19-47. New York. On the transmission of shifts and shift blurring in the QR algorithm. [289] D. H. Wilkinson. H. Varga. [287] R. [297] J.1964. Forward stability and transmission of shifts in the QR algorithm.1975. The combination shift QZ algorithm. H. Watkins and L. Aerodynamische Versuchsanstalt Gottingen. S. [286] R. 3:23-27. 16:469—487. [298] J. Ward. Fundamentals of Matrix Computations. Note on the quadratic convergence of the cyclic Jacobi process. The use of iterative methods for finding the latent roots and vectors of matrices. Linear Algebra and Its Applications. SIAM Journal on Numerical Analysis.1981. 1:148-152. Wilkinson. S. Wilkinson. Mathematical Tables and Other Aids to Computation. 1944. Computer Journal.1985. Zur Theorie der bilinearen und quadratischen Formen. SIAM Journal on Matrix Analysis and Applications. Performance of the QZ algorithm in the presence of infinite eigenvalues. Watkins. 1991. [295] J. Numerische Mathematik.1962. 2:141-152. Householder's method for the solution of the algebraic eigenvalue problem. SIAM Journal on Scientific and Statistical Computing.REFERENCES 449 [284] C. Eisner. [290] D. Numerische Mathematik. C. Wielandt. Beitrage zur mathematischen Behandlung komplexer Eigenwertprobleme V: Bestimmung hoherer Eigenwerte durch gebrochene Iteration. Balancing the generalized eigenvalue problem. Weierstrass. C. Vol II: Eigensystems . [288] D.

2000. Clarendon Press. Wilkinson. Wilkinson.1979. H. 22:602616. Reinsch. Kronecker's canonical form and the QZ algorithm. Oxford.450 REFERENCES [299] J. H. NJ. Note on matrices with a very ill-conditioned eigenproblem. SIAM Journal on Matrix Analysis and Applications. 37:531-551. Englewood Cliffs. [306] K. Global convergence of tridiagonal QR algorithm with origin shifts. 28:285-303. Wilkinson. H Linear Algebra. 1963. Wilkinson. Springer-Verlag. 19:176-178. Prentice-Hall. [302] J. Wilkinson. Linear Algebra and Its Applications. QR. . Historical development of the Newton-Raphson method. J. Convergence of the LR. 1965. Wu and H. H. 1:409—420. [307] T. Rounding Errors in Algebraic Processes. Vol. Wilkinson. Linear Algebra and Its Applications. H. 1971. [303] J. [301] J.1965. H. [304] J. The Algebraic Eigenvalue Problem. New York. Handbook for Automatic Computation. Simon. Thick-restart Lanczos method for large symmetric eigenvalue problems.1968. [300] J. and related algorithms. [305] J. H. Ypma. Computer Journal. SIAM Review.1995. H. Numerische Mathematik. 8:77-84.1972. Wilkinson and C.

INDEX

Underlined page numbers indicate a defining entry. Slanted page numbers indicate an entry in a notes and references section. The abbreviation me indicates that there is more information at the main entry for this item. Only authors mentioned explicitly in the text are indexed.
