
Hannes Thiel

The QR Algorithm
and other methods to compute the eigenvalues of complex matrices

Handed in in April 2005 to Dr. Schachtzabel, Department of Mathematics, University of Potsdam


The QR Algorithm

”[The QR algorithm is] one of the most remarkable algorithms in numerical math-
ematics”
(Strang)

”Indeed it is quite remarkable that an algorithm, which is both effective and easy to
describe, has resisted, and stoutly continues to resist, a full mathematical analysis,
to such an extent that no proof of convergence in the most general case (the matrix
not Hermitian; the QR algorithm with shifts) exists at the present, at the same
time that no counter-example to convergence exists”
(Philippe G. Ciarlet)

Abstract
This work deals with variants of the power and QR methods for determining the eigenvalues of
complex matrices. First, definitions and properties concerning matrices, the eigenvalue problem
and matrix decompositions are presented. Then the methods are thoroughly discussed. All
presented algorithms were implemented and tested in Java.

Introduction
At first the eigenvalue problem seems quite clear: the eigenvalues of a matrix A are exactly
the roots of its characteristic polynomial pA, and in C one can find all the roots of pA. So why
all the excitement?
The reason is the numerical instability of the determinant function. In fact, one would rather
find the zeros of a polynomial by applying the QR algorithm to its companion matrix
(which has exactly the zeros of the polynomial as its eigenvalues) than form the characteristic
polynomial of a matrix and find its zeros.
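For illustration (a standard construction, added here and not taken from the original text): the monic polynomial p(z) = z³ + c2 z² + c1 z + c0 has the companion matrix

C_p = ( 0  0  −c0 )
      ( 1  0  −c1 )
      ( 0  1  −c2 )

whose characteristic polynomial is p_{C_p}(z) = det(C_p − z I3) = −(z³ + c2 z² + c1 z + c0), so the eigenvalues of C_p are exactly the roots of p.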
The correspondence between (arbitrary) polynomials and the eigenvalue problem also shows
that one cannot find explicit (closed-form) solutions for the eigenvalues of matrices of dimension
greater than 4. Hence one has to use iterative methods. The QR method is a simple algorithm
that does this, and it can be improved to make it efficient. Nevertheless, it has some disadvantages:
it has problems with multiple eigenvalues and with eigenvalues of equal modulus. Although one
can try to work around these problems, there is no all-embracing algorithm for the eigenvalue
problem.

Remarks
The reader needs basic knowledge of linear algebra and analysis (as gained in the first or second
year of studying mathematics).
A ”•” indicates a definition; propositions and theorems are marked separately. Throughout the
paper, K can be taken to be R or C.

The notation
( • • )
( · • )
stands for a 2x2 matrix A with arbitrary entries A11, A12, A22 and zero entry A21.

Table of Contents
1 Matrices
1.1 Basic Definitions
1.2 Matrix Norms
2 The Eigenvalue Problem
3 Matrix Decompositions and Factorizations
3.1 LU Decomposition
3.2 LDMT and LDLT Decomposition
3.3 QR Decomposition
3.4 Cholesky Decomposition
3.5 SVD - Singular Value Decomposition
3.6 Polar Decomposition
3.7 Block Diagonal Decomposition
3.8 Jordan Decomposition
3.9 Schur Decomposition
3.10 Eigen Decomposition
4 Sensitivity of the Eigenvalue Problem, Perturbation Theory
4.1 Continuity of the Eigenvalue Problem
4.2 Spectral Variation of Nonnormal Matrices
4.2.1 Eigenvalue Sensitivity
4.2.2 Eigenvector Sensitivity
4.2.3 Invariant Subspace Sensitivity
4.3 Normal Matrices
4.4 Hermitian and Skew-Hermitian Matrices
4.5 Residuals
5 Algorithms
5.1 Householder Transformations
5.2 Givens Transformations
5.3 QR Decomposition
5.4 Hessenberg Matrices
5.4.1 Definition and Properties
5.4.2 Hessenberg Reduction
5.4.3 QR Decomposition of Hessenberg Matrices
5.5 The Power Methods
5.5.1 The Simple Power Method
5.5.2 The Inverse Power Method
5.6 The QR Method
5.6.1 Definition and Properties
5.6.2 QR Method with Hessenberg Reduction
5.6.3 QR Method with Shifts and Decoupling
A Proofs
B Bibliographical Reference

1 Matrices
1.1 Basic Definitions
Let us denote by:
• Mnl(K) the set of all n×l matrices over K
• Mn(K) the set of all n×n matrices over K (i.e. Mn(K) := Mnn(K))
• In the n×n identity matrix
• 0nl the n×l zero matrix.
For abbreviation let Mn denote Mn(C).

For a matrix A = (Aij) ∈ Mnl(K) one defines:

• the transposed matrix A^T ∈ Mln(K) as A^T = (Bij) with Bij := Aji
• the conjugate-transposed matrix A^H ∈ Mln(K) as A^H = (Bij) with Bij := the complex conjugate of Aji
For real matrices: A^H = A^T

A matrix A ∈ Mn (K) is called:


• symmetric :⇔ AT = A
• skew-symmetric :⇔ AT = −A
• Hermitian :⇔ AH = A
• skew-Hermitian :⇔ AH = −A
• regular :⇔ A is invertible
• unitary :⇔ A^H A = In
• orthogonal :⇔ A^T A = In
• normal :⇔ AAH = AH A

 If A ∈ Mn (R) then: A is (skew-)symmetric ⇔ A is (skew-)Hermitian


A is orthogonal ⇔ A is unitary
 If A is Hermitian, skew-Hermitian or unitary then A is normal.
 If A is Hermitian then xH Ax is real for x ∈ Cn and A is called
• positive (semi)definite (write: A > 0, A ≥ 0) :⇔ ∀x ≠ 0 : x^H A x > 0 (≥ 0)
• negative (semi)definite (write: A < 0, A ≤ 0) :⇔ ∀x ≠ 0 : x^H A x < 0 (≤ 0)
A positive (negative) semidefinite matrix A is also called positive (negative).
A positive (negative) definite matrix A is also called strictly positive (negative).

One defines:
• GLn (K) := {A ∈ Mn (K) : A regular } = general linear group
• SLn (K) := {A ∈ Mn (K) : det A = 1} = special linear group
• On (K) := {A ∈ Mn (K) : A orthogonal } = orthogonal group
• SOn (K) := {A ∈ Mn (K) : A orthogonal, det A = 1} = special orthogonal group
• Sym(K) := {A ∈ Mn (K) : A symmetric }
• Hn := {A ∈ Mn(C) : A Hermitian}
• Un := {A ∈ Mn(C) : A unitary}
• USn := {A ∈ Mn(C) : A unitary, det A = 1}
• HPDn := {A ∈ Mn(C) : A Hermitian, positive definite}
• SPDn := {A ∈ Mn(R) : A symmetric, positive definite}

Two matrices A, B ∈ Mnl(K) are called:

• equivalent :⇔ ∃P1 ∈ GLn(K), P2 ∈ GLl(K) s.t. A = P1 B P2
Two matrices A, B ∈ Mn(K) are called:
• similar :⇔ ∃P ∈ GLn(K) s.t. A = P^(-1) B P
• unitarily similar :⇔ ∃P ∈ Un s.t. A = P^(-1) B P

A matrix A ∈ Mn (K) is called:
• lower triangular :⇔ Aij = 0 whenever i < j
• upper triangular :⇔ Aij = 0 whenever i > j
• diagonal :⇔ Aij = 0 whenever i ≠ j
• reducible :⇔ there is a nontrivial partition {1, ..., n} = I ∪ J s.t.
Aij = 0 whenever (i, j) ∈ I × J
• irreducible :⇔ A is not reducible
• projection :⇔ A is idempotent (A2 = A)
• permutation matrix :⇔ Aij = δi,s(j) for some permutation s ∈ Sn
Let denote:
• diag (d1 , ..., dn ) the diagonal n-n-matrix D s.t. Dii = di
• Ps the permutation matrix related to the permutation s ∈ Sn
Permutation matrices are real, orthogonal and have exactly one nonzero entry in every row and column. Further: P_s^(-1) = P_(s^(-1)) = P_s^T.
A ∈ Mn is reducible iff there is a permutation matrix P s.t. P^(-1) A P = ( B  C ; 0_(p,n-p)  D ).
Every projection P can be characterized by its image M := Im P and its kernel W := Ker P. We
say that P projects onto M along W. It holds that C^n = M ⊕ W. Conversely, if C^n = M ⊕ W,
then there is a unique projection onto M along W.
Every matrix A ∈ Mnl(K) can be considered as a linear map from K^l to K^n via x ↦ Ax.
One defines:
• null space (kernel) Ker A := {x ∈ C^l : Ax = 0}
• range space Im A := A C^l = {Ax : x ∈ C^l}
• trace tr A := A11 + ... + Ann

1.2 Matrix Norms


Since Mnl (K) is a K-vector space, the definition of a matrix norm is very natural:
• matrix norm = function k.k : Mnl (K) → R+ s.t. ∀A, B ∈ Mnl (K), α ∈ K:
(i) kA + Bk ≤ kAk + kBk (triangular inequality)
(ii) kαAk = |α| · kAk (homogeneity)
(iii) kAk = 0 ⇔ A = 0
One calls a matrix norm k · k on Mn (K) submultiplicative if ∀A, B ∈ Mn (K):
(iv) kABk ≤ kAk·kBk

• A matrix norm ‖·‖ is said to be consistent with a vector norm ‖·‖ if

(i) ‖Ax‖ ≤ ‖A‖ ‖x‖ for all A ∈ Mn(K), x ∈ K^n

Some important norms on Mn(K) are:

• the operator norm with respect to a norm ‖·‖ on K^n: ‖A‖ := max_{0≠x∈K^n} ‖Ax‖/‖x‖ = max_{‖x‖=1} ‖Ax‖
• the p-norms, which are special operator norms: ‖A‖_p := max_{0≠x∈K^n} ‖Ax‖_p/‖x‖_p, p = 1, 2, ..., ∞
• the Schatten p-norms: ‖A‖_<p> := ( Σ_{j=1}^n σ_j(A)^p )^{1/p}
• the Schatten 2-norm, also called the Frobenius norm and denoted by ‖A‖_F
• the Ky Fan k-norms: ‖A‖_(k) := Σ_{j=1}^k σ_j(A)
(the σ_i(A) are the singular values of A, see also section 3)

One has the following characterization of the various norms:
(1) ‖A‖_1 = max_j Σ_i |Aij|
(2) ‖A‖_∞ = max_i Σ_j |Aij| = ‖A^H‖_1
(3) ‖A‖_<2> = ‖A‖_F = (tr A^H A)^{1/2} = ( Σ_{i,j} |Aij|² )^{1/2}

All norms on Mn(K) are equivalent.

‖A‖_2 and ‖A‖_F are invariant w.r.t. unitary transformations
(i.e. ‖QAR‖_2 = ‖A‖_2 and ‖QAR‖_F = ‖A‖_F for unitary matrices Q, R of fitting size).
One usually denotes by |||·||| a unitarily invariant norm.

• One defines for A ∈ GLn(K) the condition number relative to inversion as κ_p(A) := ‖A‖_p ‖A^(-1)‖_p.
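As an illustration of the characterizations (1)-(3), here is a minimal Java sketch for real matrices (the helper names are hypothetical and not taken from the implementation mentioned in the abstract):

// Illustrative sketch: 1-norm, infinity-norm and Frobenius norm of a real matrix,
// computed entrywise via the characterizations (1)-(3).
static double norm1(double[][] a) {               // (1): maximal column sum
    double max = 0.0;
    for (int j = 0; j < a[0].length; j++) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i][j]);
        max = Math.max(max, sum);
    }
    return max;
}
static double normInf(double[][] a) {              // (2): maximal row sum
    double max = 0.0;
    for (double[] row : a) {
        double sum = 0.0;
        for (double x : row) sum += Math.abs(x);
        max = Math.max(max, sum);
    }
    return max;
}
static double normFrobenius(double[][] a) {        // (3): square root of the sum of squared entries
    double sum = 0.0;
    for (double[] row : a) for (double x : row) sum += x * x;
    return Math.sqrt(sum);
}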

2 The Eigenvalue Problem


For λ ∈ K one considers E_A^λ := Ker(A − λ In) = {x ∈ K^n : Ax = λx}.
For all λ: E_A^λ is a subspace of K^n.
One is interested in those λ for which dim E_A^λ > 0, i.e. for which E_A^λ does not only contain 0.

The eigenvalue problem for A ∈ Mn(K) is to find 0 ≠ x ∈ K^n and λ ∈ K s.t. Ax = λx. In
other words: find λ ∈ K s.t. dim E_A^λ > 0.

One then calls:


• λ an eigenvalue of A
• x an eigenvector of A (associated with λ)
• E_A^λ the eigenspace of A associated with λ

For every eigenvalue λ there exists a left eigenvector ψ ≠ 0 with ψ^H A = λ ψ^H (⇔ A^H ψ = λ̄ ψ, where λ̄ denotes the complex conjugate).

The eigenspaces of distinct eigenvalues λ1, ..., λr are in direct sum
(i.e. if x1 ∈ E^{λ1}, ..., xr ∈ E^{λr} and x1 + ... + xr = 0, then x1 = ... = xr = 0).

The characteristic polynomial of A ∈ Mn is pA (z) := det(A − zIn ), z ∈ C.


 λ is eigenvalue of A iff it is a root of pA (z).
 Therefore A has exactly n (maybe repeated) eigenvalues in C.
Let λ1 (A), ..., λn (A) denote these eigenvalues s.t. equal values stay together and
|λ1 (A)| ≥ ... ≥ |λn (A)| (if possible I will just write λ1 , ..., λn )
Let µ1 (A), ..., µK (A) denote the different eigenvalues of A

One further defines for A:

• the spectrum σ(A) := {λ ∈ C : λ eigenvalue of A} = {λ1(A), ..., λn(A)}
• the spectral radius rσ(A) := max_{λ∈σ(A)} |λ|

pA(z) = cn z^n + c_{n−1} z^{n−1} + ... + c0 with cn = (−1)^n, c_{n−1} = (−1)^{n−1} tr A, c0 = det(A)

If A is similar to B then pA(z) = pB(z); hence σ(A) = σ(B), tr A = tr B and det A = det B.
For A ∈ Mn: σ(A) = σ(A^T) and λ ∈ σ(A) ⇔ λ̄ ∈ σ(A^H); hence rσ(A) = rσ(A^H).
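For example (a standard computation, added here for illustration): for A = ( 2 1 ; 1 2 ) one has pA(z) = det(A − z I2) = (2 − z)² − 1 = z² − 4z + 3, so σ(A) = {3, 1}; indeed c2 = (−1)² = 1, c1 = −tr A = −4, c0 = det A = 3, and tr A = 3 + 1, det A = 3 · 1.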

For an eigenvalue λ of A one defines:

• the algebraic multiplicity m_A^λ := multiplicity of λ as a root of pA
• the geometric multiplicity g_A^λ := dim E_A^λ

For all λ ∈ σ(A): 1 ≤ g_A^λ ≤ m_A^λ ≤ n

An eigenvalue λ is called:
• defective :⇔ g_A^λ < m_A^λ
• nondefective :⇔ g_A^λ = m_A^λ
• simple :⇔ m_A^λ = 1
• geometrically simple :⇔ g_A^λ = 1

• A matrix is called nondefective if all its eigenvalues are nondefective.

A ∈ Mn is nondefective
⇔ ⊕_{λ∈σ(A)} E_A^λ = E_A^{µ1} ⊕ ... ⊕ E_A^{µK} = K^n
⇔ there is an eigenbasis of K^n (a basis consisting of eigenvectors of A)
⇔ A admits an eigen decomposition (see section 3.10)
⇔ A is diagonalizable (i.e. A is similar to a diagonal matrix)

• A subset S ⊂ Cn is called invariant w.r.t. A ∈ Mn :⇔ AS ⊆ S

There are various propositions concerning eigenvalues:

If A is Hermitian (or real and symmetric) then all eigenvalues of A are real (σ(A) ⊂ R).
One can then order the eigenvalues. Let λ1↓(A) ≥ ... ≥ λn↓(A) and λ1↑(A) ≤ ... ≤ λn↑(A)
denote the eigenvalues of A in descending and ascending order, respectively.
If A is skew-Hermitian then all eigenvalues of A are purely imaginary (σ(A) ⊂ iR).
If A is unitary (or real and orthogonal) then all eigenvalues of A have modulus 1
(σ(A) ⊂ T = {z ∈ C : |z| = 1}).
Schur Decomposition: Every matrix A ∈ Mn is unitarily similar to an upper triangular matrix
(see also section 3.9).
Normal matrices are unitarily diagonalizable (and only normal matrices are):
A ∈ Mn(K) normal ⇔ ∃U ∈ Un : U A U^H = diag(λ1, ..., λn)
Real normal matrices are orthogonally block-diagonalizable with blocks of size 1x1 and 2x2:
A ∈ Mn(R) normal ⇒ ∃O ∈ On(R) : O A O^T = diag(λ1, ..., λr, M_{r+1}, ..., M_m)
where the λk are the real eigenvalues of A and every Mk has the form
Mk = ( a_k  b_k ; −b_k  a_k ), b_k ≠ 0,
for the complex conjugate eigenvalue pairs a_k ± b_k i of A.
Real symmetric matrices are orthogonally diagonalizable.
Break Down: If A ∈ Mn can be decomposed as A = ( M11  M12 ; 0  M22 ) with square diagonal
blocks M11, M22, then σ(A) = σ(M11) ∪ σ(M22).

3 Matrix Decompositions and Factorizations
This section presents the most important matrix decompositions.

3.1 LU Decomposition
• An LU decomposition of A ∈ Mn(K) is a pair (L, U) of matrices L, U ∈ Mn(K) s.t.
(i) L is unit lower triangular (i.e. Lii = 1)
(ii) U is upper triangular
(iii) A = LU
A ∈ GLn(K) admits a (unique) LU decomposition iff its leading principal minors are nonzero
(i.e. iff det A[1 : p][1 : p] ≠ 0 for p = 1, ..., n).

3.2 LDMT and LDLT Decomposition


• An LDM^T decomposition of A ∈ Mn(K) is a triple (L, D, M) of matrices L, D, M ∈ Mn(K)
s.t.
(i) L is unit lower triangular (i.e. Lii = 1)
(ii) D is diagonal
(iii) M is unit lower triangular
(iv) A = L D M^T
A ∈ GLn(K) admits a (unique) LDM^T decomposition iff its leading principal minors are
nonzero.

• If (L, D, M ) is a LDMT decomposition of a symmetric A ∈ Mn (K) then L = M and one


calls this the LDLT decomposition of A.

3.3 QR Decomposition
• A QR decomposition of A ∈ Mn (K) is a pair (Q, R) of matrices Q, R ∈ Mn (K) s.t.
(i) Q is unitary
(ii) R is upper triangular with positive diagonal entries (Rii > 0)
(iii) A = QR
 Every A ∈ GLn (K) admits a (unique) QR decomposition.

• A generalized QR decomposition of A ∈ Mnl (K) is a pair (Q, R) of matrices Q ∈ Mn (K), R ∈


Mnl (K) s.t.
(i) Q is unitary
(ii) R is pseudo upper triangular (Rij = 0 whenever i > j)
(iii) A = QR
 Every A ∈ Mnl (K) admits a (nonunique) generalized QR decomposition.

3.4 Cholesky Decomposition


• A Cholesky decomposition of A ∈ Mn (K) is a matrix L ∈ Mn (K), s.t.
(i) L is lower triangular with positive diagonal entries (Lii > 0)
(ii) A = LLH
 A ∈ HP Dn admits a (unique) Cholesky decomposition. If A ∈ SP Dn then L ∈ Mn (R).

3.5 SVD - Singular Value Decomposition
• A SV decomposition of A ∈ Mnl (K) is a triple (U, D, V ) of matrices U ∈ Mn (K), D ∈ Mnl (K),
V ∈ Ml (K) s.t.
(i) U, V are unitary
(ii) D = diag (σ1 , . . . , σp ) with σ1 ≥ ... ≥ σp ≥ 0 and p = min(n, l)
(iii) A = U DV H
 Every A ∈ Mnl (K) admits a (nonunique) SV decomposition. The σ1 , . . . , σp are uniquely de-
termined by A (and thus equal in all SVD’s). In fact σi2 (A) = σi2 (AH ) = λi (AAH ) = λi (AH A)
for i = 1, . . . , p.

• If D = diag (σ1 , . . . , σn ), U = [u1 , . . . , un ], V = [v1 , . . . , vn ] then the σi are called singular


values of A, the ui , vi are called left- and right singular vectors, respectively.

• One denotes by |A| the unique positive square root of AH A.

3.6 Polar Decomposition


• A polar decomposition of A ∈ Mnl(K) is a pair (U, P) of matrices U ∈ Mn(K), P ∈ Mnl(K)
s.t.
(i) U is unitary
(ii) P is positive
(iii) A = U P
Every A ∈ Mnl(K) admits a (nonunique) polar decomposition. The matrix P is uniquely
determined by A; in fact P = |A|. The matrix U is unique iff A is regular.

3.7 Block Diagonal Decomposition


• A block diagonal decomposition of A ∈ Mn(K) is a pair (P, B) of matrices P, B ∈ Mn
s.t.
(i) P is regular
(ii) B = diag(λ1 I_{n1} + N1, ..., λK I_{nK} + NK) is block diagonal, where λ1, ..., λK are the distinct
eigenvalues of A and Ni ∈ M_{ni} are strictly upper triangular
(iii) A = P B P^(-1)
Every A ∈ Mn(C) admits a (nonunique) block diagonal decomposition.

3.8 Jordan Decomposition


• A Jordan box is a matrix Jm(λ) ∈ Mm(K) with λ on the diagonal, 1 on the superdiagonal
and zeros elsewhere:

Jm(λ) = ( λ 1       )
        (   λ 1     )
        (     ⋱ ⋱   )
        (       λ 1 )
        (         λ )

• A Jordan decomposition of A ∈ Mn(K) is a pair (P, J) of matrices P, J ∈ Mn(K) s.t.

(i) P is regular
(ii) J is block diagonal, J = diag(J1, ..., JK), with the Ji being Jordan boxes
(iii) P^(-1) A P = J
Every A ∈ Mn(C) admits a (nonunique) Jordan decomposition. The diagonal values of the
Jordan boxes are eigenvalues of A. The number and dimensions of the Jordan boxes correspond-
ing to an eigenvalue are unique, although P can be chosen s.t. they appear in any order.

3.9 Schur Decomposition


• A Schur decomposition of A ∈ Mn(K) is a pair (U, T) of matrices U, T ∈ Mn(K) s.t.
(i) U is unitary
(ii) T is upper triangular with the eigenvalues of A as diagonal entries
(iii) A = U T U^H
This means T = D + N with D = diag(λ_{s(1)}, ..., λ_{s(n)}) for some permutation s ∈ Sn and N
strictly upper triangular.

 Every A ∈ Mn (C) admits a (nonunique) Schur decomposition. U can be chosen so that λi


appear in any order in the diagonal of D.

• The column vectors in U are called Schur vectors.

Let (U, T) be a Schur decomposition of A ∈ Mn, let U = [u1 ... un] be the partitioning of
U into its Schur vectors and let D = diag(λ1, ..., λn). Then the subspaces Sk = span{u1, ..., uk} are
invariant under A. Further, if Uk = [u1 ... uk] then σ(Uk^H A Uk) = {λ1, ..., λk}. Since the
eigenvalues of A can be arranged in any order on the diagonal of the Schur decomposition it
follows that for every k-element subset S of the eigenvalues there exists a k-dimensional invari-
ant subspace associated with the eigenvalues in S.

• ‖N‖_F is independent of the choice of U and is called A's departure from normality.

3.10 Eigen Decomposition


• An eigen decomposition of A ∈ Mn (K) is a pair (P, D) of matrices P, D ∈ Mn (K) s.t.
(i) P is regular
(ii) D is diagonal
(iii) A = P DP −1
 A ∈ Mn (K) admits an (nonunique) eigen decomposition iff it is nondefective.

 If A ∈ Mn (K) is nondefective with eigenvalues λ1 , ..., λn ∈ K and corresponding (distinct)


eigenvectors x1 , ..., xn then for every permutation s ∈ Sn :
If one sets D = diag (λs(1) , ..., λs(n) ) and P = [xs(1) , . . . , xs(n) ] then A = P DP −1 .

4 Sensitivity of the Eigenvalue Problem, Perturbation Theory
The family of all eigenvalues of a matrix A ∈ Mn can be regarded as an unordered n-tuple of complex
numbers. All these tuples form the space C^n_sym = C^n/∼ with (a1, ..., an) ∼ (b1, ..., bn) :⇔ ∃s ∈
Sn : bi = a_{s(i)}, i = 1, ..., n. It is a quotient space of C^n and therefore carries an induced metric:
(1) d(M1, M2) := min_{a∈M1, b∈M2} ‖a − b‖_∞ where M1 = [(a1, ..., an)]_∼, M2 = [(b1, ..., bn)]_∼

One can easily verify that this is equal to the now defined
• spectral variation (=optimal matching distance) of A, B ∈ Mn :
(2) d(σ(A), σ(B)) := inf s∈Sn maxj |λj (A) − λs(j) (B)| (Sn is the group of permutations on {1, ..., n})

Another distance function for the spectra of two matrices is now introduced:
• For closed subsets A, B ⊂ C one defines s(A, B) = supa∈A dist (a, B) = supa∈A inf b∈B |a − b|.
Then one defines the Hausdorff distance of A and B as h(A, B) := max(s(A, B), s(B, A)).
One can think of the spectrum of a matrix as a subset of C. This defines the Hausdorff distance
of the spectra of two matrices.

 For arbitrary A, B ∈ Mn :
(3) h(σ(A), σ(B)) ≤ d(σ(A), σ(B))
Only for n = 2 are the two distances equal.

4.1 Continuity of the Eigenvalue Problem


The next theorem states that the eigenvalues of a matrix are continuous functions of its entries:
For A ∈ Mn and λ ∈ σ(A) let d := dist(λ, σ(A) \ {λ}) be the distance of λ to the other eigen-
values. Then for every 0 < ρ < d there exists ε > 0 s.t.
(4) ‖E‖ < ε ⇒ Σ_{µ∈σ(A+E)∩D} m_{A+E}^µ = m_A^λ
where D = B(λ, ρ) = {z ∈ C : |z − λ| < ρ} and E ∈ Mn.

With the notation of spectral variation this can be formulated more elegantly:
Let A, B ∈ Mn. Then for every α > 0 there exists ε > 0 s.t.
(5) ‖A − B‖ < ε ⇒ d(σ(A), σ(B)) < α

4.2 Spectral Variation of Nonnormal Matrices


In this section some important results for nonnormal matrices are presented. These results can
be greatly improved when one works with normal or Hermitian matrices.

The following theorem answers the question how well the diagonal entries of a matrix approx-
imate its eigenvalues:
Gershgorin Theorems:
1. If A ∈ Mn then σ(A) ⊆ SR ∩ SC where
(6) SR = ∪_{i=1,...,n} Ri,   Ri = {z ∈ C : |z − Aii| ≤ Σ_{j≠i} |Aij|}
(7) SC = ∪_{j=1,...,n} Cj,   Cj = {z ∈ C : |z − Ajj| ≤ Σ_{i≠j} |Aij|}
The Ri are called row Gershgorin circles, the Cj column Gershgorin circles.
2. If S1 = ∪_{i=1,...,m} Ri, S2 = ∪_{i=m+1,...,n} Ri and S1 ∩ S2 = ∅, then S1 contains exactly m eigenvalues of A
(counted with algebraic multiplicity).
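As an illustration of the first Gershgorin theorem, the following minimal Java sketch (a hypothetical helper, real matrices only) computes the centers and radii of the row Gershgorin circles Ri; every eigenvalue of A lies in the union of the returned discs:

// Illustrative sketch: centers and radii of the row Gershgorin circles of a real square matrix.
static double[][] gershgorinRows(double[][] a) {
    int n = a.length;
    double[][] circles = new double[n][2];       // circles[i] = {center, radius}
    for (int i = 0; i < n; i++) {
        double radius = 0.0;
        for (int j = 0; j < n; j++) {
            if (j != i) radius += Math.abs(a[i][j]);   // sum of off-diagonal moduli in row i
        }
        circles[i][0] = a[i][i];                  // center A_ii
        circles[i][1] = radius;
    }
    return circles;
}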

The next theorems establish bounds for the variation of the spectra of two matrices:
If A, B ∈ Mn then:
(8) h(σ(A), σ(B)) ≤ (‖A‖_2 + ‖B‖_2)^{1−1/n} ‖A − B‖_2^{1/n}
(9) d(σ(A), σ(B)) ≤ 4 (‖A‖_2 + ‖B‖_2)^{1−1/n} ‖A − B‖_2^{1/n}

 Bauer-Fike Theorem: Assume A ∈ Mn is diagonalizable with X −1 AX = diag (λ1 , ..., λn ).


Then for any B ∈ Mn :
(10) s(σ(B), σ(A)) ≤ κ_p(X) ‖A − B‖_p for any p-norm ‖·‖_p

Let U^H A U = T = D + N be a Schur decomposition of A ∈ Mn with D diagonal and N strictly
upper triangular. If p is the smallest positive integer s.t. |N|^p = 0, then for any B ∈ Mn:
(11) s(σ(B), σ(A)) ≤ max(θ, θ^{1/p}) where θ = ‖A − B‖_2 Σ_{k=0}^{p−1} ‖N‖_2^k

4.2.1 Eigenvalue Sensitivity


• If λ is a simple eigenvalue of A ∈ Mn with associated normalized right and left eigenvectors
x and y, then one defines the condition number of λ as:
(12) κ(λ) := 1/|y^H x|

The following proposition shows why this is justified:

Assume A ∈ Mn is diagonalizable and has a simple eigenvalue λ with associated normalized
right and left eigenvectors x, y (i.e. Ax = λx, y^H A = λ y^H, ‖x‖_2 = ‖y‖_2 = 1). Further, let
A(ε) = A + εE be a perturbation of A with ‖E‖_2 = 1. Then there exist in a neighborhood of zero
two differentiable functions x(ε) and λ(ε) with:
(13) A(ε) x(ε) = λ(ε) x(ε)
(14) ‖x(ε)‖_2 = 1, λ(0) = λ, x(0) = x
(15) ∂λ/∂ε (0) = (y^H E x)/(y^H x), hence |∂λ/∂ε (0)| ≤ 1/|y^H x| = κ(λ)

This means roughly that order ε perturbations of A lead to order ε·κ(λ) perturbations of λ. Thus, if
κ(λ) is small then λ is regarded as well-conditioned.

The condition of an eigenvalue is not affected by unitary similarity transformations:

Let λ be a simple eigenvalue of A ∈ Mn with right and left eigenvectors x, y. For U ∈ Un
let Ã denote U^H A U. Then x̃ = U^H x and ỹ = U^H y are the right and left eigenvectors of Ã
corresponding to λ. If κ(λ) and κ̃(λ) denote the condition of λ in A and Ã, respectively, then:
(16) κ̃(λ) = 1/|ỹ^H x̃| = 1/|y^H U U^H x| = 1/|y^H x| = κ(λ)

4.2.2 Eigenvector Sensitivity


• If A ∈ Mn has n distinct eigenvalues and x is the normalized eigenvector associated with
λ ∈ σ(A), then one defines the condition number of x as:
(17) κ(x) := 1/dist(λ, σ(A) \ {λ}) = 1/( min_{µ∈σ(A), µ≠λ} |µ − λ| )

The following proposition shows why this is justified:

Assume A ∈ Mn has n distinct eigenvalues λi with associated normalized right and left
eigenvectors xi, yi (i.e. Axi = λi xi, yi^H A = λi yi^H, ‖xi‖_2 = ‖yi‖_2 = 1, i = 1, ..., n). Further, let
A(ε) = A + εE be a perturbation of A with ‖E‖_2 = 1. Then there exist in a neighborhood of zero
differentiable functions xi(ε), yi(ε) and λi(ε) with:
(18) A(ε) xi(ε) = λi(ε) xi(ε)
(19) yi(ε)^H A(ε) = λi(ε) yi(ε)^H
(20) ‖xi(ε)‖_2 = ‖yi(ε)‖_2 = 1, λi(0) = λi, xi(0) = xi, yi(0) = yi
(21) ‖xk(ε) − xk‖_2 ≤ ε ‖E‖_2 /( min_{j≠k} |λk − λj| ) + O(ε²) = κ(xk) ‖E‖_2 ε + O(ε²)

This means that the sensitivity of xk depends upon the separation of λk from the other eigen-
values.

4.2.3 Invariant Subspace Sensitivity


Suppose
(22) U^H A U = ( T11  T12 ; 0  T22 ),  T11 ∈ Mp, T22 ∈ M_{n−p}
is a Schur decomposition of A ∈ Mn and U = (U1 U2) with U1 ∈ M_{n,p}, U2 ∈ M_{n,n−p}.
It is known that span(U1) is an invariant subspace of A. Its sensitivity should depend upon
the separation of σ(T11) and σ(T22). The appropriate measure is defined now:

• Under the assumptions above one defines the separation of T11 and T22 as:
(23) sep(T11, T22) := min_{X≠0} ‖T11 X − X T22‖_F / ‖X‖_F

From [1]: Suppose that (22) holds and that for any matrix E ∈ Mn we partition U^H E U as
follows:
(24) U^H E U = ( E11  E12 ; E21  E22 ),  E11 ∈ Mp, E22 ∈ M_{n−p}
If δ := sep(T11, T22) − ‖E11‖_2 − ‖E22‖_2 > 0 and
(25) ‖E21‖_2 (‖T12‖_2 + ‖E21‖_2) ≤ δ²/4
then there exists a P ∈ M_{n−p,p} s.t.
(26) ‖P‖_2 ≤ 2 ‖E21‖_2 / δ
and the columns of Û = (U1 + U2 P)(Ip + P^H P)^{−1/2} form an orthonormal basis for a subspace
that is invariant for A + E.

4.3 Normal Matrices

Let A ∈ Mn be normal and B ∈ Mn arbitrary. Then:
(27) s(σ(A), σ(B)) ≤ ‖A − B‖_2
Thus, if A and B are both normal then:
(28) h(σ(A), σ(B)) ≤ ‖A − B‖_2
If in addition n = 2 then:
(29) d(σ(A), σ(B)) ≤ ‖A − B‖_2

Hoffman-Wielandt Theorem: Let A, B ∈ Mn be normal. Then:
(30) min_{s∈Sn} ( Σ_{i=1}^n |λi(A) − λ_{s(i)}(B)|² )^{1/2} ≤ ‖A − B‖_F ≤ max_{s∈Sn} ( Σ_{i=1}^n |λi(A) − λ_{s(i)}(B)|² )^{1/2}

Let A ∈ Mn be normal and B ∈ Mn. If ‖A − B‖_2 ≤ (1/2) min_{λ,µ∈σ(A), λ≠µ} |λ − µ|, then:
(31) d(σ(A), σ(B)) ≤ ‖A − B‖_2

4.4 Hermitian and Skew-Hermitian Matrices


Let us recall and introduce some notation: a Hermitian matrix A ∈ Mn has real eigenvalues that
can be ordered as λ1↓(A) ≥ ... ≥ λn↓(A) and λ1↑(A) ≤ ... ≤ λn↑(A). One writes
Eig↓(A) = diag(λ1↓(A), ..., λn↓(A)) and Eig↑(A) = diag(λ1↑(A), ..., λn↑(A)). Since the
eigenvalues of skew-Hermitian matrices cannot be ordered (in a natural way), one introduces
the notation Eig|↓|(A) and Eig|↑|(A) for any diagonal matrix of the eigenvalues of A that
are ordered with decreasing (increasing) modulus, i.e. Eig|↓|(A) = diag(µ1, ..., µn) with
|µ1| ≥ ... ≥ |µn| and Eig|↑|(A) = diag(η1, ..., ηn) with |η1| ≤ ... ≤ |ηn|, where the µi and ηi are
the eigenvalues of A.

For Hermitian matrices one can formulate some strong perturbation theorems. But let us
first consider a characterization of the eigenvalues of a Hermitian matrix. For this one intro-
duces Rayleigh quotients:
• For A ∈ Mn one defines the Rayleigh quotient (function) as
RA : C^n \ {0} → C, v ↦ RA(v) = (v^H A v)/(v^H v).
If A ∈ Mn is Hermitian then RA(v) is real and RA(αv) = RA(v) for α ∈ C \ {0}.

Now one can give the characterizations of the eigenvalues of a Hermitian matrix:
Let A ∈ Mn be Hermitian with eigenvalues λ1 ≥ ... ≥ λn and associated eigenvectors xi
that form an orthonormal basis of C^n. With the notation Vk := span{x1, ..., xk}, V0 := {0}
and 𝒱k := {V ⊂sub C^n | dim V = k}, 𝒱0 := {V0} one has:
(32) λk = RA(xk)
(33) λk = min_{v∈Vk, v≠0} RA(v) = min_{v∈Vk, ‖v‖_2=1} v^H A v
(34) λk = max_{v⊥V_{k−1}, v≠0} RA(v)
(35) λk = max_{W∈𝒱k} min_{v∈W, v≠0} RA(v) = max_{W∈𝒱k} min_{v∈W, ‖v‖_2=1} v^H A v
(36) λk = min_{W∈𝒱_{k−1}} max_{v⊥W, v≠0} RA(v) = min_{W∈𝒱_{n−k+1}} max_{v∈W, ‖v‖_2=1} v^H A v
Equations (35) and (36) are also called the Minimax Principle. Furthermore:
(37) RA(C^n \ {0}) = {RA(v) | v ∈ C^n, v ≠ 0} = [λn, λ1] ⊂ R
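As a small illustration of the Rayleigh quotient used in these characterizations, here is a minimal Java sketch for a real symmetric matrix (a hypothetical helper, real arithmetic only):

// Illustrative sketch: Rayleigh quotient R_A(v) = v^T A v / v^T v
// for a real symmetric matrix A and a nonzero vector v.
static double rayleigh(double[][] a, double[] v) {
    int n = v.length;
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; i++) {
        double av = 0.0;
        for (int j = 0; j < n; j++) av += a[i][j] * v[j];   // (A v)_i
        num += v[i] * av;                                   // v^T A v
        den += v[i] * v[i];                                 // v^T v
    }
    return num / den;
}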

Let A, B ∈ Mn be Hermitian. Then:

(38) λj↓(A + B) ≤ λi↓(A) + λ_{j−i+1}↓(B) for i ≤ j
(39) λj↓(A + B) ≥ λi↓(A) + λ_{j−i+n}↓(B) for i ≥ j
(40) λj↓(A) + λn↓(B) ≤ λj↓(A + B) ≤ λj↓(A) + λ1↓(B) for j = 1, ..., n

 Weyl’s Monotonicity Theorem: Let A ∈ Mn be Hermitian and H ∈ Mn be positive.


Then:
(41) λj↓ (A + H) ≥ λj↓ (A) for j = 1, . . . , n

Weyl's Perturbation Theorem: Let A, B ∈ Mn be Hermitian. Then:

(42) d(σ(A), σ(B)) = max_j |λj↓(A) − λj↓(B)| ≤ ‖A − B‖_2 ≤ max_j |λj↓(A) − λj↑(B)|
This can be generalized: for Hermitian A, B ∈ Mn and any unitarily invariant norm |||·|||:
(43) |||Eig↓(A) − Eig↓(B)||| ≤ |||A − B|||
Lidskii's Theorem: Let A, B ∈ Mn be Hermitian. Then:
(44) Σ_{j=1}^k λ_{i_j}↓(A + B) ≤ Σ_{j=1}^k λ_{i_j}↓(A) + Σ_{j=1}^k λ_j↓(B) for 1 ≤ i1 < ... < ik ≤ n

Let A ∈ Mn be Hermitian and B ∈ Mn skew-Hermitian. Let their eigenvalues λi(A) and
λi(B) be arranged s.t. |λ1(A)| ≥ ... ≥ |λn(A)| and |λ1(B)| ≥ ... ≥ |λn(B)|. Then:
(45) d(σ(A), σ(B)) ≤ max_j |λj(A) − λ_{n−j+1}(B)| ≤ ‖A − B‖_2

Let A ∈ Mn be Hermitian and B ∈ Mn skew-Hermitian.
Then for 2 ≤ p ≤ ∞ inequalities (46) and (47) hold, while for 1 ≤ p ≤ 2 inequalities (48) and
(49) hold:
(46) ‖Eig|↓|(A) − Eig|↑|(B)‖_<p> ≤ ‖A − B‖_<p> ≤ 2^{1/2 − 1/p} ‖Eig|↓|(A) − Eig|↓|(B)‖_<p>
(47) ‖Eig|↓|(A) − Eig|↑|(B)‖_<p> ≤ ‖Eig(A) − Eig_s(B)‖_<p> ≤ ‖Eig|↓|(A) − Eig|↓|(B)‖_<p>, s ∈ Sn
(48) 2^{1/2 − 1/p} ‖Eig|↓|(A) − Eig|↓|(B)‖_<p> ≤ ‖A − B‖_<p> ≤ ‖Eig|↓|(A) − Eig|↑|(B)‖_<p>
(49) ‖Eig|↓|(A) − Eig|↓|(B)‖_<p> ≤ ‖Eig(A) − Eig_s(B)‖_<p> ≤ ‖Eig|↓|(A) − Eig|↑|(B)‖_<p>, s ∈ Sn
Further:
(50) (1/√2) |||Eig|↓|(A) − Eig|↓|(B)||| ≤ |||A − B||| ≤ √2 |||Eig|↓|(A) − Eig|↑|(B)|||

To sum up: if A, B ∈ Mn and one of the following holds:

(i) A, B Hermitian
(ii) A Hermitian, B skew-Hermitian
(iii) A, B scalar multiples of unitary matrices
(iv) A, B normal and n = 2
then:
(51) h(σ(A), σ(B)) ≤ d(σ(A), σ(B)) ≤ ‖A − B‖_2

4.5 Residuals
• Let (λ̂, x̂) be an estimate for an eigenvalue/eigenvector pair of A ∈ Mn. Then one defines
its residual as:
(52) r̂ := A x̂ − λ̂ x̂

The next two propositions give a posteriori estimates for eigenvalues and eigenvectors of a
Hermitian matrix:
Let A ∈ Mn be Hermitian and let r̂ be the residual of the estimated eigenvalue/eigenvector
pair (λ̂, x̂). Then:
(53) min_{µ∈σ(A)} |λ̂ − µ| ≤ ‖r̂‖_2 / ‖x̂‖_2

Let A ∈ Mn be Hermitian and let r̂ be the residual of the estimated eigenvalue/eigenvector
pair (λ̂, x̂). Suppose that |λi(A) − λ̂| ≤ ‖r̂‖_2 for i = 1, ..., m and |λi(A) − λ̂| ≥ δ > 0 for
i = m + 1, ..., n. Then:
(54) dist_2(x̂, Um) = inf_{u∈Um} ‖x̂ − u‖_2 ≤ ‖r̂‖_2 / δ
where Um = span{x1, ..., xm} and the xi are the eigenvectors of A associated with λi(A).

For non-Hermitian matrices one has:

Let A ∈ Mn be diagonalizable with X^(-1) A X = diag(λ1, ..., λn). If ‖r̂‖_2 ≤ ε ‖x̂‖_2 for some
ε > 0 then:
(55) min_{λ∈σ(A)} |λ̂ − λ| ≤ ε κ2(X), where κ2(X) = ‖X‖_2 ‖X^(-1)‖_2
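As an illustration of the a posteriori bound (53), the following minimal Java sketch (a hypothetical helper, for real symmetric A) evaluates ‖r̂‖2/‖x̂‖2 for an estimated eigenpair; by (53) this value bounds the distance from λ̂ to the nearest eigenvalue of A:

// Illustrative sketch: a posteriori error bound ||r||_2 / ||x||_2 from (53)
// for an estimated eigenpair (lambdaHat, xHat) of a real symmetric matrix A.
static double residualBound(double[][] a, double[] xHat, double lambdaHat) {
    int n = xHat.length;
    double rNormSq = 0.0, xNormSq = 0.0;
    for (int i = 0; i < n; i++) {
        double ax = 0.0;
        for (int j = 0; j < n; j++) ax += a[i][j] * xHat[j];   // (A xHat)_i
        double r = ax - lambdaHat * xHat[i];                   // residual entry
        rNormSq += r * r;
        xNormSq += xHat[i] * xHat[i];
    }
    return Math.sqrt(rNormSq) / Math.sqrt(xNormSq);
}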

5 Algorithms
Algorithms are presented in pseudo Java code. The assignment of objects (like vectors) should
not be understood as the assignment of references. The following example makes this clear:

Vector x = (0, 2, 3)T // x = (0, 2, 3)T


Vector y = x // y = (0, 2, 3)T
y + = (1, 1, 1)T // y = (1, 3, 4)T , x = (0, 2, 3)T

Complexity of algorithms is noted as ”complexity ≤ [ a ; b ; c ; d ]R ” which means that the


computation needs less than or equal to a additions, b multiplications, c divisions and d roots
(of real numbers).

5.1 Householder Transformations


• Define Hn : C^n → Mn(C) as:
(1) Hn(v) := In − (2/(v^H v)) v v^H if v ≠ 0, and Hn(v) := In if v = 0,
and let House_n := Im Hn = Hn(C^n) be the set of matrices that one then calls Householder
matrices (also Householder reflections or transformations).

 Householder matrices are unitary and Hermitian.


Hn(v) describes for v ≠ 0 a reflection in the hyperplane {v}⊥.

The Householder transformation can be used to zero all but one entry of a vector:
Assume x = (x1, ..., xn)^T ∈ C^n and 1 ≤ i ≤ n are fixed. Set:
(2) v = (x1, ..., x_{i−1}, (1 ± ‖x‖_2/|xi|) xi, x_{i+1}, ..., xn)^T = x ± ‖x‖_2 (xi/|xi|) ei   if xi ≠ 0
(3) v = (x1, ..., x_{i−1}, ‖x‖_2 c, x_{i+1}, ..., xn)^T = x + ‖x‖_2 c ei for some c ∈ T   if xi = 0
Then Hn(v) x = ∓‖x‖_2 (xi/|xi|) ei and Hn(v) x = −‖x‖_2 c ei, respectively.
Proof: see appendix

• Define h : C^n → C^n as:
(4) h(x) := 0                              if x2 = ... = xn = 0
    h(x) := x + ‖x‖_2 e1                   if x1 = 0 and (x2, ..., xn) ≠ 0
    h(x) := x + ‖x‖_2 (x1/|x1|) e1         if x1 ≠ 0 and (x2, ..., xn) ≠ 0

The vector h(x) is called the Householder vector for x. Then:

(5) Hn(h(x)) x = x                         if x2 = ... = xn = 0
    Hn(h(x)) x = −‖x‖_2 e1                 if x1 = 0 and (x2, ..., xn) ≠ 0
    Hn(h(x)) x = −‖x‖_2 (x1/|x1|) e1       if x1 ≠ 0 and (x2, ..., xn) ≠ 0

If x is the i-th column vector of X then B = Hn(h(x)) X will have zero entries B_{2,i} = ... =
B_{n,i} = 0. One uses this to zero some entries of a matrix. First some important algorithms are
presented.

One uses the special structure of Hn(v) to compute the Householder pre- and post-multiplications
Hn(v)A and A Hn(v) (for A ∈ Mnl and A ∈ Mln, respectively). The complexity is then ≤ [ 8nl +
2n ; 8nl + 2n + 2l ; 1 ]_R. An ordinary matrix multiplication would have complexity [ 4n²l ; 4n²l ]_R.

Listing 1: Householder Vector

requires: x ∈ C^n
returns: h(x)
complexity ≤ [ 2n + 1 ; 2n + 4 ; 1 ; 1 ]_R

Vector getHouseVec ( Vector x ) {

    Vector h = x ;
    if ( x[2 : n] == 0 )          // if x2 = ... = xn = 0 then return the zero vector
        return 0 ;
    if ( x1 ≠ 0 ) {
        h1 *= (1 + ‖x‖_2/|x1|) ;
        return h ;
    }
    h1 += ‖x‖_2 ;
    return h ;
}

Listing 2: Householder Pre-Multiplication

requires: A ∈ Mnl, v ∈ C^n
returns: Hn(v)^H A (= Hn(v) A)
complexity ≤ [ 8nl + 2n ; 8nl + 2n + 2l ; 1 ]_R

Matrix getTimesHouseMatPre ( Matrix A , Vector v ) {


double norm2Sq = ‖v‖_2² ;      // compute |v1|² + ... + |vn|²
if ( norm2Sq == 0 )            // if v = 0 then Hn(v)A = In A = A
return A ;
double b = −2./norm2Sq ;
Vector w = bAH v ;
return A + vwH ;
}

Listing 3: Householder Post-Multiplication

requires: A ∈ Mln, v ∈ C^n
returns: A Hn(v)
complexity ≤ [ 8nl + 2n ; 8nl + 2n + 2l ; 1 ]_R

Matrix getTimesHouseMatPost ( Matrix A , Vector v ) {


double norm2Sq = ‖v‖_2² ;      // compute |v1|² + ... + |vn|²
if ( norm2Sq == 0 )            // if v = 0 then A Hn(v) = A In = A
return A ;
double b = −2./norm2Sq ;
Vector w = bAv ;
return A + wv H ;
}

Consider a matrix A ∈ Mnl(C). One can zero the elements A_{i1+1,j}, ..., A_{i2,j} for 1 ≤ i1 < i2 ≤ n.
To do so, compute the Householder vector v for A[i1 : i2][j] and pad it with zeros to obtain a
fitting vector h of length n: h = (0, ..., 0, v^T, 0, ..., 0)^T with i1 − 1 leading and n − i2 trailing
zeros. One can easily verify that:
(6) Hn(h) = diag( I_{i1−1}, H_{i2−i1+1}(v), I_{n−i2} )
Therefore the matrix B = Hn(h)A has zero entries B_{i1+1,j} = ... = B_{i2,j} = 0.

Due to roundoff errors it could happen that the entries B_{i1+1,j}, ..., B_{i2,j} are only very small
instead of exactly zero. The next algorithm zeros the entries (i1 + 1, j), ..., (i2, j) of a given matrix
by pre-multiplying it with the corresponding Householder matrix; the entries are then explicitly
set to zero. The vector h returns the Householder vector of the performed transformation for later use.

Listing 4: Matrix Entry Zeroing

requires: A ∈ Mnl, 1 ≤ j ≤ l, 1 ≤ i1 < i2 ≤ n
returns: [Hn(h)A, h] where h is chosen s.t. (Hn(h)A)[i1 + 1 : i2][j] = 0
complexity ≤ [ 8st + 4s + 1 ; 8st + 4s + 2t + 4 ; 1 ; 1 ]_R where s = i2 − i1 + 1, t = n − j + 1

[ Matrix , Vector ] applyHouse ( Matrix A , int j , int i1 , int i2 ) {

    Vector v = getHouseVec ( A[i1 : i2][j] ) ;
    Matrix Sub = A[i1 : i2][j : n] ;
    Sub = getTimesHouseMatPre ( Sub , v ) ;
    A[i1 : i2][j : n] = Sub ;
    A[i1 + 1 : i2][j] = 0 ;                  // set these entries to zero
    h = (0, ..., 0, v^T, 0, ..., 0)^T ;      // i1 − 1 leading and n − i2 trailing zeros
    return [A; h] ;
}

5.2 Givens Transformations
• For c, s ∈ C with |c|² + |s|² = 1 the n×n matrix Gn(p, q, c, s) (1 ≤ q < p ≤ n) is called a
Givens matrix. It coincides with the identity matrix In except for the entries
(7) Gn(p, q, c, s)_{qq} = c,  Gn(p, q, c, s)_{qp} = s,  Gn(p, q, c, s)_{pq} = −s,  Gn(p, q, c, s)_{pp} = c.

 Givens matrices are unitary.

The Givens transformation can be used to zero one specific entry of a vector:
Assume (a, b)^T ∈ C² and set c = a/√(|a|² + |b|²), s = −b/√(|a|² + |b|²). Then:
(8) ( c  s ; −s  c )^H ( a ; b ) = ( • ; · )

The following algorithm computes the Givens numbers:

Listing 5: Givens Numbers

requires: a, b ∈ C
returns: [c, s] s.t. ( c  s ; −s  c )^H ( a ; b ) = ( • ; · )
complexity ≤ [ 3 ; 8 ; 1 ; 1 ]_R

[ C, C ] getGivensNumbers ( C a , C b ) {
    if ( b == 0 )
        return [1; 0] ;
    double f = 1./√(|a|² + |b|²) ;
    return [f a; −f b] ;
}

Consider the product B = Gn(p, q, c, s)^H A where A ∈ Mnl. Then only the p-th and the q-th
row of A change, i.e.
(9) Bij = s Aqj + c Apj   if i = p,
    Bij = c Aqj − s Apj   if i = q,
    Bij = Aij             otherwise.
Similarly, in the product A Gn(p, q, c, s) only the p-th and the q-th column change. One uses
this to compute the Givens pre- and post-multiplication:

Listing 6: Givens Pre-Multiplication

requires: A ∈ Mnl ; c, s ∈ C ; 1 ≤ q < p ≤ n
returns: Gn(p, q, c, s)^H A
complexity ≤ [ 12l ; 16l ]_R

Matrix getTimesGivensMatPre ( Matrix A , int p , int q , C c , C s ) {


Matrix A(p) = A[p][] ; // Row p o f A
Matrix A(q) = A[q][] ; // Row q o f A
Matrix B = sA(q) ;
A(q) = cA(q) − sA(p) ;
A(p) = cA(p) + B ;
Matrix R = A ;
R[p][] = A(p) ;
R[q][] = A(q) ;
return R ;
}

Listing 7: Givens Post-Multiplication

requires: A ∈ Mnl ; c, s ∈ C ; 1 ≤ q < p ≤ n
returns: A Gn(p, q, c, s)
complexity ≤ [ 12l ; 16l ]_R

Matrix getTimesGivensMatPost ( Matrix A , int p , int q , C c , C s ) {


Matrix A(p) = A[][p] ; // Column p o f A
Matrix A(q) = A[][q] ; // Column q o f A
Matrix B = sA(p) ;
A(p) = cA(p) − sA(q) ;
A(q) = cA(q) + B ;
Matrix R = A ;
R[][p] = A(p) ;
R[][q] = A(q) ;
return R ;
}

Because of roundoff errors we need another method to zero a specific entry of a matrix:

Listing 8: Matrix Entry Zeroing

requires: A ∈ Mnl ; 1 ≤ q < p ≤ n
returns: [Gn(p, q, c, s)^H A, c, s] where [c, s] = getGivensNumbers ( Aqq , Apq )
complexity ≤ [ 12l + 3 ; 16l + 8 ; 1 ; 1 ]_R

[ Matrix , C , C ] applyGivens ( Matrix A , int p , int q ) {


C c = 0;
C s = 0;
[ c , s ] = getGivensNumbers ( Aqq , Apq ) ;
Matrix R = getTimesGivensMatPre ( A, p, q, c, s ) ;
Rpq = 0 ;
return [R, c, s] ;
}

5.3 QR Decomposition
One uses the Householder transformation to compute the (generalized) QR decomposition of
a matrix:

Listing 9: QR Decomposition

requires: A ∈ Mnl
returns: [U, R] s.t. (U, R) is the (generalized) QR decomposition of A
complexity ≤ [ 11n³ + 3n ; 11n³ + 3n² + 5n ; 2n − 2 ; n − 1 ]_R for n = l

[ Matrix , Matrix ] getQR ( Matrix A ) {


Matrix R = A ;
Matrix U = In ;
Vector h = 0 ;
int nStep = min(n − 1, l) ;
for ( int k = 1; k ≤ nStep; k++ ) {
[R, h] = applyHouse ( R , k , k , n ) ;
U = getTimesHouseMatPost ( U, h ) ;
}
return [U, R] ;
}

5.4 Hessenberg Matrices


5.4.1 Definition and Properties
• A matrix A ∈ Mn is called (upper) Hessenberg if Aij = 0 whenever i ≥ j + 2.
It has the form (shown here for n = 5):
( • • • • • )
( • • • • • )
( · • • • • )
( · · • • • )
( · · · • • )
Every upper triangular matrix is Hessenberg.

5.4.2 Hessenberg Reduction


Every matrix is unitarily similar to a Hessenberg matrix:
 Hessenberg Reduction: For every A ∈ Mn there exists a unitary U s.t.
U H AU is Hessenberg. If A ∈ Mn (R) one may take U real, orthogonal.

Listing 10: Hessenberg Reduction


Requires: A ∈ Mn
Returns: [H, U] s.t. U is unitary, H is a Hessenberg matrix and H = U^H A U
complexity ≤ [ 19n³ − 25n² ; 19n³ − 25n² ; 3n − 6 ; n − 2 ]_R

[ Matrix , Matrix ] getHessenberg ( Matrix A ) {

    Matrix H = A ;
    Matrix U = In ;
    Vector h = 0 ;
    for ( int j = 1; j ≤ n − 2; j++ ) {
        [ H, h ] = applyHouse ( H, j, j + 1, n ) ;

H = getTimesHouseMatPost ( H, h ) ;
U = getTimesHouseMatPost ( U, h ) ;
}
return [H, U ] ;
}

5.4.3 QR Decomposition of Hessenberg Matrices


The importance of Hessenberg matrices is due to the fast computability of their QR decompo-
sition. Instead of complexity [ 19n³ ; 19n³ ; 3n ; n ]_R it needs only [ 24n² ; 32n² ; n ; n ]_R:
Listing 11: QR Decomposition of Hessenberg Matrices

Requires: H ∈ Mnl in Hessenberg form
Returns: [U, R] s.t. (U, R) is the (generalized) QR decomposition of H
complexity ≤ [ 24n² − 21n ; 32n² − 24n ; n − 1 ; n − 1 ]_R for n = l

[ Matrix , Matrix ] getQRHess ( Matrix H ) {

    Matrix R = H ;
    Matrix U = In ;
    C c, s = 0 ;
    int nStep = min(n − 1, l) ;
    for ( int k = 1; k ≤ nStep; k++ ) {
        [R, c, s] = applyGivens ( R, k + 1, k ) ;
        U = getTimesGivensMatPost ( U, k + 1, k, c, s ) ;
}
return [U, R] ;
}

5.5 The Power Methods


5.5.1 The Simple Power Method
The simple power method constructs for A ∈ Mn(C) and q^(0) ∈ C^n the sequence
q^(k) = A^k q^(0) / ‖A^k q^(0)‖_2.
This can be used to compute the dominant eigenvalue of a matrix: Suppose A ∈ Mn(C) is
diagonalizable with X^(-1) A X = diag(λ1, ..., λn), X = [x1, ..., xn], and that A has a dominant
eigenvalue, i.e. |λ1| > |λ2| ≥ ... ≥ |λn|. The xi span C^n and so one can write every q^(0) ∈ C^n
as q^(0) = a1 x1 + ... + an xn. If a1 ≠ 0 then:
(10) A^k q^(0) = a1 λ1^k x1 + Σ_{j=2}^n aj λj^k xj = a1 λ1^k ( x1 + s^(k) ) with s^(k) = Σ_{j=2}^n (aj/a1)(λj/λ1)^k xj
(11) q^(k) = A^k q^(0) / ‖A^k q^(0)‖_2 = (a1/|a1|) (λ1/|λ1|)^k (x1 + s^(k)) / ‖x1 + s^(k)‖_2
with s^(k) → 0 for k → ∞. This means that q^(k) tends to lie in span{x1}. Further:
(12) ‖A q^(k)‖_2 → rσ(A)   (k → ∞)
(13) (q^(k))^H A q^(k) → λ1   (k → ∞)

The results above are summarized in the following theorem:

Suppose A ∈ Mn(C) is diagonalizable with X^(-1) A X = diag(λ1, ..., λn), X = [x1, ..., xn],
and that A has a dominant eigenvalue, i.e. |λ1| > |λ2| ≥ ... ≥ |λn|. Let further q^(0) ∈ C^n
be an arbitrary vector. q^(0) can be represented as q^(0) = a + b with a ∈ span{x1} and
b ∈ span{x2, ..., xn}. If a ≠ 0 then construct the sequences:
(14) q^(k) = A q^(k−1) / ‖A q^(k−1)‖_2
(15) λ^(k) = (q^(k))^H A q^(k)
Then:
(16) dist( span{q^(k)}, span{x1} ) = O( |λ2/λ1|^k )
(17) |λ1 − λ^(k)| = O( |λ2/λ1|^k )

Thus λ^(k) → λ1 (k → ∞).
Thus λ(k) → λ1 (k → ∞).

It follows that this method is well suited when |λ2|/|λ1| is small. A possible stopping criterion
is to monitor the differences between λ^(k) and λ^(k−1) and to stop when |λ^(k) − λ^(k−1)| is small.
Another stopping criterion uses the residual r^(k) = A q^(k) − λ^(k) q^(k):
(18) |λ1 − λ^(k)| ≈ ‖r^(k)‖_2 / |(w^(k))^H q^(k)|
where the w^(k) are approximate left eigenvectors, w^(k) = (A^H)^k w^(0) / ‖(A^H)^k w^(0)‖_2. The necessity to compute
w^(k) nearly doubles the cost of the power method.

An algorithm that uses the first stopping criterion is given in the following listing:

Listing 12: Power Method

Requires: A ∈ Mn(C), q^(0) ∈ C^n
Returns: [λ^(k), q^(k)] that were constructed as indicated above
complexity ≤ [ 4n² + 4n ; 4n² + 6n ; 2n ; 1 ]_R per iteration

[ C , Vector ] powerMethod ( Matrix A , Vector q (0) , double acc ) {


Vector q1 = q (0) ;
Vector q2 = 0 ;
C λ1 = 0 ;
C λ2 = 0 ;
for ( int k = 0; k < 1000; k++ ) {
q2 = A q1 ;
λ2 = q1^H q2 ;
q1 = q2 / ‖q2‖_2 ;
if ( |λ1 − λ2| < acc )
break ;
λ1 = λ2 ;
}
return [ λ2 , q2 ] ;
}

5.5.2 The Inverse Power Method


The inverse power method is a modification of the simple power method that finds the eigen-
value nearest to a given value µ. If the λi are the eigenvalues of A ∈ Mn then the λi − µ are the eigenvalues
of A − µIn and the 1/(λi − µ) are the eigenvalues of (A − µIn)^(-1). Thus, the closer µ is to one of the λi, the
more dominant is the largest eigenvalue of (A − µIn)^(-1). The inverse power method exploits this
and applies the simple power method to (A − µIn)^(-1).

If the simple power method returns the approximate eigenvalue/eigenvector pair (λ, q) for
(A − µIn)^(-1), then (1/λ + µ, q) is an approximate eigenvalue/eigenvector pair for A.

Listing 13: Inverse Power Method

Requires: A ∈ Mn(C), q^(0) ∈ C^n, µ ∈ C
Returns: [1/λ + µ, q] constructed as indicated above
complexity ≤ [ 4n² + 4n ; 4n² + 6n ; 2n ; 1 ]_R per iteration
(+ complexity of one matrix inversion)

[ C , Vector ] powerMethodInv ( Matrix A , Vector q^(0) , C µ , double acc ) {

    Matrix B = (A − µIn)^(-1) ;
    C λ = 0 ;
    Vector q = 0 ;
    [λ, q] = powerMethod ( B, q^(0) , acc ) ;
    return [1/λ + µ; q] ;
}

5.6 The QR Method


5.6.1 Definition and Properties
The simple QR method constructs for A^(0) ∈ Mn(C) a sequence of unitarily similar matrices
via successive QR decompositions. The following proposition shows how this can be used to
compute all eigenvalues:
Suppose A ∈ Mn(C) is diagonalizable with P^(-1) A P = diag(λ1, ..., λn) and that all eigen-
values have distinct moduli, i.e. |λ1| > ... > |λn|. Let further A admit an LU decomposition.
Construct the sequence
A^(k) = R^(k−1) Q^(k−1), where A^(k−1) = Q^(k−1) R^(k−1) is the QR decomposition of A^(k−1) and A^(0) = A.
Then:
A^(k)_{ii} → λi
A^(k)_{ij} → 0 for i > j
That means the lower triangular part of A^(k) converges to that of diag(λ1, ..., λn). The convergence rate is:
|A^(k)_{i,i−1}| = O( |λi / λ_{i−1}|^k )

An algorithm is given in the following listing:

Listing 14: Simple QR Method

Requires: A ∈ Mn
Returns: A^(k) that was constructed as indicated above
complexity ≤ [ 15n³ + 3n ; 15n³ + 3n² + 5n ; 2n − 2 ; n − 1 ]_R per iteration

Matrix qr ( Matrix A ) {
Matrix B = A ;
Matrix U, R = 0 ;
for ( int k = 0; k < 1000; k++ ) {
[U, R] =getQR ( B ) ;
B = RU ;
}
return B ;
}

5.6.2 QR Method with Hessenberg Reduction
The QR method can be improved when one first reduces A to Hessenberg form. Then one can
compute the QR decomposition much faster. An algorithm is given in the following listing:

Listing 15: Simple QR Method with Hessenberg Reduction

Requires: A ∈ Mn
Returns: A^(k) that was constructed as indicated above
complexity ≤ [ 24n² ; 32n² ; n − 1 ; n − 1 ]_R per iteration

Matrix qrHess ( Matrix A ) {
    Matrix U0 , H = 0 ;
    C[] c = 0 ;
    C[] s = 0 ;
    [H, U0 ] = getHessenberg ( A ) ;
    for ( int k = 0; k < 100; k++ ) {
        // fast QR step: H = RQ, where H = QR with Q = G(2,1) · · · G(n,n−1)
        for ( int j = 1; j < n; j++ )
            [H, c[j], s[j]] = applyGivens ( H, j + 1, j ) ;
        for ( int j = 1; j < n; j++ )
            H = getTimesGivensMatPost ( H, j + 1, j, c[j], s[j] ) ;
    }
    return H ;
}

5.6.3 QR Method with Shifts and Decoupling


A QR step with shift ρ ∈ C has the form (H^(k) in Hessenberg form):

H^(k) − ρI = U R        // QR factorization
H^(k+1) = R U + ρI

Then H^(k+1) is still unitarily similar to H^(k), since H^(k+1) = R U + ρI = U^H (U R + ρI) U =
U^H H^(k) U. The shifted QR steps speed up the algorithm when ρ is near an eigenvalue of H. If
one orders the eigenvalues of H in such a way that |λ'1 − ρ| ≥ ... ≥ |λ'n − ρ|, then one sees that
|H^(k)_{i,i−1}| = O( (|λ'i − ρ| / |λ'_{i−1} − ρ|)^k ). Thus, if ρ ≈ λ'i then H^(k)_{i,i−1} tends to zero very fast.

It is useful to monitor the subdiagonal entries A^(k)_{i+1,i}. If |A^(k)_{i+1,i}| ≤ ε (|A^(k)_{i,i}| + |A^(k)_{i+1,i+1}|) for some small ε,
then A^(k)_{i+1,i} is set to zero. One then decouples the matrix A^(k) into two submatrices that are
considered independently. The spectrum of the original matrix is the union of the spectra of
the decoupled submatrices.

There are several strategies for choosing the shift. One can always use the entry A^(k)_{nn} as the shift;
this is called the single shift strategy. One hopes that A^(k)_{n,n−1} tends to zero. Then one would
decouple A^(k) into the matrices A^(k)[1 : n−1][1 : n−1] and A^(k)[n : n][n : n] = A^(k)_{nn}; this is
called deflating. The value A^(k)_{nn} would be an approximate eigenvalue of A. This method works
only as long as there is only one eigenvalue of minimal modulus.

Another method uses the smallest subdiagonal entry to find a shift: if A^(k)_{p+1,p} is the smallest
subdiagonal entry, then one uses the shift A^(k)_{p+1,p+1}. The reason is that if A^(k)_{p+1,p} is small then
A^(k) almost decouples at position p, and one wants to speed up the convergence of A^(k)_{p+1,p} to zero.
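To make the shifted QR step and the decoupling test concrete, here is a minimal Java sketch in real arithmetic (not one of the pseudo-Java listings of this paper; the method names, the use of Givens rotations as in section 5.2 and the tolerance eps are illustrative assumptions). With the single shift strategy one would call shiftedQrStep(h, h[n-1][n-1]) followed by deflate(h, eps) in each iteration:

// Sketch of one shifted QR step on a real n x n Hessenberg matrix h, overwriting h
// with R*Q + rho*I where h - rho*I = Q*R and Q is a product of Givens rotations.
static void shiftedQrStep(double[][] h, double rho) {
    int n = h.length;
    for (int i = 0; i < n; i++) h[i][i] -= rho;          // H - rho*I
    double[] cs = new double[n - 1], sn = new double[n - 1];
    for (int k = 0; k < n - 1; k++) {                    // QR: eliminate the subdiagonal
        double a = h[k][k], b = h[k + 1][k];
        double r = Math.hypot(a, b);
        double c = (r == 0.0) ? 1.0 : a / r;
        double s = (r == 0.0) ? 0.0 : b / r;
        cs[k] = c; sn[k] = s;
        for (int j = k; j < n; j++) {                    // apply the rotation from the left to rows k, k+1
            double t1 = h[k][j], t2 = h[k + 1][j];
            h[k][j] = c * t1 + s * t2;
            h[k + 1][j] = -s * t1 + c * t2;
        }
    }
    for (int k = 0; k < n - 1; k++) {                    // RQ: apply the rotations from the right to columns k, k+1
        double c = cs[k], s = sn[k];
        for (int i = 0; i <= k + 1; i++) {
            double t1 = h[i][k], t2 = h[i][k + 1];
            h[i][k] = c * t1 + s * t2;
            h[i][k + 1] = -s * t1 + c * t2;
        }
    }
    for (int i = 0; i < n; i++) h[i][i] += rho;          // add the shift back
}

// Decoupling test: zero a subdiagonal entry when it is negligible
// compared to its two diagonal neighbours.
static void deflate(double[][] h, double eps) {
    for (int i = 0; i < h.length - 1; i++) {
        if (Math.abs(h[i + 1][i]) <= eps * (Math.abs(h[i][i]) + Math.abs(h[i + 1][i + 1])))
            h[i + 1][i] = 0.0;
    }
}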

A Proofs
Proof of Theorem 3 in Section 5.1 (for convenience ‖·‖ = ‖·‖_2):
Case xi ≠ 0:
v^H v = ‖v‖² = |x1|² + ... + |(1 ± ‖x‖/|xi|) xi|² + ... + |xn|²
      = |x1|² + ... + (|xi| ± ‖x‖)² + ... + |xn|²
      = |x1|² + ... + |xn|² ± 2‖x‖|xi| + ‖x‖² = 2(‖x‖² ± ‖x‖|xi|)
Hn(v)x = x − (2/(v^H v)) (x ± ‖x‖ (xi/|xi|) ei)(x^H ± ‖x‖ (x̄i/|xi|) ei^T) x
       = x − (1/(‖x‖² ± ‖x‖|xi|)) (x ± ‖x‖ (xi/|xi|) ei)(‖x‖² ± ‖x‖|xi|)
       = x − (x ± ‖x‖ (xi/|xi|) ei)
       = ∓‖x‖ (xi/|xi|) ei
Case xi = 0:
v^H v = |x1|² + ... + |‖x‖ c|² + ... + |xn|² = 2‖x‖²
Hn(v)x = x − (1/‖x‖²)(x + ‖x‖ c ei)(x^H + ‖x‖ c̄ ei^T) x
       = x − (1/‖x‖²)(x + ‖x‖ c ei) ‖x‖²
       = x − (x + ‖x‖ c ei) = −‖x‖ c ei

B Bibliographical Reference
[1] Gene H. Golub, Charles F. Van Loan: ”Matrix Computations”, 2nd Edition, The Johns Hopkins University Press, 1989

[2] Denis Serre: ”Matrices: Theory and Applications”, Graduate Texts in Mathematics, Springer-Verlag New York, 2002

[3] Philippe G. Ciarlet: ”Introduction to Numerical Linear Algebra and Optimisation”, Cambridge University Press, 1989

[4] Alfio Quarteroni, Riccardo Sacco, Fausto Saleri: ”Numerical Mathematics”, Texts in Applied Mathematics, Springer-Verlag New York, 2000

[5] Rajendra Bhatia: ”Matrix Analysis”, Graduate Texts in Mathematics, Springer-Verlag New York, 1997
