
David S. Watkins

Abstract

John Francis’s implicitly shifted QR algorithm turned the problem of matrix eigenvalue computation from difficult to routine almost overnight about fifty years ago. It was named one of the top ten algorithms of the twentieth century by Dongarra and Sullivan, and it deserves to be more widely known and understood by the general mathematical community. This article provides an efficient introduction to Francis’s algorithm that follows a novel path. Efficiency is gained by omitting the traditional but wholly unnecessary detour through the basic QR algorithm. A brief history of the algorithm is also included. It was not a one-man show; some other important names are Rutishauser, Wilkinson, and Kublanovskaya. Francis was never a specialist in matrix computations. He was employed in the early computer industry, spent some time on the problem of eigenvalue computation and did amazing work, and then moved on to other things. He never looked back, and he remained unaware of the huge impact of his work until many years later.

1 Introduction.

A problem that arises frequently in scientific computing applications is that of computing the eigenvalues and eigenvectors of a real or complex square matrix. Nowadays we take it for granted that we can do this. For example, I was able to compute a complete set of eigenvalues and eigenvectors of a real 1000 × 1000 matrix on my laptop in about 15 seconds using MATLAB®. Not only MATLAB has this capability; it is built into a large number of packages, both proprietary and open source. Not long ago [7] the complete eigensystem of a dense matrix of order 10^5 was computed on a parallel supercomputer.¹ These feats are made possible by good software and good hardware. Today’s computers are fast and have large memories, and they are reliable.

Fifty years ago the situation was very different. Computers were slow, had small memories, and were unreliable. The software situation was equally bad. Nobody knew how to compute the eigenvalues of, say, a 10 × 10 matrix in an efficient, reliable way. It was in this environment that young John Francis found

¹ Most large matrices that arise in applications are sparse, not dense. That is, the vast majority of their entries are zeros. For sparse matrices there are special methods [1] that can compute selected eigenvalues of extremely large matrices (much larger than 10^5). These methods are important, but they are not the focus of this paper.


himself writing programs for the Pegasus computer at the National Research Development Corporation in London. A major problem at that time was the flutter of aircraft wings, and for the analysis there was a need to compute eigenvalues. Francis was given the task of writing some matrix computation routines, including some eigenvalue programs. His interest in the problem grew far beyond what he had been assigned to do. He had obtained a copy of Heinz Rutishauser’s paper on the LR algorithm [13], he had attended seminar talks by James Wilkinson² in which he learned about the advantages of computing with orthogonal matrices, and he saw the possibility of creating a superior method. He worked on this project on the side, with the tolerance (and perhaps support) of his supervisor, Christopher Strachey. In 1959 he created and tested the procedure that is now commonly known as the implicitly-shifted QR algorithm, which I prefer to call Francis’s algorithm. It became and has continued to be the big workhorse of eigensystem computations. A version of Francis’s algorithm was used by MATLAB when I asked it to compute the eigensystem of a 1000 × 1000 matrix on my laptop. Another version was used for the computation on the matrix of order 10^5 reported in [7].

Once Francis finished his work on the QR project, he moved on to other things and never looked back. In the infant computer industry there were many urgent tasks, for example, compiler development. Within a few years the importance of Francis’s algorithm was recognized, but by then Francis had left the world of numerical analysis. Only many years later did he find out what a huge impact his algorithm had had.

2 Precursors of Francis’s Algorithm.

Most of us learned in an introductory linear algebra course that the way to compute eigenvalues is to form the characteristic polynomial and factor it to get the zeros. Because of the equivalence of the eigenvalue problem with that of finding the roots of a polynomial equation, it is difficult to put a date on the beginning of the history of eigenvalue computations. The polynomial problem is quite old, and we all know something about its history. In particular, early in the 19th century Abel proved that there is no general formula (using only addition, subtraction, multiplication, division, and the extraction of roots) for the roots of a polynomial of degree five. This result, which is one of the crowning achievements of Galois theory, has the practical consequence for numerical analysts that all methods for computing eigenvalues are iterative. Direct methods, such as Gaussian elimination for solving a linear system Ax = b, do not exist for the eigenvalue problem.

The early history of eigenvalue computation is intertwined with that of computing roots of polynomial equations. Eventually it was realized that forming the characteristic polynomial might not be a good idea, as the zeros of a polynomial are often extremely sensitive to small perturbations in the coefficients [19, 21]. All modern methods for computing eigenvalues work directly with the matrix, avoiding the characteristic polynomial altogether. Moreover, the eigenvalue problem has become a much more important part of computational mathematics than the polynomial root finding problem is.

² A few years later, Wilkinson wrote the highly influential book The Algebraic Eigenvalue Problem [20].

For our story, an important early contribution was the 1892 dissertation of Jacques Hadamard [9], in which he proposed a method for computing the poles of a meromorphic function

    f(z) = ∑_{k=0}^∞ s_k / z^{k+1},   (1)

given the sequence of coefficients (s_k). Some sixty years later Eduard Stiefel, founder of the Institute for Applied Mathematics at ETH Zürich, set a related problem for his young assistant, Heinz Rutishauser: given a matrix A, determine its eigenvalues from the sequence of moments

    s_k = y^T A^k x,   k = 0, 1, 2, …,

where x and y are (more or less) arbitrary vectors. If the moments are used to define a function f as in (1), every pole of f is an eigenvalue of A. Moreover, for almost any choice of vectors x and y, the complete set of poles of f is exactly the complete set of eigenvalues of A. Thus the problem posed by Stiefel is essentially equivalent to the problem addressed by Hadamard. Rutishauser came up with a new method, which he called the quotient-difference (qd) algorithm [11, 12], that solves this problem. Today we know that this is a bad way to compute eigenvalues, as the poles of f (like the zeros of a polynomial) can be extremely sensitive to small changes in the coefficients. However, Rutishauser saw that there were multiple ways to organize and interpret his qd algorithm. He reformulated it as a process of factorization and recombination of tridiagonal matrices, and from there he was able to generalize it to obtain the LR algorithm [13]. For more information about this seminal work see [8], in addition to the works of Rutishauser cited above.

Francis knew about Rutishauser’s work, and so did Vera Kublanovskaya, working in Leningrad. Francis [3] and Kublanovskaya [10] independently proposed the QR algorithm, which promises greater stability by replacing LR decompositions with QR decompositions. The basic QR algorithm is as follows: Factor your matrix A into a product A = QR, where Q is unitary and R is upper triangular. Then reverse the factors and multiply them back together to get a new matrix: RQ = Â. Briefly we have

    A = QR,   RQ = Â.

This is one iteration of the basic QR algorithm. It is easy to check that Â = Q^{−1}AQ, so Â is unitarily similar to A and therefore has the same eigenvalues. If the process is now iterated:

    A_{j−1} = Q_j R_j,   R_j Q_j = A_j,

the sequence of unitarily similar matrices so produced will (usually) tend toward (block) triangular form, eventually revealing the eigenvalues on the main diagonal. If A is real, the entire process can be done in real arithmetic.
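To make the iteration concrete, here is a minimal sketch (my illustration, not code from the article) of the basic QR algorithm in Python with NumPy, applied to a small symmetric matrix chosen so that the eigenvalues are distinct and the iterates tend to diagonal form:

```python
import numpy as np

def basic_qr_iteration(A, steps=200):
    """Repeat the basic QR step: factor A = QR, then form the
    unitarily similar matrix RQ = Q^{-1} A Q."""
    A = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return A

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
A_final = basic_qr_iteration(A)
# The subdiagonal entries tend to zero, so the eigenvalues of A
# appear on the main diagonal of A_final.
```

For this particular matrix the eigenvalues are 3 and 3 ± √3, and the subdiagonal entries shrink steadily toward zero, slowly, for the reasons discussed in Section 5.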

The basic QR algorithm converges rather slowly. A remedy is to incorporate shifts of origin:

    A_{j−1} − ρ_j I = Q_j R_j,   R_j Q_j + ρ_j I = A_j.   (2)

Again it is easy to show that the matrices so produced are unitarily similar. Good choices of shifts ρ_j can improve the convergence rate dramatically. A good shift is one that approximates an eigenvalue well.

Many applications feature real matrices that have, however, lots of complex eigenvalues. If one wants rapid convergence of complex eigenvalues, one needs to use complex shifts and carry out (2) in complex arithmetic. This posed a problem for Francis: working in complex arithmetic on the Pegasus computer would have been really difficult. Moreover, complex matrices require twice as much storage space as real matrices do, and Pegasus didn’t have much storage space. He therefore looked for a method that avoids complex arithmetic.

If iterate A_0 is real and we do an iteration of (2) with a complex shift ρ, the resulting matrix A_1 is, of course, complex. However, if we then do another iteration using the complex conjugate shift ρ̄, the resulting matrix A_2 is again real. Francis knew this, and he looked for a method that would get him directly from A_0 to A_2. He found such a method, the double-shift QR algorithm [4], which does two iterations of the QR algorithm implicitly, entirely in real arithmetic. This algorithm does not cause convergence to triangular form; it is impossible for complex eigenvalues to emerge on the main diagonal of a real matrix. Instead pairs of complex conjugate eigenvalues emerge in 2 × 2 packets along the main diagonal of a block triangular matrix.
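The return to real arithmetic after a conjugate pair of shifts can be checked numerically. The sketch below (an illustration of two explicit shifted steps (2), not of Francis’s implicit method) performs one step with shift ρ and one with ρ̄ in complex arithmetic; the QR factors are normalized so that R has positive real diagonal, which removes the phase ambiguity in the complex factorization. The matrix and shift are arbitrary choices for the demonstration.

```python
import numpy as np

def shifted_qr_step(A, rho):
    """One explicitly shifted QR step (2).  The factorization is normalized
    so that R has positive real diagonal, making it unique; with that
    normalization RQ + rho*I equals Q* A Q, which is what we return."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A - rho * np.eye(n))
    phase = np.diag(R) / np.abs(np.diag(R))
    Q = Q * phase                    # Q <- Q @ diag(phase)
    return Q.conj().T @ A @ Q

A0 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [0.0, 1.0, 2.0]], dtype=complex)
rho = 2.0 + 1.0j
A1 = shifted_qr_step(A0, rho)             # genuinely complex intermediate
A2 = shifted_qr_step(A1, np.conj(rho))    # conjugate shift: real again
# The imaginary part of A2 is zero up to roundoff.
```

The normalization matters: the product of the two Q factors is then the Q factor of the real matrix p(A_0) = (A_0 − ρ̄I)(A_0 − ρI), which forces it to be real orthogonal.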

Francis coded his new algorithm in assembly language on the Pegasus computer and tried it out. Using this new method, the flying horse was able to compute all of the eigenvalues of a 24 × 24 matrix in just over ten minutes [4].

We will describe Francis’s double-shift QR algorithm below, and the reader will see that it is completely different from the shifted QR algorithm (2). Francis had to expend some effort to demonstrate that his new method does indeed effect two steps of (2).

When we teach students about Francis’s algorithm today, we still follow this path. We introduce the shifted QR algorithm (2) and talk about why it works, then we introduce Francis’s algorithm, and then we demonstrate equivalence using the so-called implicit-Q theorem. A primary objective of this paper is to demonstrate that this approach is inefficient. It is simpler to introduce Francis’s algorithm straightaway and explain why it works, bypassing (2) altogether.³

³ It is only fair to admit that this is not my first attempt. The earlier work [16], which I now view as premature, failed to expose fully the role of Krylov subspaces (explained below) for the functioning of Francis’s algorithm. It is my fervent hope that I will not view the current work as premature a few years from now.

3 Reflectors and the Reduction to Hessenberg Form.

Before we can describe Francis’s algorithm, we need to introduce some basic tools of numerical linear algebra. For simplicity we will restrict our attention to the real case, but everything we do here can easily be extended to the complex setting.

Let x and y be two distinct vectors in R^n with the same Euclidean norm: ‖x‖_2 = ‖y‖_2, and let S denote the hyperplane (through the origin) orthogonal to x − y. Then the linear transformation Q that reflects vectors through S clearly maps x to y and vice versa. Since Q preserves norms, it is unitary: Q* = Q^{−1}. It is also clearly involutory: Q^{−1} = Q, so it is also Hermitian: Q* = Q. Viewing Q as a matrix, it is real, orthogonal (Q^T = Q^{−1}), and symmetric (Q^T = Q). A precise expression for the matrix Q is

    Q = I − 2uu^T,   where u = (x − y)/‖x − y‖_2.

Now consider an arbitrary nonzero x and a special y:

    x = (x_1, x_2, …, x_n)^T,   y = (y_1, 0, …, 0)^T,   where y_1 = ±‖x‖_2.

Since x and y have the same norm, we know that there is a reflector Q such that Qx = y. Such an operator is clearly a boon to numerical linear algebraists, as it gives us a means of introducing large numbers of zeros. Letting e_1 denote the vector with a 1 in the first position and zeros elsewhere, as usual, we can restate our result as follows.

Theorem 1. Let x ∈ R^n be any nonzero vector. Then there is a reflector Q such that Qx = αe_1, where α = ±‖x‖_2.

In the world of matrix computations, reflectors are also known as Householder transformations. For details on the practical construction and use of reflectors see [17] or [6], for example.
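As an illustration, the construction in Theorem 1 can be coded directly. This sketch (mine, not from the cited references) builds Q = I − 2uu^T from a given x, choosing the sign of α opposite to x_1 so that the vector x − y being normalized cannot suffer cancellation; forming Q explicitly is for clarity only, since practical codes store just u.

```python
import numpy as np

def reflector(x):
    """Return (Q, alpha) with Q = I - 2uu^T a reflector and Q @ x = alpha*e_1."""
    x = np.asarray(x, dtype=float)
    # alpha = +/- ||x||_2, with the sign opposite to x[0] to avoid cancellation
    alpha = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else np.linalg.norm(x)
    y = np.zeros_like(x)
    y[0] = alpha
    u = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(u, u), alpha

x = np.array([3.0, 4.0, 0.0, 12.0])   # ||x||_2 = 13
Q, alpha = reflector(x)
# Q is orthogonal and symmetric, and Q @ x equals [alpha, 0, 0, 0].
```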

With reflectors in hand we can easily transform any matrix nearly all the way to triangular form. A matrix A is called upper Hessenberg if it satisfies a_{ij} = 0 whenever i > j + 1. For example, a 5 × 5 upper Hessenberg matrix has the form

    [ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ]
    [     ∗ ∗ ∗ ]
    [       ∗ ∗ ]

where blank spots denote zeros and asterisks denote numbers that can have any real value.


Theorem 2. Every A ∈ R^{n×n} is orthogonally similar to an upper Hessenberg matrix: H = Q^{−1}AQ, where Q is a product of n − 2 reflectors.

Proof. The first reflector Q_1 has the form

    Q_1 = [ 1        ]
          [    Q̃_1  ],   (3)

where Q̃_1 is an (n−1) × (n−1) reflector such that

    Q̃_1 (a_{21}, a_{31}, …, a_{n1})^T = (∗, 0, …, 0)^T.

If we multiply A on the left by Q_1, we obtain a matrix Q_1 A that has zeros in the first column from the third entry on. To complete a similarity transformation, we must multiply on the right by Q_1 (= Q_1^{−1}). Because Q_1 has the form (3), the transformation Q_1 A → Q_1 AQ_1 does not alter the first column. Thus Q_1 AQ_1 has the desired zeros in the first column.

The second reflector Q_2 has the form

    Q_2 = [ 1           ]
          [    1        ]
          [       Q̃_2  ]

and is constructed so that Q_2 Q_1 AQ_1 Q_2 has zeros in the second column from the fourth position on. This transformation does not disturb the zeros that were created in the first column in the first step. The details are left for the reader (or see [17], for example). The third reflector Q_3 creates zeros in the third column, and so on. After n − 2 reflectors, we will have produced a matrix H = Q_{n−2} ⋯ Q_1 AQ_1 ⋯ Q_{n−2} in upper Hessenberg form. Letting Q = Q_1 ⋯ Q_{n−2}, we have Q^{−1} = Q_{n−2} ⋯ Q_1, and H = Q^{−1}AQ.
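The construction in the proof translates almost line for line into code. Below is a compact sketch (not production code) of the reduction, using explicit reflector matrices for readability; library routines apply the reflectors without ever forming them.

```python
import numpy as np

def house(x):
    """Reflector sending x to a multiple of e_1 (Theorem 1)."""
    v = np.array(x, dtype=float)
    nx = np.linalg.norm(v)
    if nx == 0.0:
        return np.eye(len(v))          # nothing to annihilate
    v[0] += np.sign(v[0]) * nx if v[0] != 0 else nx
    u = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(u, u)

def hessenberg(A):
    """Reduce A to upper Hessenberg form H = Q^T A Q using n-2 reflectors."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for j in range(n - 2):
        Qj = np.eye(n)
        Qj[j+1:, j+1:] = house(H[j+1:, j])   # embedded as in (3)
        H = Qj @ H @ Qj                      # similarity; Qj is its own inverse
        Q = Q @ Qj
    return H, Q

A = np.array([[2.0, 1.0, 0.0, 3.0],
              [1.0, 4.0, 1.0, 2.0],
              [2.0, 1.0, 3.0, 1.0],
              [1.0, 2.0, 2.0, 5.0]])
H, Q = hessenberg(A)
# H is upper Hessenberg, Q is orthogonal, and Q^T A Q = H.
```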

The proof of Theorem 2 is constructive; that is, it gives a finite algorithm (a direct method) for computing H and Q. The operations used are those required for setting up and applying reflectors, namely addition, subtraction, multiplication, division, and the extraction of square roots [6, 17]. What we really want is an algorithm to get us all the way to upper triangular form, from which we can read off the eigenvalues. But Hessenberg form is as far as we can get with a finite algorithm using the specified operations. If we had a general procedure for creating even one more zero below the main diagonal, we would be able to split the eigenvalue problem into two smaller eigenvalue problems. If we could do this, we would be able to split each of the smaller problems into still smaller problems and eventually get to triangular form by a finite algorithm. This would imply the existence of a formula for the zeros of a general nth degree polynomial, in violation of Galois theory.


The ability to reduce a matrix to upper Hessenberg form is quite useful. If the QR algorithm (2) is initiated with a matrix A_0 that is upper Hessenberg, then (except in some special cases that can be avoided) all iterates A_j are upper Hessenberg. This results in much more economical iterations, as both the QR decomposition and the subsequent RQ multiplication can be done in many fewer operations in the Hessenberg case. Thus an initial reduction to upper Hessenberg form is well worth the extra expense. In fact, the algorithm would not be competitive without it.

For Francis’s implicitly-shifted QR algorithm, the Hessenberg form is absolutely essential.

4 Francis’s Algorithm.

Suppose we want to find the eigenvalues of some matrix A ∈ R^{n×n}. We continue to focus on the real case; the extension to complex matrices is straightforward. We know from Theorem 2 that we can reduce A to upper Hessenberg form, so let us assume from the outset that A is upper Hessenberg. We can assume, moreover, that A is properly upper Hessenberg. This means that all of the subdiagonal entries of A are nonzero: a_{j+1,j} ≠ 0 for j = 1, 2, …, n − 1. Indeed, if A is not properly upper Hessenberg, say a_{k+1,k} = 0, then A has the block-triangular form

    A = [ A_11  A_12 ]
        [  0    A_22 ],

where A_11 is k × k. Thus the eigenvalue problem for A reduces to eigenvalue problems for the smaller upper Hessenberg matrices A_11 and A_22. If either of these is not properly upper Hessenberg, we can break it apart further until we are finally left with proper upper Hessenberg matrices. We will assume, therefore, that A is properly upper Hessenberg from the outset.

An iteration of Francis’s algorithm of degree m begins by picking m shifts ρ_1, …, ρ_m. In principle m can be any positive integer, but in practice it should be fairly small. Francis took m = 2. The rationale for shift selection will be explained later. There are many reasonable ways to choose shifts, but the simplest is to take ρ_1, …, ρ_m to be the eigenvalues of the m × m submatrix in the lower right-hand corner of A. This is an easy computation if m = 2. If m > 1, this shifting strategy can produce complex shifts, but they always occur in conjugate pairs. Whether we use this or some other strategy, we will always insist that it have this property: when the matrix is real, complex shifts must be produced in conjugate pairs.

Now let

    p(A) = (A − ρ_m I) ⋯ (A − ρ_1 I).

We do not actually compute p(A), as this would be too expensive. Francis’s algorithm just needs the first column

    x = p(A)e_1,

which is easily computed by m successive matrix-vector multiplications. The computation is especially cheap because A is upper Hessenberg. It is easy to check that (A − ρ_1 I)e_1 has nonzero entries only in its first two positions, (A − ρ_2 I)(A − ρ_1 I)e_1 has nonzero entries only in its first three positions, and so on. As a consequence, x has nonzero entries only in its first m + 1 positions. The entire computation of x depends only on the first m columns of A (and the shifts) and requires negligible computational effort if m is small. Since the complex shifts occur in conjugate pairs, p(A) is real. Therefore x is real.
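In code, forming x is just m matrix-vector products; p(A) itself is never formed. The sketch below (an illustration with an arbitrarily chosen 6 × 6 Hessenberg matrix and conjugate pair of shifts) confirms the two claims just made: x is real, and only its first m + 1 entries are nonzero.

```python
import numpy as np

def first_column_of_pA(A, shifts):
    """x = (A - rho_m I) ... (A - rho_1 I) e_1 via successive products."""
    x = np.zeros(A.shape[0], dtype=complex)
    x[0] = 1.0                       # e_1
    for rho in shifts:
        x = A @ x - rho * x          # multiply by (A - rho*I)
    return x

# A properly upper Hessenberg 6x6 matrix (its subdiagonal is nonzero).
A = np.triu(np.arange(1.0, 37.0).reshape(6, 6), -1)
shifts = [2.0 + 1.0j, 2.0 - 1.0j]    # conjugate pair, so p(A) is real
x = first_column_of_pA(A, shifts)
# x is real, and only its first m + 1 = 3 entries are nonzero.
```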

Let x̃ ∈ R^{m+1} denote the vector consisting of the first m + 1 entries of x (the nonzero part). Theorem 1 guarantees that there is an (m+1) × (m+1) reflector Q̃_0 such that Q̃_0 x̃ = αe_1, where α = ±‖x̃‖_2. Let Q_0 be an n × n reflector given by

    Q_0 = [ Q̃_0     ]
          [       I ].   (4)

Clearly Q_0 x = αe_1 and, since Q_0 is an involution, Q_0 e_1 = α^{−1}x. Thus the first column of Q_0 is proportional to x.

Now use Q_0 to perform a similarity transformation: A → Q_0^{−1}AQ_0 = Q_0 AQ_0. This disturbs the Hessenberg form, but only slightly. Because Q_0 has the form (4), the transformation A → Q_0 A affects only the first m + 1 rows. Typically the first m + 1 rows are completely filled in. The transformation Q_0 A → Q_0 AQ_0 affects only the first m + 1 columns. Since a_{m+2,m+1} ≠ 0, row m + 2 gets filled in by this transformation. Below row m + 2, we have a big block of zeros, and these remain zero. The total effect is that there is a bulge in the Hessenberg form. In the case m = 3 and n = 7, the matrix Q_0 AQ_0 looks like

    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [         ∗ ∗ ∗ ]
    [           ∗ ∗ ]

In this 7 × 7 matrix the bulge looks huge. If you envision, say, the 100 × 100 case (keeping m = 3), you will see that the matrix is almost upper Hessenberg with a tiny bulge at the top.

The rest of the Francis iteration consists of returning this matrix to upper Hessenberg form by the algorithm sketched in the proof of Theorem 2. This begins with a transformation Q_1 that acts only on rows 2 through n and creates the desired zeros in the first column. Since the first column already consists of zeros after row m + 2, the scope of Q_1 can be restricted a lot further in this case. It needs to act on rows 2 through m + 2 and create zeros in positions (3, 1), …, (m + 2, 1). Applying Q_1 on the left, we get a matrix Q_1 Q_0 AQ_0 that has the Hessenberg form restored in the first column. Completing the similarity transformation, we multiply by Q_1 on the right. This recombines columns 2 through m + 2. Since a_{m+3,m+2} ≠ 0, additional nonzero entries are created in positions (m + 3, 2), …, (m + 3, m + 1). In the case m = 3 and n = 7 the matrix Q_1 Q_0 AQ_0 Q_1 looks like

    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [           ∗ ∗ ]

The bulge has not gotten any smaller, but it has been pushed one position to the right and downward. This establishes the pattern for the process. The next transformation will push the bulge over and down one more position, and so on. Thinking again of the 100 × 100 case, we see that a long sequence of such transformations will chase the bulge down through the matrix until it is finally pushed off the bottom. At this point, Hessenberg form will have been restored and the Francis iteration will be complete. For obvious reasons, Francis’s algorithm is sometimes referred to as a bulge chasing algorithm.

1. Pick some shifts ρ

1

, . . . , ρ

m

.

2. Compute x = p(A)e

1

= (A −ρ

m

I) (A −ρ

1

I)e

1

.

3. Compute a reﬂector Q

0

whose ﬁrst column is proportional to x.

4. Do a similarity transformation A → Q

0

AQ

0

, creating a bulge.

5. Return the matrix to upper Hessenberg form by chasing the bulge.
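The five steps can be assembled into a working sketch. A caveat: this is my illustration, not Francis’s economical implementation. In particular, step 5 here simply reruns the general Hessenberg reduction of Theorem 2, whose reflectors automatically have the restricted scope described above; a serious implementation exploits the bulge structure explicitly. The test matrix and the two real shifts are arbitrary choices.

```python
import numpy as np

def house_embed(n, j, col):
    """n x n identity with a reflector in rows/columns j..j+len(col)-1
    sending col to a multiple of e_1."""
    v = np.array(col, dtype=float)
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.eye(n)
    v[0] += np.sign(v[0]) * nv if v[0] != 0 else nv
    u = v / np.linalg.norm(v)
    Q = np.eye(n)
    k = len(u)
    Q[j:j+k, j:j+k] -= 2.0 * np.outer(u, u)
    return Q

def francis_iteration(A, shifts):
    """One Francis iteration of degree m on a properly upper Hessenberg A."""
    n = A.shape[0]
    m = len(shifts)
    x = np.zeros(n)
    x[0] = 1.0
    for rho in shifts:                       # step 2: x = p(A) e_1
        x = A @ x - rho * x
    Q = house_embed(n, 0, x[:m+1])           # step 3: first column of Q_0 ~ x
    A = Q @ A @ Q                            # step 4: create the bulge
    for j in range(n - 2):                   # step 5: chase it (Theorem 2)
        Qj = house_embed(n, j + 1, A[j+1:, j])
        A = Qj @ A @ Qj
        Q = Q @ Qj
    return A, Q

rng = np.random.default_rng(0)
A0 = np.triu(rng.standard_normal((6, 6)), -1)   # properly Hessenberg test matrix
A_hat, Q = francis_iteration(A0, (0.5, -0.5))   # two arbitrary real shifts
# A_hat is again upper Hessenberg, A_hat = Q^T A0 Q, and the first
# column of Q is proportional to p(A0) e_1, as Theorem 3 below asserts.
```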

Notice that, although this algorithm is commonly known as the implicitly-shifted QR algorithm, its execution does not require any QR decompositions. Why, then, should we call it the QR algorithm? The name is misleading, and for this reason I prefer to call it Francis’s algorithm.

Letting Â denote the final result of the Francis iteration, we have

    Â = Q_{n−2} ⋯ Q_1 Q_0 AQ_0 Q_1 ⋯ Q_{n−2},

where Q_0 is the transformation that creates the bulge, and Q_1, …, Q_{n−2} are the transformations that chase it. Letting

    Q = Q_0 Q_1 ⋯ Q_{n−2},

we have Â = Q^{−1}AQ.

Recall that Q_0 was built in such a way that Q_0 e_1 = βx = βp(A)e_1. Looking back to the proof of Theorem 2, we recall that each of the other Q_i has e_1 as its first column. Thus Q_i e_1 = e_1, i = 1, …, n − 2. We conclude that

    Qe_1 = Q_0 Q_1 ⋯ Q_{n−2} e_1 = Q_0 e_1 = βx.

We summarize these findings in a theorem.

Theorem 3. A Francis iteration of degree m with shifts ρ_1, …, ρ_m effects an orthogonal similarity transformation

    Â = Q^{−1}AQ,

where

    Qe_1 = βp(A)e_1 = β(A − ρ_m I) ⋯ (A − ρ_1 I)e_1

for some nonzero β. In words, the first column of Q is proportional to the first column of p(A).

In the next section we will see why this procedure is so powerful. Repeated Francis iterations with well-chosen shifts typically result in rapid convergence, in the sense that a_{n−m+1,n−m} → 0 quadratically.⁴ In a few iterations, a_{n−m+1,n−m} will be small enough to be considered zero. We can then deflate the problem:

    A = [ A_11  A_12 ]
        [  0    A_22 ].

The (small) m × m matrix A_22 can be resolved into m eigenvalues with negligible work. (Think of the case m = 2, for example.) We can then focus on the remaining (n − m) × (n − m) submatrix A_11 and go after another set of m eigenvalues.

5 Why Francis’s Algorithm Works.

Subspace Iteration.

At the core of Francis’s algorithm lies the humble power method. In the k-dimensional version, which is known as subspace iteration, we pick a k-dimensional subspace S of R^n (or perhaps C^n) and, through repeated multiplication by A, build the sequence of subspaces

    S, AS, A^2 S, A^3 S, … .   (5)

Here by AS we mean {Ax | x ∈ S}. To avoid complications in the discussion, we will make some simplifying assumptions. We will suppose that all of the spaces in the sequence have the same dimension k.⁵

⁴ This is a simplification. It often happens that some shifts are much better than others, resulting in a_{n−k+1,n−k} → 0 for some k < m.

We also assume that A is a diagonalizable matrix with n linearly independent eigenvectors v_1, …, v_n and associated eigenvalues λ_1, …, λ_n.⁶ Sort the eigenvalues and eigenvectors so that |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|. Any vector x can be expressed as a linear combination

    x = c_1 v_1 + c_2 v_2 + ⋯ + c_n v_n

for some (unknown) c_1, …, c_n. Thus clearly

    A^j x = c_1 λ_1^j v_1 + ⋯ + c_k λ_k^j v_k + c_{k+1} λ_{k+1}^j v_{k+1} + ⋯ + c_n λ_n^j v_n,   j = 1, 2, 3, … .

If |λ_k| > |λ_{k+1}|, the components in the directions v_1, …, v_k will grow relative to the components in the directions v_{k+1}, …, v_n as j increases. As a consequence, unless our choice of S was very unlucky, the sequence (5) will converge to the k-dimensional invariant subspace spanned by v_1, …, v_k. The convergence is linear with ratio |λ_{k+1}/λ_k|, which means that the error is reduced by a factor of approximately |λ_{k+1}/λ_k| on each iteration [18, 15].

Often the ratio |λ_{k+1}/λ_k| will be close to 1, so convergence will be slow. In an effort to speed up convergence, we could consider replacing A by some p(A) in (5), giving the iteration

    S, p(A)S, p(A)^2 S, p(A)^3 S, … .   (6)

If p(z) = (z − ρ_m) ⋯ (z − ρ_1) is a polynomial of degree m, each step of (6) amounts to m steps of (5) with shifts ρ_1, …, ρ_m. The eigenvalues of p(A) are p(λ_1), …, p(λ_n). If we renumber them so that |p(λ_1)| ≥ ⋯ ≥ |p(λ_n)|, the rate of convergence of (6) will be |p(λ_{k+1})/p(λ_k)|. Now we have a bit more flexibility. Perhaps by a wise choice of shifts ρ_1, …, ρ_m, we can make this ratio small, at least for some values of k.

Subspace Iteration with Changes of Coordinate System.

As we all know, a similarity transformation Â = Q^{−1}AQ is just a change of coordinate system. A and Â are two matrices that represent the same linear operator with respect to two different bases. Each vector in R^n is the coordinate vector of some vector v in the vector space on which the operator acts. If the vector v has coordinate vector x before the change of coordinate system, it will have coordinate vector Q^{−1}x afterwards.

Now consider a step of subspace iteration applied to the special subspace

    E_k = span{e_1, …, e_k},

where e_i is the standard basis vector with a 1 in the ith position and zeros elsewhere. The vectors p(A)e_1, …, p(A)e_k are a basis for the space p(A)E_k.

⁵ Of course it can happen that the dimension decreases in the course of the iterations. This does not cause any problems for us. In fact, it is good news [15], but we will not discuss that case here.

⁶ The non-diagonalizable case is more complicated but leads to similar conclusions. See [18] or [15], for example.

Let q_1, …, q_k be an orthonormal basis for p(A)E_k, which could be obtained by some variant of the Gram-Schmidt process, for example. Let q_{k+1}, …, q_n be additional orthonormal vectors such that q_1, …, q_n together form an orthonormal basis of R^n, and let Q = [q_1 ⋯ q_n] ∈ R^{n×n}. Since q_1, …, q_n are orthonormal, Q is an orthogonal matrix.

Now use Q to make a change of coordinate system: Â = Q^{−1}AQ = Q^T AQ. Let us see what this change of basis does to the space p(A)E_k. To this end we check the basis vectors q_1, …, q_k. Under the change of coordinate system, these get mapped to Q^T q_1, …, Q^T q_k. Because the columns of Q are the orthonormal vectors q_1, …, q_n, the vectors Q^T q_1, …, Q^T q_k are exactly e_1, …, e_k. Thus the change of coordinate system maps p(A)E_k back to E_k.

Now, if we want to do another step of subspace iteration, we can work in the new coordinate system, applying p(Â) to E_k. Then we can do another change of coordinate system and map p(Â)E_k back to E_k. If we continue to iterate in this manner, we produce a sequence of orthogonally similar matrices (A_j) through successive changes of coordinate system, and we are always dealing with the same subspace E_k. This is a version of subspace iteration for which the subspace stays fixed and the matrix changes.

What does convergence mean in this case? As j increases, E_k comes closer and closer to being invariant under A_j. The special space E_k is invariant under A_j if and only if A_j has the block-triangular form

    A_j = [ A_11^(j)  A_12^(j) ]
          [    0      A_22^(j) ],

where A_11^(j) is k × k. Of course, we just approach invariance; we never attain it exactly. In practice we will have

    A_j = [ A_11^(j)  A_12^(j) ]
          [ A_21^(j)  A_22^(j) ],

where A_21^(j) → 0 linearly with ratio |p(λ_{k+1})/p(λ_k)|. Eventually A_21^(j) will get small enough that we can set it to zero and split the eigenvalue problem into two smaller problems.

If one thinks about implementing subspace iteration with changes of coordinate system in a straightforward manner, as outlined above, it appears to be a fairly expensive procedure. Fortunately there is a way to implement this method on upper Hessenberg matrices at reasonable computational cost, namely Francis’s algorithm. This amazing procedure effects subspace iteration with changes of coordinate system, not just for one k, but for k = 1, …, n − 1 all at once. At this point we are not yet ready to demonstrate this fact, but we can get a glimpse at what is going on.
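The straightforward procedure can be simulated directly, which also makes its cost visible: each step below applies p(A_j) to k basis vectors, orthonormalizes (NumPy's QR factorization with mode='complete' supplies the extension to a full orthonormal basis), and changes coordinates. In this sketch of mine, p(z) = z, k = 2, and the matrix is built to have known eigenvalues 5, 3, 1, 0.5, so the block A_21^(j) should shrink by about |λ_3/λ_2| = 1/3 per step.

```python
import numpy as np

rng = np.random.default_rng(2)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q0 @ np.diag([5.0, 3.0, 1.0, 0.5]) @ Q0.T   # known eigenvalues

k = 2
for _ in range(40):
    Z = A @ np.eye(4)[:, :k]                  # apply A to a basis of E_k (p(z) = z)
    Q, _ = np.linalg.qr(Z, mode='complete')   # q_1..q_k span A E_k; rest extends
    A = Q.T @ A @ Q                           # change of coordinate system
block = A[k:, :k]                             # A_21^(j): nearly zero by now
```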


Francis’s algorithm begins by computing the vector p(A)e_1. This is a step of the power method, or subspace iteration with k = 1, mapping E_1 = span{e_1} to p(A)E_1. Then, according to Theorem 3, this is followed by a change of coordinate system Â = Q^{−1}AQ in which span{q_1} = p(A)E_1. This is exactly the case k = 1 of subspace iteration with a change of coordinate system. Thus if we perform iterations repeatedly with the same shifts (the effect of changing shifts will be discussed later), the sequence of iterates (A_j) so produced will converge linearly to the form

    [ λ_1  ∗  ⋯  ∗ ]
    [  0   ∗  ⋯  ∗ ]
    [  ⋮   ⋮      ⋮ ]
    [  0   ∗  ⋯  ∗ ].

Since the iterates are upper Hessenberg, this just means that a_{21}^(j) → 0. The convergence is linear with ratio |p(λ_2)/p(λ_1)|.

This is just the tip of the iceberg. To get the complete picture, we must bring one more concept into play.

Krylov Subspaces.

Given a nonzero vector x, the sequence of Krylov subspaces associated with x is defined by

    K_1(A, x) = span{x},
    K_2(A, x) = span{x, Ax},
    K_3(A, x) = span{x, Ax, A²x},

and in general K_k(A, x) = span{x, Ax, . . . , A^{k−1}x}, k = 1, 2, . . . , n. We need

to make a couple of observations about Krylov subspaces. The ﬁrst is that

wherever there are Hessenberg matrices, there are Krylov subspaces lurking in

the background.

Theorem 4. Let A be properly upper Hessenberg. Then

    span{e_1, . . . , e_k} = K_k(A, e_1),  k = 1, . . . , n.

The proof is an easy exercise.
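Theorem 4 is also easy to check numerically. In the sketch below (NumPy; the test matrix is randomly generated for illustration), the Krylov matrix [e_1, Ae_1, . . . , A^{k−1}e_1] of a properly upper Hessenberg A is nonzero only in its first k rows and has full rank k, which is exactly the statement of the theorem:

```python
import numpy as np

def krylov_basis(A, x, k):
    """Return the n-by-k Krylov matrix [x, Ax, ..., A^{k-1} x]."""
    cols = [x]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 6
# random matrix, then zero out everything below the subdiagonal
A = np.triu(rng.standard_normal((n, n)), k=-1)
assert all(abs(A[j + 1, j]) > 0 for j in range(n - 1))  # properly Hessenberg

e1 = np.zeros(n); e1[0] = 1.0
for k in range(1, n + 1):
    K = krylov_basis(A, e1, k)
    assert np.allclose(K[k:, :], 0)          # contained in span{e1, ..., ek}
    assert np.linalg.matrix_rank(K) == k     # and of full dimension k
```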

Theorem 5. Suppose H = Q^{−1}AQ, where H is properly upper Hessenberg. Let q_1, . . . , q_n denote the columns of Q. Then

    span{q_1, . . . , q_k} = K_k(A, q_1),  k = 1, . . . , n.

In our application, Q is orthogonal, but this theorem is valid for any non-

singular Q. In the special case Q = I, it reduces to Theorem 4.


Proof. We use induction on k. For k = 1, the result holds trivially. Now, for the induction step, rewrite the similarity transformation as AQ = QH. Equating kth columns of this equation we have, for k = 1, . . . , n − 1,

    Aq_k = Σ_{i=1}^{k+1} q_i h_{ik}.

The sum stops at k + 1 because H is upper Hessenberg. We can rewrite this as

    q_{k+1} h_{k+1,k} = Aq_k − Σ_{i=1}^{k} q_i h_{ik}.    (7)

Our induction hypothesis is that span{q_1, . . . , q_k} = K_k(A, q_1). Since q_k ∈ K_k(A, q_1), it is clear that Aq_k ∈ K_{k+1}(A, q_1). Then, since h_{k+1,k} ≠ 0, equation (7) implies that q_{k+1} ∈ K_{k+1}(A, q_1). Thus span{q_1, . . . , q_{k+1}} ⊆ K_{k+1}(A, q_1). Since q_1, . . . , q_{k+1} are linearly independent, and the dimension of K_{k+1}(A, q_1) is at most k + 1, these subspaces must be equal.

We remark in passing that equation (7) has computational as well as theo-

retical importance. It is the central equation of the Arnoldi process, one of the

most important algorithms for large, sparse matrix computations [1].
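Indeed, equation (7), solved for q_{k+1}, is precisely the recurrence of the Arnoldi process. A minimal sketch follows (NumPy; breakdown handling and reorthogonalization, which practical codes need, are omitted):

```python
import numpy as np

def arnoldi(A, q1, m):
    """Minimal Arnoldi process built on equation (7):
    q_{k+1} h_{k+1,k} = A q_k - sum_i q_i h_{ik}.
    Returns Q (n x (m+1)) with orthonormal columns spanning K_{m+1}(A, q1)
    and the (m+1) x m upper Hessenberg H satisfying A Q[:, :m] = Q H."""
    n = len(q1)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(m):
        v = A @ Q[:, k]
        for i in range(k + 1):            # orthogonalize: compute h_{ik}
            H[i, k] = Q[:, i] @ v
            v -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(v)   # h_{k+1,k}
        Q[:, k + 1] = v / H[k + 1, k]     # breakdown (v = 0) not handled
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
Q, H = arnoldi(A, rng.standard_normal(8), 4)
assert np.allclose(A @ Q[:, :4], Q @ H)   # the Arnoldi relation
assert np.allclose(Q.T @ Q, np.eye(5))    # orthonormal Krylov basis
```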

Our second observation about Krylov subspaces is in connection with sub-

space iteration. Given a matrix A, which we take for granted as our given object

of study, we can say that each nonzero vector x contains the information to build

a whole sequence of Krylov subspaces K_1(A, x), K_2(A, x), K_3(A, x), . . . . Now

consider a step of subspace iteration applied to a Krylov subspace. One easily

checks that

    p(A)K_k(A, x) = K_k(A, p(A)x),

as a consequence of the equation Ap(A) = p(A)A. Thus the result of a step

of subspace iteration on a Krylov subspace generated by x is another Krylov

subspace, namely the one generated by p(A)x. It follows that

the power method x → p(A)x contains all the information about a

whole sequence of nested subspace iterations,

in the following sense. The vector x generates a whole sequence of Krylov subspaces K_k(A, x), k = 1, . . . , n, and p(A)x generates the sequence K_k(A, p(A)x), k = 1, . . . , n. Moreover, the kth space K_k(A, p(A)x) is the result of one step of subspace iteration on the kth space K_k(A, x).
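The identity p(A)K_k(A, x) = K_k(A, p(A)x) can be checked directly: because A commutes with p(A), the natural bases of the two spaces agree column by column. A small numerical sketch (NumPy; the matrix and the shifts inside p are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 7, 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
I = np.eye(n)
pA = (A - 1.5 * I) @ (A + 0.5 * I)           # p(z) = (z - 1.5)(z + 0.5)

K = np.column_stack([np.linalg.matrix_power(A, j) @ x for j in range(k)])
left = pA @ K                                 # spans p(A) K_k(A, x)
right = np.column_stack([np.linalg.matrix_power(A, j) @ (pA @ x)
                         for j in range(k)])  # spans K_k(A, p(A)x)
# since A p(A) = p(A) A, the two bases agree column by column
assert np.allclose(left, right)
```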

Back to Francis’s Algorithm.

Each iteration of Francis’s algorithm executes a step of the power method e_1 → p(A)e_1 followed by a change of coordinate system. As we have just seen, the step of the power method induces subspace iterations on the corresponding Krylov subspaces: K_k(A, e_1) → K_k(A, p(A)e_1), k = 1, . . . , n. This is not just


some theoretical action. Since Francis’s algorithm operates on properly upper

Hessenberg matrices, the Krylov subspaces in question reside in the columns of

the transforming matrices.

Indeed, a Francis iteration makes a change of coordinate system Â = Q^{−1}AQ.

It can happen that Â has one or more zeros on the subdiagonal. This is a rare and lucky event, which allows us to reduce the problem immediately to two smaller eigenvalue problems. Let us assume we have not been so lucky. Then Â is properly upper Hessenberg, and we can apply Theorem 5, along with Theorems 3 and 4, to deduce that, for k = 1, . . . , n,

    span{q_1, . . . , q_k} = K_k(A, q_1)
                           = K_k(A, p(A)e_1)
                           = p(A)K_k(A, e_1)
                           = p(A)span{e_1, . . . , e_k}.

These equations show that Francis’s algorithm eﬀects subspace iteration with a

change of coordinate system for dimensions k = 1, . . . , n −1.
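These equalities can be tested numerically. The sketch below (NumPy) builds Q explicitly from a QR decomposition of p(A), using the fact (Theorem 6 below) that the transforming matrix of a Francis iteration is the orthogonal factor of p(A) = QR, and checks that the first k columns of Q span the same subspace as the first k columns of p(A), i.e., p(A)span{e_1, . . . , e_k}:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
A = np.triu(rng.standard_normal((n, n)), k=-1)     # properly upper Hessenberg
I = np.eye(n)
pA = (A - (-0.9) * I) @ (A - 0.4 * I)              # p(A) for shifts 0.4, -0.9
Q, R = np.linalg.qr(pA)                            # Q as in a Francis iteration

for k in range(1, n):
    U = Q[:, :k]                  # span{q_1, ..., q_k}
    V = pA[:, :k]                 # p(A) span{e_1, ..., e_k}: first k cols of p(A)
    # equal subspaces: V lies in the column space of U and has full rank k
    assert np.allclose(U @ (U.T @ V), V, atol=1e-8)
    assert np.linalg.matrix_rank(V) == k
```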

Now suppose we pick some shifts ρ_1, . . . , ρ_m, let p(z) = (z − ρ_m) · · · (z − ρ_1), and do a sequence of Francis iterations with this fixed p to produce the sequence (A_j). Then, for each k for which |p(λ_{k+1})/p(λ_k)| < 1, we have a^(j)_{k+1,k} → 0 linearly with rate |p(λ_{k+1})/p(λ_k)|. Thus all of the ratios

    |p(λ_{k+1})/p(λ_k)|,  k = 1, . . . , n − 1,

are important. If any one of them is small, then one of the subdiagonal entries

will converge rapidly to zero, allowing us to break the problem into smaller

eigenvalue problems.
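A Francis iteration never forms p(A), but its fixed-shift effect can be imitated explicitly: factor p(A_j) = QR and set A_{j+1} = Q^T A_j Q. In the NumPy sketch below (a symmetric matrix with made-up eigenvalues and shifts chosen so that every ratio |p(λ_{k+1})/p(λ_k)| is below 1), all subdiagonal entries converge to zero and the diagonal ends up ordered by decreasing |p(λ)|:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.array([10.0, 7.0, 3.0, 1.0, 0.5])
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = U @ np.diag(lam) @ U.T              # symmetric with known eigenvalues
rho1, rho2 = 0.9, 0.4                   # fixed shifts, p(z) = (z - rho2)(z - rho1)
p = lambda z: (z - rho2) * (z - rho1)
I = np.eye(5)
for _ in range(60):
    Q, R = np.linalg.qr((A - rho2 * I) @ (A - rho1 * I))   # p(A_j) = QR
    A = Q.T @ A @ Q                                        # change of coordinates
order = np.argsort(-np.abs(p(lam)))     # |p| sorts the eigenvalues
assert np.max(np.abs(np.tril(A, k=-1))) < 1e-6
assert np.allclose(np.diag(A), lam[order], atol=1e-6)
```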

Choice of Shifts.

The convergence results we have so far are based on the assumption that we

are going to pick some shifts ρ_1, . . . , ρ_m in advance and use them over and over

again. Of course, this is not what really happens. In practice, we get access

to better and better shifts as we proceed, so we might as well use them. How,

then, does one choose good shifts?

To answer this question we start with a thought experiment. Suppose that

we are somehow able to ﬁnd m shifts that are excellent in the sense that each

of them approximates one eigenvalue well (good to several decimal places, say)

and is not nearly so close to any of the other eigenvalues. Assume the m shifts

approximate m diﬀerent eigenvalues. We have

    p(z) = (z − ρ_m) · · · (z − ρ_1),

where each zero of p approximates an eigenvalue well. Then, for each eigenvalue λ_k that is well approximated by a shift, |p(λ_k)| will be tiny. There are m such eigenvalues. For each eigenvalue that is not well approximated by a shift, |p(λ_k)| will not be small. Thus, exactly m of the numbers |p(λ_k)| are small. If

we renumber the eigenvalues so that |p(λ_1)| ≥ |p(λ_2)| ≥ · · · ≥ |p(λ_n)|, we will have |p(λ_{n−m})| ≫ |p(λ_{n−m+1})|, and

    |p(λ_{n−m+1})/p(λ_{n−m})| ≪ 1.

This will cause a^(j)_{n−m+1,n−m} to converge to zero rapidly. Once this entry gets very small, the bottom m × m submatrix

    [ a^(j)_{n−m+1,n−m+1}  · · ·  a^(j)_{n−m+1,n} ]
    [          ⋮                         ⋮        ]    (8)
    [ a^(j)_{n,n−m+1}      · · ·  a^(j)_{n,n}     ]

will be nearly separated from the rest of the matrix. Its eigenvalues will be ex-

cellent approximations to eigenvalues of A, eventually even substantially better

than the shifts that were used to find them.^7 It makes sense, then, to take the eigenvalues of (8) as new shifts, replacing the old ones. This will result in a reduction of the ratio |p(λ_{n−m+1})/p(λ_{n−m})| and even faster convergence. At

some future point it would make sense to replace these shifts by even better

ones to get even better convergence.

This thought experiment shows that at some point it makes sense to use the

eigenvalues of (8) as shifts, but several questions remain. How should the shifts

be chosen initially? At what point should we switch to the strategy of using

eigenvalues of (8) as shifts? How often should we update the shifts?

The answers to these questions were not at all evident to Francis and the

other eigenvalue computation pioneers. A great deal of testing and experimenta-

tion has led to the following empirical conclusions. The shifts should be updated

on every iteration, and it is okay to use the eigenvalues of (8) as shifts right from

the very start. At ﬁrst they will be poor approximations, and progress will be

slow. After a few (or sometimes more) iterations, they will begin to home in

on eigenvalues and the convergence rate will improve. With new shifts chosen

on each iteration, the method normally converges quadratically [18, 15], which

means that |a^{(j+1)}_{n−m+1,n−m}| will be roughly the square of |a^{(j)}_{n−m+1,n−m}|. Thus successive values of |a^{(j)}_{n−m+1,n−m}| could be approximately 10^{−3}, 10^{−6}, 10^{−12}, 10^{−24}, for example. Very quickly this number gets small enough that we can declare it to be zero and split off an m × m submatrix and m eigenvalues. Then we can deflate the problem and go after the next set of m eigenvalues.
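The quadratic collapse of the trailing subdiagonal entry is easy to observe with the mathematically equivalent explicit iteration, here with m = 1 and the trailing 1 × 1 submatrix (8) as the shift (a NumPy sketch on a made-up symmetric matrix, so that all shifts stay real):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
A = M + M.T                       # symmetric: real eigenvalues, real shifts
history = []
for _ in range(10):
    rho = A[-1, -1]               # m = 1: the shift is the trailing 1x1 block
    Q, R = np.linalg.qr(A - rho * np.eye(n))
    A = Q.T @ A @ Q
    history.append(abs(A[-1, -2]))
# the trailing subdiagonal entry collapses roughly quadratically
assert history[-1] < 1e-8
```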

There is one caveat. The strategy of using the eigenvalues of (8) as shifts

does not always work, so variants have been introduced. And it is safe to say

that all shifting strategies now in use are indeed variants of this simple strategy.

The strategies used by the codes in MATLAB and other modern software are

quite good, but they might not be unbreakable. Certainly nobody has been

^7 The eigenvalues that are well approximated by (8) are indeed the same as the eigenvalues that are well approximated by the shifts.


able to prove that they are. It is an open question to come up with a shifting

strategy that provably always works and normally yields rapid convergence.

It is a common phenomenon that numerical methods work better than we

can prove they work. In Francis’s algorithm the introduction of dynamic shifting

(new shifts on each iteration) dramatically improves the convergence rate but

at the same time makes the analysis much more diﬃcult.

Next-to-Last Words.

Francis’s algorithm is commonly known as the implicitly-shifted QR algorithm,

but it bears no resemblance to the basic QR algorithm (2). We have described

Francis’s algorithm and explained why it works without referring to (2) in any

way.

One might eventually like to see the connection between Francis’s algorithm

and the QR decomposition:

Theorem 6. Consider a Francis iteration Â = Q^{−1}AQ of degree m with shifts ρ_1, . . . , ρ_m. Let p(z) = (z − ρ_m) · · · (z − ρ_1), as usual. Then there is an upper-triangular matrix R such that

    p(A) = QR.

In words, the transforming matrix Q of a Francis iteration is exactly the orthogonal factor in the QR decomposition of p(A).

The proof can be found in [17], for example. Theorem 6 is also valid for

a long sequence of Francis iterations, for which ρ_1, . . . , ρ_m is the long list of

all shifts used in all iterations (m could be a million). This is a useful tool for

formal convergence proofs. In fact, all formal convergence results for QR and

related algorithms that I am aware of make use of this tool or a variant of it.
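Theorem 6 also suggests a direct (if uncompetitive) experiment: form p(A) explicitly, take Q from its QR decomposition, and confirm that Q^T A Q is again upper Hessenberg, as the implicitly computed Francis iterate would be. A NumPy sketch with made-up shifts:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = np.triu(rng.standard_normal((n, n)), k=-1)   # properly upper Hessenberg
rho1, rho2 = 0.3, -0.7                           # arbitrary shifts, degree m = 2
I = np.eye(n)
pA = (A - rho2 * I) @ (A - rho1 * I)
Q, R = np.linalg.qr(pA)                          # p(A) = QR as in Theorem 6
A_hat = Q.T @ A @ Q
# the explicitly formed Q yields an upper Hessenberg result, matching what a
# Francis iteration (which never forms p(A)) produces
assert np.max(np.abs(np.tril(A_hat, k=-2))) < 1e-10
```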

In this paper we have focused on computation of eigenvalues. However,

Francis’s algorithm can be adapted in straightforward ways to the computation

of eigenvectors and invariant subspaces as well [6, 15, 17].

6 What Became of John Francis?

From about 1970 on, Francis’s algorithm was ﬁrmly established as the most

important algorithm for the eigenvalue problem. Francis’s two papers [3, 4]

accumulated hundreds of citations. As a part of the celebration of the arrival

of the year 2000, Dongarra and Sullivan [2] published a list of the top ten algo-

rithms of the twentieth century. “The QR algorithm” was on that list. Francis’s

work had become famous, but by the year 2000 nobody in the community of nu-

merical analysts knew what had become of the man or could recall ever having

met him. It was even speculated that he had died.


Figure 1: John Francis speaks at the University of Strathclyde, Glasgow, in

June 2009. Photo: Frank Uhlig.

Two members of our community, Gene Golub and Frank Uhlig, working

independently, began to search for him. With the help of the internet they were

able to ﬁnd him, alive and well, retired in the South of England. In August

of 2007 Golub visited Francis at his home. This was just a few months before

Gene’s untimely death in November. Golub reported that Francis had been

completely unaware of the huge impact of his work. The following summer

Uhlig was able to visit Francis and chat with him several times over the course

of a few days. See Uhlig’s papers [5, 14] for more information about Francis’s

life and career.

Uhlig managed to persuade Francis to attend and speak at the Biennial

Numerical Analysis conference at the University of Strathclyde, Glasgow, in

June of 2009. With some help from Andy Wathen, Uhlig organized a minisym-

posium at that conference honoring Francis and discussing his work and the

related foundational work of Rutishauser and Kublanovskaya. I had known of

the Biennial Numerical Analysis meetings in Scotland for many years and had

many times thought of attending, but I had never actually gotten around to

doing so. John Francis changed all that. No way was I going to pass up the

opportunity to meet the man and speak in the same minisymposium with him.

As the accompanying photograph shows, Francis remains in great shape at age

75. His mind is sharp, and he was able to recall a lot about the QR algorithm

and how it came to be. We enjoyed his reminiscences very much.

A few months later I spoke about Francis and his algorithm at the Paciﬁc

Northwest Numerical Analysis Seminar at the University of British Columbia.

I displayed this photograph and remarked that Francis has done a better job

than I have at keeping his weight under control over the years. Afterward Randy

LeVeque explained it to me: Francis knows how to chase the bulge.


References

[1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, eds.,

Templates for the Solution of Algebraic Eigenvalue Problems: A Practical

Guide, Society for Industrial and Applied Mathematics, Philadelphia, 2000.

[2] J. Dongarra and F. Sullivan, The top 10 algorithms, Comput. Sci. Eng. 2

(2000) 22–23.

[3] J. G. F. Francis, The QR transformation, part I, Computer J. 4 (1961)

265–272.

[4] ———, The QR transformation, part II, Computer J. 4 (1961) 332–345.

[5] G. H. Golub and F. Uhlig, The QR algorithm: 50 years later its genesis by

John Francis and Vera Kublanovskaya and subsequent developments, IMA

J. Numer. Anal. 29 (2009) 467–485.

[6] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns

Hopkins University Press, Baltimore, MD, 1996.

[7] R. Granat, B. Kågström, and D. Kressner, A novel parallel QR algorithm

for hybrid distributed memory HPC systems, SIAM J. Sci. Stat. Comput.

32 (2010) 2345–2378.

[8] M. Gutknecht and B. Parlett, From qd to LR, or, How were the qd and

LR algorithms discovered? IMA J. Numer. Anal. (to appear).

[9] J. Hadamard, Essai sur l'étude des fonctions données par leur développement de Taylor, J. Math. 4 (1892) 101–182.

[10] V. N. Kublanovskaya, On some algorithms for the solution of the complete

eigenvalue problem, USSR Comput. Math. and Math. Phys. 1 (1962) 637–

657.

[11] H. Rutishauser, Der Quotienten-Diﬀerenzen-Algorithmus, Z. Angew. Math.

Phys. 5 (1954) 233–251.

[12] ———, Der Quotienten-Differenzen-Algorithmus, Mitt. Inst. Angew. Math. ETH, no. 7, Birkhäuser, Basel, 1957.

[13] ———, Solution of eigenvalue problems with the LR-transformation, Nat.

Bur. Standards Appl. Math. Ser. 49 (1958) 47–81.

[14] F. Uhlig, Finding John Francis who found QR ﬁfty years ago, IMAGE, Bul-

letin of the International Linear Algebra Society 43 (2009) 19–21; available

at http://www.ilasic.math.uregina.ca/iic/IMAGE/IMAGES/image43.pdf.

[15] D. S. Watkins, The Matrix Eigenvalue Problem: GR and Krylov Subspace

Methods, Society for Industrial and Applied Mathematics, Philadelphia,

2007.


[16] ———, The QR algorithm revisited, SIAM Rev. 50 (2008) 133–145.

[17] ———, Fundamentals of Matrix Computations, 3rd ed., John Wiley, Hoboken, NJ, 2010.

[18] D. S. Watkins and L. Elsner, Convergence of algorithms of decomposition

type for the eigenvalue problem, Linear Algebra Appl. 143 (1991) 19–47.

[19] J. H. Wilkinson, The evaluation of the zeros of ill-conditioned polynomials,

parts I and II, Numer. Math. 1 (1959) 150–166, 167–180.

[20] ———, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, 1965.

[21] ———, The perfidious polynomial, in Studies in Numerical Analysis, MAA

Studies in Mathematics, vol. 24, G. H. Golub, ed., Mathematical Associa-

tion of America, Washington, DC, 1984, 1–28.

David S. Watkins received his B.A. from the University of California at Santa

Barbara in 1970, his M.Sc. from the University of Toronto in 1971, and his Ph.D.

from the University of Calgary in 1974 under the supervision of Peter Lancaster.

He enjoys ﬁguring out mathematics and explaining it to others in the simplest

and clearest terms. It happens that the mathematics that most frequently draws

his attention is related to matrix computations.

Department of Mathematics, Washington State University, Pullman, WA 99164-

3113

watkins@wsu.edu


himself writing programs for the Pegasus computer at the National Research Development Corporation in London. A major problem at that time was the flutter of aircraft wings, and for the analysis there was a need to compute eigenvalues. Francis was given the task of writing some matrix computation routines, including some eigenvalue programs. His interest in the problem grew far beyond what he had been assigned to do. He had obtained a copy of Heinz Rutishauser's paper on the LR algorithm [13], he had attended seminar talks by James Wilkinson^2 in which he learned about the advantages of computing with orthogonal matrices, and he saw the possibility of creating a superior method. He worked on this project on the side, with the tolerance (and perhaps support) of his supervisor, Christopher Strachey. In 1959 he created and tested the procedure that is now commonly known as the implicitly-shifted QR algorithm, which I prefer to call Francis's algorithm. It became and has continued to be the big workhorse of eigensystem computations. A version of Francis's algorithm was used by MATLAB when I asked it to compute the eigensystem of a 1000 × 1000 matrix on my laptop. Another version was used for the computation on the matrix of order 10^5 reported in [7].

Once Francis finished his work on the QR project, he moved on to other things and never looked back. In the infant computer industry there were many urgent tasks, for example, compiler development. Within a few years the importance of Francis's algorithm was recognized, but by then Francis had left the world of numerical analysis. Only many years later did he find out what a huge impact his algorithm had had.

2 Precursors of Francis's Algorithm.

Most of us learned in an introductory linear algebra course that the way to compute eigenvalues is to form the characteristic polynomial and factor it to get the zeros. Because of the equivalence of the eigenvalue problem with that of finding the roots of a polynomial equation, it is difficult to put a date on the beginning of the history of eigenvalue computations. The polynomial problem is quite old, and we all know something about its history. In particular, early in the 19th century Abel proved that there is no general formula (using only addition, subtraction, multiplication, division, and the extraction of roots) for the roots of a polynomial of degree five. This result, which is one of the crowning achievements of Galois theory, has the practical consequence for numerical analysts that all methods for computing eigenvalues are iterative. Direct methods, such as Gaussian elimination for solving a linear system Ax = b, do not exist for the eigenvalue problem.

The early history of eigenvalue computation is intertwined with that of computing roots of polynomial equations. Eventually it was realized that forming the characteristic polynomial might not be a good idea, as the zeros of a polynomial are often extremely sensitive to small perturbations in the coefficients.

^2 A few years later, Wilkinson wrote the highly influential book The Algebraic Eigenvalue Problem [20].


Moreover, the eigenvalue problem has become a much more important part of computational mathematics than the polynomial root finding problem is. All modern methods for computing eigenvalues work directly with the matrix, avoiding the characteristic polynomial altogether.

For our story, an important early contribution was the 1892 dissertation of Jacques Hadamard [9], in which he proposed a method for computing the poles of a meromorphic function

    f(z) = Σ_{k=0}^{∞} s_k / z^{k+1},    (1)

given the sequence of coefficients (s_k). Some sixty years later Eduard Stiefel, founder of the Institute for Applied Mathematics at ETH Zürich, set a related problem for his young assistant, Heinz Rutishauser: given a matrix A, determine its eigenvalues from the sequence of moments s_k = y^T A^k x, k = 0, 1, 2, . . . , where x and y are (more or less) arbitrary vectors. If the moments are used to define a function f as in (1), then, for almost any choice of vectors x and y, every pole of f is an eigenvalue of A. Moreover, the complete set of poles of f is exactly the complete set of eigenvalues of A. Thus the problem posed by Stiefel is essentially equivalent to the problem addressed by Hadamard. Today we know that this is a bad way to compute eigenvalues, as the poles of f (like the zeros of a polynomial) can be extremely sensitive to small changes in the coefficients. However, Rutishauser came up with a new method, which he called the quotient-difference (qd) algorithm [11, 12], that solves this problem.

Rutishauser saw that there were multiple ways to organize and interpret his qd algorithm. He reformulated it as a process of factorization and recombination of tridiagonal matrices, and from there he was able to generalize it to obtain the LR algorithm [13]. For more information about this seminal work see [8].

Francis [3] and Kublanovskaya [10] independently proposed the QR algorithm, which promises greater stability by replacing LR decompositions with QR decompositions. Francis knew about Rutishauser's work, and so did Vera Kublanovskaya, working in Leningrad. The basic QR algorithm is as follows: Factor your matrix A into a product A = QR, where Q is unitary and R is upper triangular. Then reverse the factors and multiply them back together to get a new matrix: RQ = Â. Briefly we have

    A = QR,    RQ = Â.

This is one iteration of the basic QR algorithm. It is easy to check that Â = Q^{−1}AQ, so Â is unitarily similar to A and therefore has the same eigenvalues. If the process is now iterated:

    A_{j−1} = Q_j R_j,    R_j Q_j = A_j,
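One iteration of the basic QR algorithm is two lines of NumPy. The sketch below (the matrix is constructed with known eigenvalues 9, 5, 2, 1 purely for illustration) iterates and watches the eigenvalues emerge on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(8)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([9.0, 5.0, 2.0, 1.0]) @ U.T    # symmetric, known eigenvalues
for _ in range(100):
    Q, R = np.linalg.qr(A)                     # factor:    A = QR
    A = R @ Q                                  # recombine: A_hat = RQ = Q^{-1} A Q
# the iterates tend to triangular (here diagonal) form, eigenvalues on the diagonal
assert np.max(np.abs(np.tril(A, k=-1))) < 1e-8
assert np.allclose(np.sort(np.diag(A))[::-1], [9.0, 5.0, 2.0, 1.0])
```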

the sequence of unitarily similar matrices so produced will (usually) tend toward (block) triangular form, eventually revealing the eigenvalues on the main diagonal. The basic QR algorithm converges rather slowly. A remedy is to incorporate shifts of origin:

    A_{j−1} − ρ_j I = Q_j R_j,    R_j Q_j + ρ_j I = A_j.    (2)

Again it is easy to show that the matrices so produced are unitarily similar. Good choices of shifts ρ_j can improve the convergence rate dramatically. A good shift is one that approximates an eigenvalue well.

Many applications feature real matrices that have, of course, lots of complex eigenvalues. If one wants rapid convergence of complex eigenvalues, one needs to use complex shifts and carry out (2) in complex arithmetic. This posed a problem for Francis: Working in complex arithmetic on the Pegasus computer would have been really difficult. Moreover, complex matrices require twice as much storage space as real matrices do, and Pegasus didn't have much storage space. He therefore looked for a method that avoids complex arithmetic. If iterate A_0 is real and we do an iteration of (2) with a complex shift ρ, the resulting matrix A_1 is, of course, complex. However, if we then do another iteration using the complex conjugate shift ρ̄, the resulting matrix A_2 is again real. Francis knew this, and he looked for a method that would get him directly from A_0 to A_2, entirely in real arithmetic. He found such a method, the double-shift QR algorithm [4], which does two iterations of the QR algorithm implicitly, bypassing (2) altogether. This algorithm does not cause convergence to triangular form; it is impossible for complex eigenvalues to emerge on the main diagonal of a real matrix. Instead pairs of complex conjugate eigenvalues emerge in 2 × 2 packets along the main diagonal of a block triangular matrix. Francis coded his new algorithm in assembly language on the Pegasus computer and tried it out. Using this new method, the flying horse was able to compute all of the eigenvalues of a 24 × 24 matrix in just over ten minutes [4].

When we teach students about Francis's algorithm today, we still follow this path. We introduce the shifted QR algorithm (2) and talk about why it works; then we introduce Francis's algorithm, and then we demonstrate equivalence using the so-called implicit-Q theorem. Francis had to expend some effort to demonstrate that his new method does indeed effect two steps of (2). A primary objective of this paper is to demonstrate that this approach is inefficient. It is simpler to introduce Francis's algorithm straightaway and explain why it works.^3 We will describe Francis's double-shift QR algorithm below, and the reader will see that it is completely different from the shifted QR algorithm (2).

^3 It is only fair to admit that this is not my first attempt. The earlier work [16], which I now view as premature, failed to expose fully the role of Krylov subspaces (explained below) for the functioning of Francis's algorithm. It is my fervent hope that I will not view the current work as premature a few years from now.

3 Reflectors and the Reduction to Hessenberg Form.

Before we can describe Francis's algorithm, we need to introduce some basic tools of numerical linear algebra. For simplicity we will restrict our attention to the real case, but everything we do here can easily be extended to the complex setting.

A matrix A is called upper Hessenberg if it satisfies a_ij = 0 whenever i > j + 1. For example, a 5 × 5 upper Hessenberg matrix has the form

    [ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ]
    [     ∗ ∗ ∗ ]
    [       ∗ ∗ ],

where blank spots denote zeros and asterisks denote numbers that can have any real value.

Let x and y be two distinct vectors in R^n with the same Euclidean norm: ||x||_2 = ||y||_2, and let S denote the hyperplane (through the origin) orthogonal to x − y. Then the linear transformation Q that reflects vectors through S clearly maps x to y and vice versa. In the world of matrix computations, reflectors are also known as Householder transformations. A precise expression for the matrix Q is Q = I − 2uu^T, where u = (x − y)/||x − y||_2. Viewing Q as a matrix, it is real, orthogonal (Q^T = Q^{−1}), and symmetric (Q^T = Q). It is also clearly involutory: Q^{−1} = Q. Since it is real, it is also Hermitian: Q^* = Q, and since Q preserves norms, it is unitary: Q^* = Q^{−1}. For details on the practical construction and use of reflectors see [17] or [6].

Now consider an arbitrary nonzero x and a special y:

    x = (x_1, x_2, . . . , x_n)^T,    y = (y_1, 0, . . . , 0)^T,

where y_1 = ±||x||_2. Since x and y have the same norm, we know that there is a reflector Q such that Qx = y. Such an operator is clearly a boon to numerical linear algebraists, as it gives us a means of introducing large numbers of zeros. Letting e_1 denote the vector with a 1 in the first position and zeros elsewhere, we can restate our result as follows.

Theorem 1. Let x ∈ R^n be any nonzero vector. Then there is a reflector Q such that Qx = αe_1, where α = ±||x||_2.

With reflectors in hand we can easily transform any matrix nearly all the way to triangular form.
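A reflector mapping x to αe_1 can be built directly from the formula Q = I − 2uu^T. A minimal NumPy sketch (the sign of α is chosen opposite to x_1, a standard choice that avoids cancellation; production codes apply reflectors without ever forming Q explicitly):

```python
import numpy as np

def reflector(x):
    """Return the n x n reflector Q = I - 2 u u^T with Q x = alpha * e1."""
    n = len(x)
    alpha = -np.copysign(np.linalg.norm(x), x[0])   # sign avoids cancellation
    y = np.zeros(n); y[0] = alpha
    u = (x - y) / np.linalg.norm(x - y)
    return np.eye(n) - 2.0 * np.outer(u, u)

x = np.array([3.0, 4.0, 0.0, 12.0])
Q = reflector(x)
assert np.allclose(Q @ x, [-13.0, 0.0, 0.0, 0.0])   # alpha = -||x||_2 = -13
assert np.allclose(Q @ Q, np.eye(4))                # involutory: Q^{-1} = Q
assert np.allclose(Q, Q.T)                          # symmetric
```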

Theorem 2. Every A ∈ R^{n×n} is orthogonally similar to an upper Hessenberg matrix: H = Q^{−1}AQ, where Q is a product of n − 2 reflectors.

Proof. The first reflector Q_1 has the form

    Q_1 = [ 1       ]
          [    Q̃_1 ],    (3)

where Q̃_1 is an (n − 1) × (n − 1) reflector such that

    Q̃_1 (a_21, a_31, . . . , a_n1)^T = (∗, 0, . . . , 0)^T.

If we multiply A on the left by Q_1, we obtain a matrix Q_1 A that has zeros in the first column from the third entry on. To complete a similarity transformation, we must multiply on the right by Q_1 (= Q_1^{−1}). Because Q_1 has the form (3), the transformation Q_1 A → Q_1 AQ_1 does not alter the first column. Thus Q_1 AQ_1 has the desired zeros in the first column. The second reflector Q_2 has the form

    Q_2 = [ 1           ]
          [    1        ]
          [       Q̃_2 ],

and is constructed so that Q_2 Q_1 AQ_1 Q_2 has zeros in the second column from the fourth position on. This transformation does not disturb the zeros that were created in the first column in the first step. The third reflector Q_3 creates zeros in the third column, and so on. After n − 2 reflectors, we will have produced a matrix H = Q_{n−2} · · · Q_1 AQ_1 · · · Q_{n−2} in upper Hessenberg form. Letting Q = Q_1 · · · Q_{n−2}, we have Q^{−1} = Q_{n−2} · · · Q_1, and H = Q^{−1}AQ.

The proof of Theorem 2 is constructive; that is, it gives a finite algorithm (a direct method) for computing H and Q. The operations used are those required for setting up and applying reflectors, namely addition, subtraction, multiplication, division, and the extraction of square roots [6, 17]. The details are left for the reader (or see [17], for example).

What we really want is an algorithm to get us all the way to upper triangular form, from which we can read off the eigenvalues. If we had a general procedure for creating even one more zero below the main diagonal, we would be able to split the eigenvalue problem into two smaller eigenvalue problems. If we could do this, we would be able to split each of the smaller problems into still smaller problems and eventually get to triangular form by a finite algorithm. This would imply the existence of a formula for the zeros of a general nth degree polynomial, in violation of Galois theory. But Hessenberg form is as far as we can get with a finite algorithm using the specified operations.

4 Francis's Algorithm.

Suppose we want to find the eigenvalues of some matrix A ∈ R^{n×n}. We continue to focus on the real case; the extension to complex matrices is straightforward. We know from Theorem 2 that we can reduce A to upper Hessenberg form, so let us assume from the outset that A is upper Hessenberg. If the QR algorithm (2) is initiated with a matrix A_0 that is upper Hessenberg, then (except in some special cases that can be avoided) all iterates A_j are upper Hessenberg. This results in much more economical iterations, as both the QR decomposition and the subsequent RQ multiplication can be done in many fewer operations in the Hessenberg case. In fact, the algorithm would not be competitive without it. Thus an initial reduction to upper Hessenberg form is well worth the extra expense. For Francis's implicitly-shifted QR algorithm, the Hessenberg form is absolutely essential.

We can assume, moreover, that A is properly upper Hessenberg. This means that all of the subdiagonal entries of A are nonzero: a_{j+1,j} ≠ 0 for j = 1, . . . , n − 1. Indeed, if A is not properly upper Hessenberg, say a_{k+1,k} = 0, then A has the block-triangular form

    A = [ A_11  A_12 ]
        [  0    A_22 ],

where A_11 is k × k. Thus the eigenvalue problem for A reduces to eigenvalue problems for the smaller upper Hessenberg matrices A_11 and A_22. If either of these is not properly upper Hessenberg, we can break it apart further until we are finally left with proper upper Hessenberg matrices. We will assume, therefore, that A is properly upper Hessenberg from the outset.

An iteration of Francis's algorithm of degree m begins by picking m shifts ρ_1, . . . , ρ_m. In principle m can be any positive integer, but in practice it should be fairly small. Francis took m = 2. There are many reasonable ways to choose shifts, but the simplest is to take ρ_1, . . . , ρ_m to be the eigenvalues of the m × m submatrix in the lower right-hand corner of A. This is an easy computation if m = 2. If m > 1, this shifting strategy can produce complex shifts, but they always occur in conjugate pairs. Whether we use this or some other strategy, we will always insist that it have this property: when the matrix is real, complex shifts must be produced in conjugate pairs. The rationale for shift selection will be explained later.

Now let p(A) = (A − ρ_m I) · · · (A − ρ_1 I). Francis's algorithm just needs the first column x = p(A)e_1. We do not actually compute p(A), as this would be too expensive.

It is easy to check that (A − ρ1I)e1 has nonzero entries only in its first two positions, (A − ρ2I)(A − ρ1I)e1 has nonzero entries only in its first three positions, and so on. Thus x has nonzero entries only in its first m + 1 positions. The entire computation of x depends only on the first m columns of A (and the shifts) and requires negligible computational effort if m is small. The computation is especially cheap because A is upper Hessenberg. Since the complex shifts occur in conjugate pairs, p(A) is real. Therefore x is real.

Let x̃ ∈ Rm+1 denote the vector consisting of the first m + 1 entries of x (the nonzero part). Theorem 1 guarantees that there is an (m + 1) × (m + 1) reflector Q̃0 such that Q̃0x̃ = αe1, where α = ±‖x̃‖2. Let Q0 be the n × n reflector given by

    Q0 = [ Q̃0     ]
         [      I ].        (4)

Clearly Q0x = αe1 and, since Q0 is an involution, Q0e1 = α−1x. Thus the first column of Q0 is proportional to x.

Now use Q0 to perform a similarity transformation: A → Q0−1AQ0 = Q0AQ0. Because Q0 has the form (4), the transformation A → Q0A affects only the first m + 1 rows. Typically the first m + 1 rows are completely filled in. The transformation Q0A → Q0AQ0 affects only the first m + 1 columns. Since am+2,m+1 ≠ 0, row m + 2 gets filled in by this transformation. Below row m + 2 we have a big block of zeros, and these entries remain zero. The total effect is that the Hessenberg form is disturbed, but only slightly: there is a bulge in it. In the case m = 3 and n = 7, the matrix Q0AQ0 looks like

    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [         ∗ ∗ ∗ ]
    [           ∗ ∗ ].

In this 7 × 7 matrix the bulge looks huge, but if you envision, say, the 100 × 100 case (keeping m = 3), you will see that the matrix is almost upper Hessenberg with a tiny bulge at the top.

The rest of the Francis iteration consists of returning this matrix to upper Hessenberg form by the algorithm sketched in the proof of Theorem 2. That algorithm begins with a transformation Q1 that acts on rows 2 through n and creates the desired zeros in the first column. Since the first column already consists of zeros after row m + 2, the scope of Q1 can be restricted a lot further in this case: it needs to act only on rows 2 through m + 2 and create zeros in positions (3, 1), ..., (m + 2, 1). Applying Q1 on the left, we get a matrix Q1Q0AQ0 that has the Hessenberg form restored in the first column. Completing the similarity transformation, we multiply by Q1 on the right. This recombines columns 2 through m + 2. Since am+3,m+2 ≠ 0, additional nonzero entries are created in positions (m + 3, 2), ..., (m + 3, m + 1).

In the case m = 3 and n = 7 the matrix Q1Q0AQ0Q1 looks like

    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [   ∗ ∗ ∗ ∗ ∗ ∗ ]
    [           ∗ ∗ ].

Notice that the bulge has not gotten any smaller, but it has been pushed one position to the right and downward. This establishes the pattern for the process. The next transformation will push the bulge over and down one more position, and so on. Thinking again of the 100 × 100 case, we see that a long sequence of such transformations will chase the bulge down through the matrix until it is finally pushed off the bottom. At this point, Hessenberg form will have been restored and the Francis iteration will be complete. For obvious reasons, Francis's algorithm is sometimes referred to as a bulge chasing algorithm.

An iteration of Francis's algorithm of degree m can be summarized briefly as follows.

1. Pick some shifts ρ1, ..., ρm.
2. Compute x = p(A)e1 = (A − ρmI) ··· (A − ρ1I)e1.
3. Compute a reflector Q0 whose first column is proportional to x.
4. Do a similarity transformation A → Q0AQ0, creating a bulge.
5. Return the matrix to upper Hessenberg form by chasing the bulge.

Letting Â denote the final result of the Francis iteration, we have

    Â = Qn−2 ··· Q1Q0AQ0Q1 ··· Qn−2,

where Q0 is the transformation that creates the bulge, and Q1, ..., Qn−2 are the transformations that chase it. Letting Q = Q0Q1 ··· Qn−2, we have Â = Q−1AQ.

Although this algorithm is commonly known as the implicitly-shifted QR algorithm, its execution does not require any QR decompositions. Why, then, should we call it the QR algorithm? The name is misleading, and for this reason I prefer to call it Francis's algorithm.
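The five steps above can be assembled into a complete working sketch of a single Francis iteration of degree m = 2. This is my own illustration of the idea, not production code and not the paper's code; the names house and francis_step are invented, and a real implementation would apply the small reflectors directly rather than forming n × n matrices:

```python
import numpy as np

def house(v):
    # Reflector P = I - 2 u u^T mapping v to a multiple of e1.
    u = v.astype(float).copy()
    u[0] += np.copysign(np.linalg.norm(v), v[0])
    norm_u = np.linalg.norm(u)
    if norm_u == 0.0:
        return np.eye(len(v))
    u /= norm_u
    return np.eye(len(v)) - 2.0 * np.outer(u, u)

def francis_step(A):
    # One double-shift (m = 2) Francis iteration on a properly upper
    # Hessenberg matrix, written with explicit reflectors for clarity.
    A = A.copy()
    n = A.shape[0]
    # Shifts = eigenvalues of the trailing 2x2 block, entering only through
    # the real numbers s = rho1 + rho2 and t = rho1 * rho2.
    s = A[n-2, n-2] + A[n-1, n-1]
    t = A[n-2, n-2] * A[n-1, n-1] - A[n-2, n-1] * A[n-1, n-2]
    # First three entries of x = (A^2 - s A + t I) e1.
    x = np.array([A[0, 0]**2 + A[0, 1] * A[1, 0] - s * A[0, 0] + t,
                  A[1, 0] * (A[0, 0] + A[1, 1] - s),
                  A[1, 0] * A[2, 1]])
    # Create the bulge ...
    Q0 = np.eye(n)
    Q0[:3, :3] = house(x)
    A = Q0 @ A @ Q0
    # ... and chase it off the bottom, restoring one column at a time.
    for k in range(n - 2):
        r = min(k + 4, n)
        Qk = np.eye(n)
        Qk[k+1:r, k+1:r] = house(A[k+1:r, k])
        A = Qk @ A @ Qk
    return A

rng = np.random.default_rng(1)
n = 8
H = np.triu(rng.standard_normal((n, n)), k=-1)  # properly upper Hessenberg
H_next = francis_step(H)
```

The result is again upper Hessenberg and orthogonally similar to H (same characteristic polynomial); repeating the step drives the bottom subdiagonal entry toward zero.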

Recall that Q0 was built in such a way that Q0e1 = βx = βp(A)e1 for some nonzero β. Looking back to the proof of Theorem 2, we recall that each of the other Qi has e1 as its first column; thus Qie1 = e1 for i = 1, ..., n − 2. We conclude that

    Qe1 = Q0Q1 ··· Qn−2e1 = Q0e1 = βp(A)e1 = β(A − ρmI) ··· (A − ρ1I)e1.

In words, the first column of Q is proportional to the first column of p(A). We summarize these findings in a theorem.

Theorem 3. A Francis iteration of degree m with shifts ρ1, ..., ρm effects an orthogonal similarity transformation Â = Q−1AQ, where the first column of Q is proportional to p(A)e1 = (A − ρmI) ··· (A − ρ1I)e1.

Repeated Francis iterations with well-chosen shifts typically result in rapid convergence, in the sense that an−m+1,n−m → 0 quadratically.⁴ In a few iterations, an−m+1,n−m will be small enough to be considered zero. We can then deflate the problem:

    A = [ A11  A12 ]
        [  0   A22 ].

The (small) m × m matrix A22 can be resolved into m eigenvalues with negligible work. (Think of the case m = 2.) We can then focus on the remaining (n − m) × (n − m) submatrix A11 and go after another set of m eigenvalues. In the next section we will see why this procedure is so powerful.

5 Why Francis's Algorithm Works.

Subspace Iteration. At the core of Francis's algorithm lies the humble power method, or rather its generalization, which is known as subspace iteration. In the k-dimensional version, we pick a k-dimensional subspace S of Rn (or perhaps Cn) and, through repeated multiplication by A, build the sequence of subspaces

    S, AS, A^2S, A^3S, ....        (5)

Here by AS we mean {Ax | x ∈ S}. To avoid complications in the discussion, we will make some simplifying assumptions. We will suppose that all of the spaces in the sequence have the same dimension k.⁵

⁴This is a simplification. It often happens that some shifts are much better than others, resulting in an−k+1,n−k → 0 for some k < m. This does not cause any problems for us.

. . at least for some values of k. . . . . If the vector v has coordinate vector x before the change of coordinate system. Each vector in Rn is the coordinate vector of some vector v in the vector space on which the operator acts.spaces in the sequence have the same dimension k. As a consequence. . . . 5 Of course it can happen that the dimension decreases in the course of the iterations. 6 The non-diagonalizable case is more complicated but leads to similar conclusions. In fact. Any vector x can be expressed as a linear combination x = c1 v1 + c2 v2 + · · · + cn vn for some (unknown) c1 . for example. . 15]. 11 . See [18] or [15]. . The eigenvalues of p(A) are p(λ1 ). If we renumber them so that | p(λ1 ) | ≥ · · · ≥ | p(λn ) |. ek }. ˆ As we all know. . . Often the ratio | λk+1 /λk | will be close to 1. it will have coordinate vector Q−1 x afterwards. 3 . . . . . (6) If p(z) = (z − ρm ) · · · (z − ρ1 ) is a polynomial of degree m. Now we have a bit more ﬂexibility. ρm . p(A)S. the sequence (5) will converge to the k-dimensional invariant subspace spanned by v1 . . . p(A)2 S. vn and associated eigenvalues λ1 . . it is good news [15]. The vectors p(A)e1 . . the rate of convergence of (6) will be | p(λk+1 )/p(λk ) |. In an eﬀort to speed up convergence. each step of (6) amounts to m steps of (5) with shifts ρ1 .5 We also assume that A is a diagonalizable matrix with n linearly independent eigenvectors v1 . . . . . . . . Perhaps by a wise choice choice of shifts ρ1 . If | λk | > | λk+1 |. . . . . the components in the directions v1 . . . which means that the error is reduced by a factor of approximately | λk+1 /λk | on each iteration [18. . λn . A and A operator with respect to two diﬀerent bases. . . n 1 k k+1 j = 1. . .6 Sort the eigenvalues and eigenvectors so that | λ1 | ≥ | λ2 | ≥ · · · ≥ | λn |. we could consider replacing A by some p(A) in (5). . but we will not discuss that case here. . . vn as j increases. Subspace Iteration with Changes of Coordinate System. 
Thus clearly Aj x = c1 λj v1 + · · · + ck λj vk + ck+1 λj vk+1 + · · · + cn λj vn . unless our choice of S was very unlucky. cn . vk . ρm . . p(A)ek are a basis for the space p(A)E k . a similarity transformation A = Q−1 AQ is just a change of ˆ are two matrices that represent the same linear coordinate system. Now consider a step of subspace iteration applied to the special subspace E k = span{e1 . p(λn ). . . we can make this ratio small. where ei is the standard basis vector with a 1 in the ith position and zeros elsewhere. . . vk will grow relative to the components in the directions vk+1 . p(A)3 S. 2. The convergence is linear with ratio | λk+1 /λk |. . giving the iteration S. so convergence will be slow. . . This does not cause any problems for us. .
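The effect of the shift polynomial on the convergence ratio can be watched in a toy computation (my own sketch, with made-up numbers). For the diagonal matrix below, the power method converges to the dominant eigenvector with ratio |λ2/λ1| = 2/4 = 0.5; applying it instead to p(A) = A − 1.5I, whose eigenvalues are 2.5, 0.5, −0.5, improves the ratio to 0.5/2.5 = 0.2:

```python
import numpy as np

A = np.diag([4.0, 2.0, 1.0])
v1 = np.array([1.0, 0.0, 0.0])   # dominant eigenvector of A

def observed_ratio(M, steps=15):
    # Run the power method on M and report the last error-reduction factor.
    x = np.ones(3)
    errs = []
    for _ in range(steps):
        x = M @ x
        x /= np.linalg.norm(x)
        errs.append(np.linalg.norm(x - np.sign(x[0]) * v1))
    return errs[-1] / errs[-2]

r_plain = observed_ratio(A)                      # close to |2 / 4|   = 0.5
r_shifted = observed_ratio(A - 1.5 * np.eye(3))  # close to |0.5/2.5| = 0.2
```

Both iterations converge to the same eigenvector v1; only the speed differs, exactly as the ratio analysis predicts.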

. Of course. Then we can do another change ˆ k back to E k . . Under the change of coordinate system. . . 12 . . . If one thinks about implementing subspace iteration with changes of coordinate system in a straightforward manner. Eventually A21 will get small enough that we can set it to zero and split the eigenvalue problem into two smaller problems. Let us see what this change of basis does to the space p(A)E k . qn together form an orthonormal basis of Rn . not just for one k. (j) where A21 → 0 linearly with ratio | p(λk+1 )/p(λk ) |. In practice we will have Aj = (j) A11 (j) A21 (j) A12 (j) A22 (j) . it appears to be a fairly expensive procedure. qk be an orthonormal basis for p(A)E k . . . If we continue to iterate of coordinate system and map p(A)E in this manner. . This amazing procedure eﬀects subspace iteration with changes of coordinate system. ek . . Thus the change of coordinate system maps p(A)E k back to E k . Now use Q to make a change of coordinate system ˆ A = Q−1 AQ = QT AQ. . . we produce a sequence of orthogonally similar matrices (Aj ) through successive changes of coordinate system. E k comes closer and closer to being invariant under Aj . which could be obtained by some variant of the Gram-Schmidt process. Since q1 . Q is an orthogonal matrix. . . qn are orthonormal. qk . . applying p(A) to E k . . . QT qk are exactly e1 . namely Francis’s algorithm. . if we want to do another step of subspace iteration. where A11 is k × k. but for k = 1. . At this point we are not yet ready to demonstrate this fact. To this end we check the basis vectors q1 . . Fortunately there is a way to implement this method on upper Hessenberg matrices at reasonable computational cost. . the vectors QT q1 . Because the columns of Q are the orthonormal vectors q1 . . . . n − 1 all at once. This is a version of subspace iteration for which the subspace stays ﬁxed and the matrix changes. . . Let qk+1 . QT qk . as outlined above. Now. . . these get mapped to QT q1 . 
we never attain it exactly. . . . . . . and we are always dealing with the same subspace E k . for example. . qn be additional orthonormal vectors such that q1 . qn . we just approach invariance. and let Q = q1 · · · qn ∈ Rn×n . but we can get a glimpse at what is going on. What does convergence mean in this case? As j increases. .Let q1 . we can work in the ˆ new coordinate system. . . . . The special space E k is invariant under Aj if and only if Aj has the block-triangular form Aj = (j) A11 0 (j) A12 (j) A22 (j) .
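Here is a direct, deliberately naive numerical sketch of this process, with invented test data: A is built to have known eigenvalues 6, 3, 1, 0.5, 0.2, the polynomial p is simply p(z) = z, and k = 2, so ‖A21(j)‖ should shrink by a factor of about |λ3/λ2| = 1/3 per step:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5, 2
S = rng.standard_normal((n, n))
A = S @ np.diag([6.0, 3.0, 1.0, 0.5, 0.2]) @ np.linalg.inv(S)

norms = []
for _ in range(20):
    # Orthonormalize a basis of A*E_k and complete it to a basis of R^n:
    # the first k columns of Q then span A*E_k = span{A e1, ..., A ek}.
    M = np.hstack([A[:, :k], np.eye(n)[:, k:]])
    Q, _ = np.linalg.qr(M)
    A = Q.T @ A @ Q            # change of coordinate system
    norms.append(np.linalg.norm(A[k:, :k]))

ratios = [norms[j + 1] / norms[j] for j in range(14, 19)]
```

The trailing ratios settle near 1/3, and A approaches the block-triangular form in which the leading 2 × 2 block carries the eigenvalues 6 and 3.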

Francis's algorithm begins by computing the vector p(A)e1. This is a step of the power method, or subspace iteration with k = 1, mapping E1 = span{e1} to p(A)E1. According to Theorem 3, this is followed by a change of coordinate system Â = Q−1AQ in which span{q1} = p(A)E1. This is exactly the case k = 1 of subspace iteration with a change of coordinate system. Thus, if we perform Francis iterations repeatedly with the same shifts (the effect of changing shifts will be discussed later), the sequence of iterates (Aj) so produced will converge linearly to the form

    [ λ1  ∗  ···  ∗ ]
    [  0  ∗  ···  ∗ ]
    [  ⋮  ⋮       ⋮ ]
    [  0  ∗  ···  ∗ ].

The convergence is linear with ratio |p(λ2)/p(λ1)|. Since the iterates are upper Hessenberg, this just means that a21(j) → 0. This is just the tip of the iceberg, but to get the complete picture, we must bring one more concept into play.

Krylov Subspaces. Given a nonzero vector x, the sequence of Krylov subspaces associated with x is defined by K1(A, x) = span{x}, K2(A, x) = span{x, Ax}, K3(A, x) = span{x, Ax, A^2x}, and in general

    Kk(A, x) = span{x, Ax, ..., A^(k−1)x},  k = 1, ..., n.

We need to make a couple of observations about Krylov subspaces. The first is that wherever there are Hessenberg matrices, there are Krylov subspaces lurking in the background.

Theorem 4. Let A be properly upper Hessenberg. Then

    span{e1, ..., ek} = Kk(A, e1),  k = 1, ..., n.

The proof is an easy exercise.

Theorem 5. Suppose H = Q−1AQ, where H is properly upper Hessenberg, and let q1, ..., qn denote the columns of Q. Then

    span{q1, ..., qk} = Kk(A, q1),  k = 1, ..., n.

In the special case Q = I, Theorem 5 reduces to Theorem 4. In our application Q is orthogonal, but the theorem is valid for any nonsingular Q.

Proof. We use induction on k. For k = 1, the result holds trivially. For the induction step, rewrite the similarity transformation as AQ = QH. Equating kth columns of this equation, we have

    Aqk = Σ (i = 1 to k+1) qi hik.        (7)

The sum stops at k + 1 because H is upper Hessenberg. We can rewrite this as

    qk+1 hk+1,k = Aqk − Σ (i = 1 to k) qi hik.

Our induction hypothesis is that span{q1, ..., qk} = Kk(A, q1). Since qk ∈ Kk(A, q1), it is clear that Aqk ∈ Kk+1(A, q1). Then, since hk+1,k ≠ 0, equation (7) implies that qk+1 ∈ Kk+1(A, q1). Thus span{q1, ..., qk+1} ⊆ Kk+1(A, q1). Since q1, ..., qk+1 are linearly independent, and the dimension of Kk+1(A, q1) is at most k + 1, these subspaces must be equal.

We remark in passing that equation (7) has computational as well as theoretical importance. It is the central equation of the Arnoldi process, one of the most important algorithms for large, sparse matrix computations [1].

Our second observation about Krylov subspaces is in connection with subspace iteration. Given a matrix A, we can say that each nonzero vector x contains the information to build a whole sequence of Krylov subspaces K1(A, x), K2(A, x), K3(A, x), .... Now consider a step of subspace iteration applied to a Krylov subspace. One easily checks, as a consequence of the equation Ap(A) = p(A)A, that

    p(A)Kk(A, x) = Kk(A, p(A)x).

Thus the result of a step of subspace iteration on a Krylov subspace generated by x is another Krylov subspace, namely the one generated by p(A)x. Moreover, for each k, the kth space Kk(A, p(A)x) is the result of one step of subspace iteration on the kth space Kk(A, x). In other words, the single step of the power method x → p(A)x induces subspace iterations on all of the corresponding Krylov subspaces:

    Kk(A, x) → Kk(A, p(A)x),  k = 1, ..., n.

It follows that the power method x → p(A)x contains all the information about a whole sequence of nested subspace iterations.
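Equation (7) is easy to see in action. The following sketch (a naive, textbook-style Arnoldi process, written by me for illustration) builds the orthonormal vectors q1, q2, ... one at a time from a starting vector, and the recurrence Aqk = Σ qi hik holds column by column:

```python
import numpy as np

def arnoldi(A, q1, k):
    # Build orthonormal q_1, ..., q_{k+1} with span{q_1,...,q_j} = K_j(A, q_1),
    # plus the (k+1) x k upper Hessenberg matrix H of recurrence coefficients.
    n = A.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):          # orthogonalize against q_1, ..., q_{j+1}
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(2)
n, k = 7, 4
A = rng.standard_normal((n, n))
Q, H = arnoldi(A, rng.standard_normal(n), k)
```

Here A @ Q[:, :k] equals Q @ H to rounding error, which is exactly equation (7) written for all k columns at once, and the columns of Q are orthonormal.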

Back to Francis's Algorithm. Each iteration of Francis's algorithm executes a step of the power method e1 → p(A)e1 followed by a change of coordinate system. Since Francis's algorithm operates on properly upper Hessenberg matrices, we can apply Theorem 5, along with Theorems 3 and 4, to deduce that a Francis iteration makes a change of coordinate system Â = Q−1AQ in which

    span{q1, ..., qk} = Kk(A, q1) = Kk(A, p(A)e1) = p(A)Kk(A, e1) = p(A)span{e1, ..., ek}

for k = 1, ..., n − 1. These equations show that Francis's algorithm effects subspace iteration with a change of coordinate system for dimensions k = 1, ..., n − 1 all at once. This is not just some theoretical action; the Krylov subspaces in question reside in the columns of the transforming matrices.

It can happen that Â has one or more zeros on the subdiagonal. This is a rare and lucky event, which allows us to reduce the problem immediately to two smaller eigenvalue problems. Let us assume we have not been so lucky. Then Â is properly upper Hessenberg, and the same reasoning can be applied to it in turn.

Now suppose we pick some shifts ρ1, ..., ρm, let p(z) = (z − ρm) ··· (z − ρ1), and do a sequence of Francis iterations with this fixed p to produce the sequence (Aj). Then, for each k for which |p(λk+1)/p(λk)| < 1, we have ak+1,k(j) → 0 linearly with rate |p(λk+1)/p(λk)|. If any one of these ratios is small, then one of the subdiagonal entries will converge rapidly to zero, allowing us to break the problem into smaller eigenvalue problems.

Choice of Shifts. The convergence results we have so far are based on the assumption that we are going to pick some shifts ρ1, ..., ρm in advance and use them over and over again. Of course, this is not what really happens. In practice, we get access to better and better shifts as we proceed, so we might as well use them. How, then, does one choose good shifts? To answer this question we start with a thought experiment. Suppose that we are somehow able to find m shifts that are excellent in the sense that each of them approximates one eigenvalue well (good to several decimal places, say) and is not nearly so close to any of the other eigenvalues. Assume the m shifts approximate m different eigenvalues. We have p(z) = (z − ρm) ··· (z − ρ1), where each zero of p approximates an eigenvalue well. Then, for each eigenvalue λk that is well approximated by a shift, |p(λk)| will be tiny. There are m such eigenvalues. For each eigenvalue that is not well approximated by a shift, |p(λk)| will not be small.

Thus exactly m of the numbers |p(λk)| are small. If we renumber the eigenvalues so that |p(λ1)| ≥ |p(λ2)| ≥ ··· ≥ |p(λn)|, we will have |p(λn−m)| ≫ |p(λn−m+1)|, so that |p(λn−m+1)/p(λn−m)| ≪ 1. This will cause an−m+1,n−m(j) to converge to zero rapidly. Very quickly this entry gets small enough that we can declare it to be zero and split off an m × m submatrix and m eigenvalues. Then we can deflate the problem and go after the next set of m eigenvalues.

But where do such excellent shifts come from? After a few (or sometimes more) iterations, the bottom m × m submatrix

    [ an−m+1,n−m+1(j)  ···  an−m+1,n(j) ]
    [       ⋮                    ⋮      ]        (8)
    [   an,n−m+1(j)    ···    an,n(j)   ]

will be nearly separated from the rest of the matrix. Its eigenvalues will be excellent approximations to eigenvalues of A,⁷ eventually even substantially better than the shifts that were used to find them. At some future point it would make sense to replace those shifts by even better ones to get even better convergence. It makes sense, then, to take the eigenvalues of (8) as new shifts, replacing the old ones. This will result in a reduction of the ratio |p(λn−m+1)/p(λn−m)| and even faster convergence.

This thought experiment shows that at some point it makes sense to use the eigenvalues of (8) as shifts, but several questions remain. How should the shifts be chosen initially? At what point should we switch to the strategy of using eigenvalues of (8) as shifts? How often should we update the shifts? The answers to these questions were not at all evident to Francis and the other eigenvalue computation pioneers. A great deal of testing and experimentation has led to the following empirical conclusions. The shifts should be updated on every iteration, and it is okay to use the eigenvalues of (8) as shifts right from the very start. At first they will be poor approximations, and progress will be slow. After a few iterations, they will begin to home in on eigenvalues, and the convergence rate will improve. With new shifts chosen on each iteration, the method normally converges quadratically [18, 15], which means that once an−m+1,n−m(j) is small, |an−m+1,n−m(j+1)| will be roughly the square of |an−m+1,n−m(j)|. Thus successive values of |an−m+1,n−m| could be approximately 10^−3, 10^−6, 10^−12, 10^−24, for example. Once this entry gets very small, we can deflate.

There is one caveat. The strategy of using the eigenvalues of (8) as shifts does not always work, so variants have been introduced. And it is safe to say that all shifting strategies now in use are indeed variants of this simple strategy. The strategies used by the codes in MATLAB and other modern software are quite good, but they might not be unbreakable. Certainly nobody has been able to prove that they are.

⁷The eigenvalues that are well approximated by (8) are indeed the same as the eigenvalues that are well approximated by the shifts.
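These empirical conclusions are easy to watch in a small experiment of my own devising (not the paper's code). The sketch runs explicitly shifted QR steps, which in exact arithmetic are equivalent to Francis iterations of degree 1, on a random symmetric tridiagonal matrix, choosing as shift at each step the eigenvalue of the trailing 2 × 2 block closest to the corner entry (the Wilkinson shift). The last subdiagonal entry collapses at a much-better-than-linear rate:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
d = rng.standard_normal(n)
e = rng.standard_normal(n - 1)
A = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)   # symmetric tridiagonal
spectrum = np.sort(np.linalg.eigvalsh(A))

history = []
for _ in range(10):
    # Wilkinson shift: eigenvalue of the trailing 2x2 closest to A[-1, -1].
    mu = np.linalg.eigvalsh(A[-2:, -2:])
    rho = mu[np.argmin(np.abs(mu - A[-1, -1]))]
    Q, R = np.linalg.qr(A - rho * np.eye(n))
    A = R @ Q + rho * np.eye(n)                   # orthogonally similar to A
    history.append(abs(A[-1, -2]))
```

Within a handful of steps the recorded entry drops below roundoff, at which point A[-1, -1] can be deflated as a converged eigenvalue; the spectrum is preserved throughout.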

. This is a useful tool for formal convergence proofs. In fact. 15. all formal convergence results for QR and related algorithms that I am aware of make use of this tool or a variant of it. . We have described Francis’s algorithm and explained why it works without referring to (2) in any way. In words. Francis’s algorithm can be adapted in straightforward ways to the computation of eigenvectors and invariant subspaces as well [6. Consider a Francis iteration ˆ A = Q−1 AQ of degree m with shifts ρ1 . Francis’s algorithm is commonly known as the implicitly-shifted QR algorithm. It is a common phenomenon that numerical methods work better than we can prove they work. 4] accumulated hundreds of citations. but by the year 2000 nobody in the community of numerical analysts knew what had become of the man or could recall ever having met him. One might eventually like to see the connection between Francis’s algorithm and the QR decomposition: Theorem 6. ρm . for which ρ1 . Next-to-Last Words. . 17 . . . . However. as usual. It is an open question to come up with a shifting strategy that provably always works and normally yields rapid convergence. In this paper we have focused on computation of eigenvalues. “The QR algorithm” was on that list. Francis’s work had become famous. . The proof can be found in [17]. for example. Theorem 6 is also valid for a long sequence of Francis iterations. the transforming matrix Q of a Francis iteration is exactly the orthogonal factor in the QR decomposition of p(A). Francis’s two papers [3. Francis’s algorithm was ﬁrmly established as the most important algorithm for the eigenvalue problem. It was even speculated that he had died. Let p(z) = (z − ρm ) · · · (z − ρ1 ). ρm is the long list of all shifts used in all iterations (m could be a million). but it bears no resemblance to the basic QR algorithm (2). As a part of the celebration of the arrival of the year 2000. 17]. 
Then there is an upper-triangular matrix R such that p(A) = QR.able to prove that they are. In Francis’s algorithm the introduction of dynamic shifting (new shifts on each iteration) dramatically improves the convergence rate but at the same time makes the analysis much more diﬃcult. Dongarra and Sullivan [2] published a list of the top ten algorithms of the twentieth century. . 6 What Became of John Francis? From about 1970 on.
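Theorem 6 can be checked directly in a few lines (a numerical illustration under my own setup, not a proof): form p(A) for a degree-2 Francis polynomial, take its QR decomposition, and observe that QᵀAQ comes out upper Hessenberg; that is, it is the result of the Francis iteration even though no bulge was ever chased:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = np.triu(rng.standard_normal((n, n)), k=-1)  # properly upper Hessenberg

# Degree m = 2 with the trailing 2x2 shifts; with a conjugate pair of
# shifts, p(A) = A^2 - s A + t I is real (s = trace, t = determinant).
s = A[n-2, n-2] + A[n-1, n-1]
t = A[n-2, n-2] * A[n-1, n-1] - A[n-2, n-1] * A[n-1, n-2]
pA = A @ A - s * A + t * np.eye(n)

Q, R = np.linalg.qr(pA)    # p(A) = QR with R upper triangular
A_hat = Q.T @ A @ Q        # the Francis iterate, by Theorem 6
```

A_hat is upper Hessenberg to rounding error. Library codes never proceed this way: forming p(A) costs extra matrix multiplications and is numerically inferior, which is precisely why Francis's implicit bulge-chasing formulation matters.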

6 What Became of John Francis?

From about 1970 on, Francis's algorithm was firmly established as the most important algorithm for the eigenvalue problem. As a part of the celebration of the arrival of the year 2000, Dongarra and Sullivan [2] published a list of the top ten algorithms of the twentieth century. "The QR algorithm" was on that list. Francis's work had become famous, and his two papers [3, 4] accumulated hundreds of citations, but by the year 2000 nobody in the community of numerical analysts knew what had become of the man or could recall ever having met him. It was even speculated that he had died.

Two members of our community, Gene Golub and Frank Uhlig, working independently, began to search for him. With the help of the internet they were able to find him, alive and well, retired in the South of England. In August of 2007 Golub visited Francis at his home. This was just a few months before Gene's untimely death in November. Golub reported that Francis had been completely unaware of the huge impact of his work. The following summer Uhlig was able to visit Francis and chat with him several times over the course of a few days. His mind is sharp, and he was able to recall a lot about the QR algorithm and how it came to be. See Uhlig's papers [5, 14] for more information about Francis's life and career.

With some help from Andy Wathen, Uhlig managed to persuade Francis to attend and speak at the Biennial Numerical Analysis conference at the University of Strathclyde, Glasgow, in June of 2009, and he organized a minisymposium at that conference honoring Francis and discussing his work and the related foundational work of Rutishauser and Kublanovskaya. I had known of the Biennial Numerical Analysis meetings in Scotland for many years and had many times thought of attending, but I had never actually gotten around to doing so. John Francis changed all that. No way was I going to pass up the opportunity to meet the man and speak in the same minisymposium with him. We enjoyed his reminiscences very much. As the accompanying photograph shows, Francis remains in great shape at age 75.

[Figure 1: John Francis speaks at the University of Strathclyde, Glasgow, in June 2009. Photo: Frank Uhlig.]

A few months later I spoke about Francis and his algorithm at the Pacific Northwest Numerical Analysis Seminar at the University of British Columbia. I displayed this photograph and remarked that Francis has done a better job than I have at keeping his weight under control over the years. Afterward Randy LeVeque explained it to me: Francis knows how to chase the bulge.
References

[1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, eds., Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, Society for Industrial and Applied Mathematics, Philadelphia, 2000.
[2] J. Dongarra and F. Sullivan, The top 10 algorithms, Comput. Sci. Eng. 2 (2000) 22–23.
[3] J. G. F. Francis, The QR transformation, part I, Computer J. 4 (1961) 265–272.
[4] J. G. F. Francis, The QR transformation, part II, Computer J. 4 (1961) 332–345.
[5] G. Golub and F. Uhlig, The QR algorithm: 50 years later its genesis by John Francis and Vera Kublanovskaya and subsequent developments, IMA J. Numer. Anal. 29 (2009) 467–485.
[6] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, MD, 1996.
[7] R. Granat, B. Kågström, and D. Kressner, A novel parallel QR algorithm for hybrid distributed memory HPC systems, SIAM J. Sci. Comput. 32 (2010) 2345–2378.
[8] M. H. Gutknecht and B. N. Parlett, From qd to LR, or, how were the qd and LR algorithms discovered? IMA J. Numer. Anal. (to appear).
[9] J. Hadamard, Essai sur l'étude des fonctions données par leur développement de Taylor, J. Math. Pures Appl. 4 (1892) 101–182.
[10] V. N. Kublanovskaya, On some algorithms for the solution of the complete eigenvalue problem, USSR Comput. Math. and Math. Phys. 1 (1962) 637–657.
[11] H. Rutishauser, Der Quotienten-Differenzen-Algorithmus, Z. Angew. Math. Phys. 5 (1954) 233–251.
[12] H. Rutishauser, Der Quotienten-Differenzen-Algorithmus, Mitt. Inst. Angew. Math. ETH, no. 7, Birkhäuser, Basel, 1957.
[13] H. Rutishauser, Solution of eigenvalue problems with the LR-transformation, Nat. Bur. Standards Appl. Math. Ser. 49 (1958) 47–81.
[14] F. Uhlig, Finding John Francis who found QR fifty years ago, IMAGE, Bulletin of the International Linear Algebra Society 43 (2009) 19–21, available at http://www.math.uregina.ca/iic/IMAGE/IMAGES/image43.pdf.
[15] D. S. Watkins, The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods, Society for Industrial and Applied Mathematics, Philadelphia, 2007.
[16] D. S. Watkins, The QR algorithm revisited, SIAM Rev. 50 (2008) 133–145.
[17] D. S. Watkins, Fundamentals of Matrix Computations, 3rd ed., John Wiley, Hoboken, NJ, 2010.
[18] D. S. Watkins and L. Elsner, Convergence of algorithms of decomposition type for the eigenvalue problem, Linear Algebra Appl. 143 (1991) 19–47.
[19] J. H. Wilkinson, The evaluation of the zeros of ill-conditioned polynomials, parts I and II, Numer. Math. 1 (1959) 150–166, 167–180.
[20] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, 1965.
[21] J. H. Wilkinson, The perfidious polynomial, in Studies in Numerical Analysis, G. H. Golub, ed., MAA Studies in Mathematics, vol. 24, Mathematical Association of America, Washington, DC, 1984, 1–28.

David S. Watkins received his B.S. from the University of California at Santa Barbara in 1970, his M.Sc. from the University of Toronto in 1971, and his Ph.D. from the University of Calgary in 1974 under the supervision of Peter Lancaster. He enjoys figuring out mathematics and explaining it to others in the simplest and clearest terms. It happens that the mathematics that most frequently draws his attention is related to matrix computations.

Department of Mathematics, Washington State University, Pullman, WA 99164-3113
watkins@wsu.edu