
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, VOL. 19, 859-871 (1983)

LANCZOS VERSUS SUBSPACE ITERATION
FOR SOLUTION OF EIGENVALUE PROBLEMS

BAHRAM NOUR-OMID†, BERESFORD N. PARLETT‡ AND ROBERT L. TAYLOR†

University of California, Berkeley, California, U.S.A.
(† Civil Engineering Department; ‡ Mathematics Department)

SUMMARY
In this paper we consider the solution of the eigen problem in structural analysis using a recent version
of the Lanczos method and the subspace iteration method. The two methods are applied to examples and we
conclude that the Lanczos method has advantages which are too good to overlook.

INTRODUCTION
Subspace iteration has become a standard tool for solving the large eigen problems which
occur in the dynamic analysis of structures, typically

[K − λM]φ = 0     (1)

where K and M are, respectively, the stiffness and mass matrices and (λ, φ) is an eigen pair.
Our goal is not to make a definitive comparison of subspace iteration with the Lanczos algorithm
but simply to point out that, when used properly, the latest versions of Lanczos (References 1, 2
and 6) are too good to ignore. While our comparisons (Figures 2, 4 and 6) considered only small examples,
the trends (number of matrix-vector operations) for the two methods have already become
clear. A requirement for very large problems is the need to solve equations whose Choleski
factors are larger than the available core. We discuss the solution of large problems further
in the sections 'Storage requirements' and 'A larger example'.
It is all too easy to load the dice, even inadvertently, against one side in a comparison of
numerical methods. We tried to be aware of this throughout the study.

LANCZOS
The Lanczos algorithm is less familiar to structural engineers than is subspace iteration. A
full discussion can be found in Reference 1. Here we will just say that the simple Lanczos
algorithm behaves as though it retained, and used to great advantage, every vector computed
in a run of the power method (Stodola's iteration). In fact, this information is coded in a
tridiagonal matrix and it is some of the eigenvalues of this tridiagonal matrix which converge
to the eigenvalues of the big matrix. Convergence of the outer eigenvalues occurs surprisingly
fast. Storage needs for Lanczos are considerably less than for subspace iteration.
There was a defect in the Lanczos algorithm which may have prevented its acceptance:
when should it be stopped? In contrast to subspace iteration, there is no need to compute the


smallest p eigenvalues of the tridiagonal matrix T at each step. Even worse, only some of the
eigenvalues of T approximate those of (K, M). These blemishes have been overcome (see
Chapter 13 in Reference 1) and now the Lanczos iteration will be halted at the earliest possible
step. The user does not have to get involved beyond designating the desired accuracy in the
eigenvectors. Of course, the eigenvalues will be far more accurate.
Block versions of Lanczos are more complicated and more powerful, just as subspace
iteration is more powerful than simple inverse iteration. Naturally, block Lanczos produces
a block tridiagonal matrix. However, the fact that Lanczos can be used with blocks of any
convenient size must not distract attention from the power of Lanczos with blocks of size 1,
2 or 3.
The program we obtained was aimed at big problems (n > 1000) and it allows the user
to choose the block size. The smallest effective size is 2, and this is what we used on our small
problems. The results would have been still more favourable to Lanczos had we used a simple
version, i.e. block size = 1. The code has been used at Oak Ridge National Laboratory to
compute 89 eigenvalues when n = 3631.
We turn now to an important misconception concerning the Lanczos algorithm. It is usually
presented as a way to compute eigenvalues and eigenvectors of a symmetric matrix A. Although
the algorithm can be formulated to work on a pair (K, M), some form of inversion or factoring
of a matrix, either explicitly or implicitly, is required. If M is positive definite and of narrow
bandwidth, then the usual recommendation is to factor M once and for all as M = L_1 L_1^T (this
is the Choleski factorization, not the square root) and then take L_1^{-1} K (L_1^{-1})^T as A. On the
other hand, it is always possible to factor K into L_2 L_2^T and then find the largest few eigenvalues
of L_2^{-1} M (L_2^{-1})^T (of course, the products given above are not computed explicitly because that
would destroy sparsity). Neither of these techniques is really satisfactory on realistic problems.
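To make the first reduction concrete, here is a minimal sketch in Python with NumPy/SciPy (our illustration, not the authors' code; the function names are ours and dense matrices are assumed): M is factored once, and the operator L_1^{-1} K (L_1^{-1})^T is applied through triangular solves rather than formed explicitly.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def make_standard_operator(K, M):
        # Factor M once and for all: M = L1 L1^T (Choleski).
        L1 = cholesky(M, lower=True)
        def apply_A(v):
            # Apply A = L1^{-1} K (L1^{-1})^T to v without ever forming A,
            # which would destroy sparsity: a transposed triangular solve,
            # a product with K, then a forward triangular solve.
            w = solve_triangular(L1, v, lower=True, trans='T')
            return solve_triangular(L1, K @ w, lower=True)
        return apply_A

A Lanczos routine that needs only matrix-vector products can then be driven entirely through apply_A.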
There has been no mention of shifts and it is sometimes asserted that Lanczos does not or
cannot take advantage of good shifts. The truth is a bit more subtle. Lanczos is so powerful
that, indeed, the eigenvalue problem can be solved with either L_1^{-1} K (L_1^{-1})^T or L_2^{-1} M (L_2^{-1})^T.
Nevertheless, if it is possible to factor K − σM for a well-chosen value of σ, then it pays
handsomely to run Lanczos with

A = M^{1/2} (L_1^{-1})^T D^{-1} L_1^{-1} M^{1/2}     (2)

where

K − σM = L_1 D L_1^T     (3)

Note that M need not be positive definite.
This important way of shifting was introduced by Ericsson and Ruhe (Reference 3) as the spectral
transformation. The eigenvalues θ_i of A are related to the eigenvalues λ_i of (K, M) by

θ_i = 1/(λ_i − σ)     (4)

The message is that although shifts can be used, fewer of them are needed than with subspace
iteration.
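The transformation in (4) is exactly the shift-and-invert mode packaged by modern sparse eigensolvers. As an illustration only (such libraries long postdate this paper), SciPy's eigsh accepts a shift sigma, runs Lanczos on the transformed operator, and returns the λ_i directly; a hand-rolled version would recover λ_i = σ + 1/θ_i from the Ritz values θ_i. The small test matrices below are our own stand-ins.

    import numpy as np
    from scipy.sparse.linalg import eigsh

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 50))
    K = X @ X.T + 50.0 * np.eye(50)          # stand-in stiffness matrix (symmetric PD)
    M = np.diag(rng.uniform(0.5, 2.0, 50))   # stand-in diagonal mass matrix

    sigma = 0.01  # shift near the eigenvalues of interest
    # eigsh factors (K - sigma*M) internally; which='LM' in shift-invert
    # mode returns the eigenvalues of (K, M) closest to sigma.
    lam, phi = eigsh(K, k=3, M=M, sigma=sigma, which='LM')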
There were defects in the Lanczos algorithm which may have delayed its acceptance. The
complete loss of orthogonality among the Lanczos vectors is not a disaster. It serves only to
delay convergence somewhat beyond the theoretical optimum, not to prevent it. This was the
contribution of Paige's thesis in 1971. The technique of selective orthogonalization keeps the
Lanczos vectors robustly linearly independent and keeps the number of iterations close to the
minimum (see Reference 6 for more details). More troublesome is, or was, the choice of the

right moment to stop. It is overkill to compute all the eigenvalues of the tridiagonal matrix at
each step; only a few are needed (see Reference 1 for more details).
For the sake of completeness we give below the algorithm we prefer. It is not quite the
same as the one suggested by Ericsson and Ruhe. When M is not diagonal, our algorithm
avoids the fill-in which comes with factoring M.

LANSO
Pick shift σ and factor (K − σM) = LDL^T. Choose q_0 = 0, a starting vector r_0, and β_1 =
(r_0^T M r_0)^{1/2}. For j = 1, 2, ..., lanmax do
(a) if so indicated then orthogonalize against a computed eigenvector y and recompute β_j,
(b) q_j ← r_{j−1}/β_j.
(c) solve (K − σM) r_j = M q_j for r_j.
(d) r_j ← r_j − q_{j−1} β_j.
(e) α_j ← r_j^T (M q_j).
(f) r_j ← r_j − q_j α_j.
(g) β_{j+1} ← (r_j^T (M r_j))^{1/2}. If small compared to |α_j| and β_j then repeat (e), (f), (g) once.
(h) update certain eigenvalues of T_j and their error bounds. If enough eigenvalues have
converged then quit.
(i) if error bound < ε‖T_j‖ then compute the eigenvectors of T_j, recall Lanczos vectors in
sequence to accumulate eigenvector y, set flag for (a).
T_j is the tridiagonal matrix with diagonal (α_1, ..., α_j) and off-diagonal (β_2, ..., β_j).
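The core of this recurrence, steps (b)-(g), is short enough to sketch in Python with NumPy/SciPy (our illustration, not the authors' FORTRAN). Selective orthogonalization, the error-bound bookkeeping of steps (a), (h) and (i), and the (e)-(g) re-run are omitted, and an LU factorization stands in for the LDL^T of a real code:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve, eigh_tridiagonal

    def lanso_sketch(K, M, sigma, lanmax):
        # Shift-and-invert Lanczos in the M inner product.
        n = K.shape[0]
        fac = lu_factor(K - sigma * M)     # stand-in for the LDL^T factorization
        r = np.random.default_rng(0).standard_normal(n)   # starting vector r_0
        q_prev = np.zeros(n)
        beta = np.sqrt(r @ (M @ r))        # beta_1 = (r_0^T M r_0)^{1/2}
        alphas, betas = [], []
        for j in range(1, lanmax + 1):
            q = r / beta                   # (b)
            r = lu_solve(fac, M @ q)       # (c) solve (K - sigma*M) r = M q
            r -= q_prev * beta             # (d)
            alpha = r @ (M @ q)            # (e)
            r -= q * alpha                 # (f)
            beta = np.sqrt(r @ (M @ r))    # (g) this is beta_{j+1}
            alphas.append(alpha)
            betas.append(beta)
            q_prev = q
        # Ritz values theta of T_j approximate 1/(lambda - sigma), as in (4);
        # recover the eigenvalues of (K, M) as lambda = sigma + 1/theta.
        theta = eigh_tridiagonal(np.array(alphas), np.array(betas[:-1]),
                                 eigvals_only=True)
        return sigma + 1.0 / theta

In exact arithmetic this reproduces T_j; in floating point it is the selective orthogonalization of step (a) that keeps the computed Lanczos vectors robustly independent.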

SUBSPACE ITERATION (SI)


This approach is widely used by structural engineers for extracting some natural modes and
frequencies of dynamic systems. There are many implementations available. The one used to
obtain our graphs is described in Reference 5 and is used in several SAP programs. As such, it
is typical of current usage. On the other hand, it does not represent the latest research on
subspace iteration.
The very success of the method tempted users to tackle more and more demanding problems.
This experience brought out limitations in the early programs and prompted further research.
The most sophisticated version of which we know was described by Bathe and Ramaswamy
(Reference 4) in 1979. It is used in the ADINA package. For the sake of historical scholarship
it should be mentioned that Reference 4 rediscovered several techniques which had already been
discussed in the numerical analysis literature, where subspace iteration is usually called
simultaneous iteration.
For further discussion on Reference 4, see the section ‘A larger example’.

STORAGE REQUIREMENTS
For large problems it is essential to use secondary storage devices (such as discs or magnetic
tapes) and the variety of computing systems makes it difficult to generalize about their use.
As far as we can see, the use of secondary storage favours the latest versions of Lanczos
even more strongly than do our in-core results. There are a significant number of intermediate
size problems in which the factors of K and M can be held in core but not all the subspace
vectors. All Lanczos needs is to write out an old Lanczos vector at each step and then, from

time to time, read in all the Lanczos vectors successively. In addition to the factors of K − σM,
Lanczos must hold in core only the tridiagonal matrix and five working vectors. When space
is available, it is advantageous if a few of the earliest eigenvectors to converge are held in
fast memory for use in selective orthogonalization (Reference 1, Chapter 13). In contrast,
subspace iteration would need to access the basis vectors at every step and to hold in core
the small eigen problem which must be solved at every step.
For very large problems the Choleski factor L of K − σM will have to reside out of core
and the solution of triangular systems Lv = w will have to be done in pieces. It is here that
the fact that Lanczos requires an almost minimal number of matrix-vector operations is most
attractive. For each problem there is a special number, r say, such that the cost of computing
Av for r vectors v is the same as the cost for just one vector v. This facilitates the multiplication
of the basis vectors in subspace iteration but, for the same reason, it favours Lanczos with
blocks of size r.
In extreme cases the Lanczos algorithm can function even when only 7 or 8 vectors can be
held in core at one time: 2 for the Lanczos step, 2 for the tridiagonal, 1 for selective
orthogonalization, and 2 or 3 for parts of the Choleski factors.

COMPUTER PROCEDURES
In our study two different programs were used (almost as 'black boxes'). The Lanczos algorithm
was the block version developed at Oak Ridge National Laboratory by David Scott (Reference 2).
The program is designed to solve the standard eigen problem, and was originally developed on an
IBM machine. ANSI FORTRAN was used to assure portability. The software was transferred
to the CDC 6400 at U.C. Berkeley. The Oak Ridge program is set to find a given number
of eigenvalues at either end of the spectrum (but not both). All the communication between
the program and the eigen problem is done through subroutine OP (supplied by the user), so
the modifications required to extend the program to the generalized eigen problem using
M^{1/2}(K − σM)^{-1}M^{1/2} were done through this subroutine.
The subspace iteration algorithm described in Reference 5 was adapted to the finite element
program FEAP (see Chapter 24 of Reference 7). FEAP was also used to generate the stiffness
and mass matrices of our test problems. We followed a standard practice for choosing q, the
subspace dimension: q = min(2p, p + 8), where p is the number of wanted eigenvalues.
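The division of labour around subroutine OP can be sketched as follows (a Python illustration of the idea under our own names, not the ORNL interface). With M diagonal, applying M^{1/2}(K − σM)^{-1}M^{1/2} costs one factorization up front and, per call, one solve plus two diagonal scalings; the subspace-dimension rule is included for completeness.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def make_op(K, m_diag, sigma):
        # User-supplied operator applying M^{1/2} (K - sigma*M)^{-1} M^{1/2},
        # with M diagonal; lu_factor stands in for the LDL^T factorization.
        sqrt_m = np.sqrt(m_diag)
        fac = lu_factor(K - sigma * np.diag(m_diag))
        def op(v):
            return sqrt_m * lu_solve(fac, sqrt_m * v)
        return op

    def subspace_dimension(p):
        # Standard practice for the SI subspace size, p = wanted eigenvalues.
        return min(2 * p, p + 8)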

EFFECT OF SHIFTING
To determine the effect of a shifting strategy on each algorithm, a series of tests was performed
in which various shifts were applied to the test problems and the C.P.U. times to extract the
first three eigen pairs were obtained. The results are plotted in Figures 7 and 8. A good
shifting strategy will reduce the cost of each algorithm. However, Figures 7 and 8 show that
the saving is somewhat greater in Lanczos than in subspace iteration. Therefore, the use of a
sophisticated shifting strategy in Lanczos, as in Ericsson and Ruhe (Reference 3), would
only enhance the cost ratio between the two methods.

TEST PROBLEMS
The problems used to test and compare the above algorithms were typical problems occurring
in structural analysis. They were set up in such a way as to reflect the problems encountered
in very large systems. The details of the problems are laid out in Figures 1 and 3.

[Figure 1(a). Building frame of example 1. Bay layout: 4 @ 3.0 m = 12.0 m. For all beams and columns: Young's modulus = 1.0 kN/m², mass density = 1.0 kg/m³, area = 1.0 m², moment of inertia = 1.0 m⁴. No. of beam elements = 27; no. of nodes = 20; total no. of D.O.F. = 45.]

A LARGER EXAMPLE
In this section we present operation counts for a specific medium-sized problem. Our analysis
suggests that a modern, simple (non-block) Lanczos program requires at most one-thirtieth
of the arithmetic operations used by a basic SI code and at most one-tenth of the operations
used by the accelerated SI program of Bathe and Ramaswamy (Reference 4).
The task is to compute the 60 most important mode shapes associated with the building
frame indicated in Figure 9. On this example the accelerated SI program was nearly three
times faster than the basic version, which is probably close to the SI program we used in the
previous examples.
The key parameters are:
n = degrees-of-freedom = 468.
m = average number of nonzeros per row in the profile of L (the triangular factor in K − σM = LDL^T) = 45.
p = number of eigenvectors wanted = 60.
Op. count for triangular factorization: (1/2)m²n + ....
Op. count for solving LDL^T x = r: 2mn + ....
The mass matrix M is diagonal.

[Figure 2 plot: No. of matrix-vector operations (vertical axis) versus No. of eigenvalues (horizontal axis); curves for subspace iteration and Lanczos.]

Figure 2. Comparison of the matrix-vector operations for obtaining increasing numbers of eigen pairs for the building
frame of example 1

Before plunging into details we wish to mention the fundamental difference between the
three algorithms SI (basic), SI (accelerated) and LANSO. One important calculation is common
to them all: the repeated solution of (K − σM)x = r for x given r. We call this a matrix-vector
operation. The parameter σ is a shift; each new σ requires a new triangular factorization.
Table I gives the number of matrix-vector operations required to compute the 60 mode shapes.
It should be mentioned that SI (accelerated) used 12 different values of σ, whereas LANSO
used only one. The marginal cost of these 11 extra factorizations is small relative to the total
expenditure but significant relative to LANSO.
The details which follow show that the ratios in the last column of Table I are reliable
guides to the actual arithmetic costs involved. Of course, transfers between primary and
secondary storage are an extra burden.

[Figure 3(a). Building frame of example 2. Bay layout: 5 @ 3.0 m = 15.0 m. For all beams and columns: Young's modulus = 1.0 kN/m², mass density = 1.0 kg/m³, area = 1.0 m², moment of inertia = 1.0 m⁴. No. of beam elements = 55; no. of nodes = 36; total no. of D.O.F. = 90.]

[Figure 3(b). Eigenvalues of the system in Figure 3(a).]

Table I†

                      q    No. of steps    Matrix-vector ops.
SI (basic)           68        47               3196
SI (accelerated)     68        36               2448
SI (accelerated)     20        80               1600
LANSO                 1       120                120

† q = dimension of subspaces used.


[Figure 4 plot: No. of matrix-vector operations (vertical axis) versus No. of eigenvalues (horizontal axis); curves for subspace iteration and Lanczos.]

Figure 4. Comparison of the matrix-vector operations for obtaining increasing numbers of eigen pairs for the building
frame of example 2

I. One step of SI with subspace dimension q

Form right-hand sides R = MX
Solve LDL^T X̄ = R
Form projection matrices X̄^T R and X̄^T M X̄
Solve the q × q eigen problem
Form new Ritz vectors

Total: nq[2m + 2q + 4 + 10(q²/n)]     (5)
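A minimal sketch of one such step, in Python with dense algebra for clarity (our illustration; a production code keeps the factor in profile form and exploits symmetry):

    import numpy as np
    from scipy.linalg import lu_solve, eigh

    def si_step(fac, M, X):
        # One subspace iteration step, with (K - sigma*M) already factored (fac).
        R = M @ X                    # form right-hand sides R = M X
        Xbar = lu_solve(fac, R)      # solve (K - sigma*M) Xbar = R
        Kp = Xbar.T @ R              # projected (K - sigma*M):  Xbar^T R
        Mp = Xbar.T @ (M @ Xbar)     # projected mass:           Xbar^T M Xbar
        theta, Z = eigh(Kp, Mp)      # q x q generalized eigen problem
        return Xbar @ Z, theta       # new Ritz vectors; theta ~ lambda - sigma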


In this particular example each triangular factorization is equivalent to m/4 (= 11) matrix-
vector operations, exclusive of extra I/O costs.
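This equivalence follows directly from the operation counts given earlier; a one-line check, using the values of this example:

    n, m = 468, 45
    factorization = 0.5 * m**2 * n    # (1/2) m^2 n for the triangular factorization
    one_solve = 2 * m * n             # 2mn for one solve with the factored matrix
    print(factorization / one_solve)  # m/4 = 11.25, about 11 matrix-vector ops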

[Figure 5(a). Building frame of example 3. Bay layout: 160 @ 0.125 m = 20.0 m. For all beams and columns: Young's modulus = 1.0 kN/m², mass density = 1.0 kg/m³, area = 1.0 m², moment of inertia = 10.0 m⁴. No. of beam elements = 320; no. of nodes = 321; total no. of D.O.F. = 951.]

[Figure 5(b). Eigenvalues of the system in Figure 5(a).]

II. One step of LANSO (step j), as described in the second section:

Solve LDL^T x = M^{1/2} q_j, r_j = M^{1/2} x     (2m + 2)n
Compute α_j, r_j, β_{j+1}     5n
Orthogonalize against an eigenvector¹     2n
Update error bounds²     30j
Compute an eigenvector³     nj

Total: n[2m + j + 9 + 30j/n]     (6)
1. It is not necessary to orthogonalize against an eigenvector at each step, but this accounting
device distributes total orthogonalization costs uniformly over the steps.
2. Intensive recent research has lowered this cost from O(j²) to 30j.

[Figure 6 plot: No. of matrix-vector operations (vertical axis) versus No. of eigenvalues (horizontal axis); curves for subspace iteration and Lanczos.]

Figure 6. Comparison of the matrix-vector operations for obtaining increasing numbers of eigen pairs for the building
frame of example 3

3. As in 1, it is not the case that an eigenvector converges at each step. Nevertheless in this
example the last 40 eigenvalues converged during the last 60 steps. So we have overestimated
the cost, but not grossly. This step requires the recall in sequence from secondary storage of
the Lanczos vectors q_k to form y = Σ_{k=1}^{j} q_k s(k). In virtual memory systems there is no penalty
for such sequential transfers.
LANSO can run efficiently with 7 n-vectors in primary storage along with L, D and M.
After 120 steps, LANSO found 64 eigenvalues with error bounds < 10^{-11}.

Accelerated subspace iteration versus Lanczos


If we use the appropriate values for m, q, ... in this example we observe that the quantities
in square brackets in (5) and (6) in the total operation counts are quite close: SI (accelerated)
[143]; LANSO [163]. If we apportion the factorization costs to each step in SI, we raise the

[Figure 7 plot: normalized solution time versus shift σ (0.0 to 0.024); subspace t_max = 5.100 sec, Lanczos t_max = 1.426 sec.]

Figure 7. Results from example 1: relative improvement of solution times for obtaining the smallest three eigen
pairs under different applied shifts

factor to [149]. Consequently, for each method, the arithmetic labour is somewhat less than
double the number of matrix-vector operations, which would give [180]. This suggests that
even the best of SI techniques requires 11 times more arithmetic labour than does LANSO.
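These bracketed factors can be reproduced directly from (5) and (6). A small check, under the parameter values of this example (the final line is one reading of the arithmetic that recovers the quoted factor of 11):

    n, m = 468, 45
    si_factor = lambda q: 2*m + 2*q + 4 + 10*q**2/n           # bracket in (5)
    lanso_factor = lambda j: 2*m + j + 9 + 30*j/n             # bracket in (6)
    print(si_factor(20))                                      # ~143, SI (accelerated)
    print(sum(lanso_factor(j) for j in range(1, 121)) / 120)  # ~163, LANSO on average
    print(1600 * 149 / (120 * 180))                           # ~11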

Accelerated subspace iteration versus basic


The basic SI program required 47 steps with q = 68. With such large values of q the solution
of the q × q eigenproblem changes from an insignificant to an important component.

[Figure 8 plot: normalized solution time versus shift σ (−0.03 to 0.00); curves for subspace iteration and Lanczos.]
Figure 8. Results from example 2: relative improvement of solution times for obtaining the smallest three eigen
pairs under different applied shifts

[Figure 9. Three-dimensional building frame of example 3. Plan of building: bays of 2 at 15 ft and 3 at 15 ft, with 2 at 20 ft. Data: Young's modulus = 432,000; mass density = 1.0. Columns in front building: A = 3.0, I_1 = I_2 = I_3 = 1.0. Columns in rear building: A = 4.0, I_1 = I_2 = I_3 = 1.25. All beams in X-direction: A = 2.0, I_1 = I_2 = I_3 = 0.75. All beams in Y-direction: A = 3.0, I_1 = I_2 = I_3 = 1.0. Units: ft, kip.]

The estimate of 10q³ operations is no longer satisfactory because when, say, 30 eigenvectors have
converged the q × q eigenproblem will actually be a (q − 30) × (q − 30) eigenproblem.
If we allow for the effective eigenproblem to decline from q × q to, say, 10 × 10 over the
47 steps, it is appropriate to ration the expense as (10q³)/4 per step.

Total arithmetic work for basic SI: 47 × n × 68 × [90 + 136 + 4 + 10(68)²/(4n)] ≈ 47 × 68 × 255 × n

Total arithmetic work for accelerated SI: 80 × n × 20 × [90 + 40 + 4 + 10(20)²/n + 6] ≈ 80 × 20 × 149 × n
The ratio is 3.4. This is somewhat greater than the observed ratio of 2.9, but the main features
seem to be represented adequately.
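The bookkeeping is easy to replay (using the bracketed terms as reconstructed above):

    n = 468
    basic = 47 * 68 * (90 + 136 + 4 + 10 * 68**2 / (4 * n))  # ~ 47 x 68 x 255
    accel = 80 * 20 * (90 + 40 + 4 + 10 * 20**2 / n + 6)     # ~ 80 x 20 x 149
    print(basic / accel)                                     # ~3.4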

Variations on LANSO
As suggested by Ruhe and Ericsson, the Lanczos algorithm can also benefit from using
several shifts. In this example the technique might yield a 50 per cent reduction in arithmetic
effort.
It is also interesting to note that the well-known technique of using full re-orthogonalization
of the Lanczos vectors changes the operation count from [2m + j + 9 + 30j/n] to [2m + 3j + 7 +
30j/n]. In this example the average value is [281]. This suggests that Lanczos with full
re-orthogonalization would require only one-fifth of the arithmetic labour of accelerated SI.
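Again this can be checked directly; averaging the modified bracket over the 120 steps of this example:

    n, m = 468, 45
    full_reorth = lambda j: 2*m + 3*j + 7 + 30*j/n
    print(sum(full_reorth(j) for j in range(1, 121)) / 120)  # ~282, the paper's [281]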

DISCUSSION
It is evident from the results (Figures 2, 4 and 6) that a good Lanczos algorithm (such as
Reference 2) is an order of magnitude faster, and therefore less costly, than basic subspace
iteration in both the number of required matrix-vector operations and C.P.U. time. We
conjecture that this ratio will persist on larger, more difficult problems even when sophisticated
versions of subspace iteration, as in ADINA, are used in the comparisons.

ACKNOWLEDGEMENT

The first two authors gratefully acknowledge partial support by ONR Contract N00014-69-0200-1017.

REFERENCES
1. B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, N.J., 1980.
2. D. S. Scott, 'Block Lanczos software for symmetric eigenvalue problems', Report ORNL/CSD-48, UC-32.
3. T. Ericsson and A. Ruhe, 'The spectral transformation Lanczos method for the numerical solution of large sparse
generalized symmetric eigenvalue problems', Report UMINF-76.79, ISSN 0348-0542.
4. K. J. Bathe and S. Ramaswamy, 'An accelerated subspace iteration method', Comp. Meth. Appl. Mech. Eng.,
23, 313-331 (1980).
5. K. J. Bathe and E. L. Wilson, Numerical Methods in Finite Element Analysis, Prentice-Hall, Englewood Cliffs,
N.J., 1976.
6. B. N. Parlett and D. S. Scott, 'The Lanczos algorithm with selective orthogonalization', Math. Comp., 33 (145),
217-238 (1979).
7. O. C. Zienkiewicz, The Finite Element Method, 3rd edn, McGraw-Hill, London, 1977.
