
SYMPOSIUM ON SPARSE MATRICES AND THEIR APPLICATIONS

Donald J. Rose, Department of Mathematics, University of Denver

Ralph A. Willoughby, Mathematical Sciences Department, IBM Research

INTRODUCTION

The main body of this Proceedings consists of 15 papers presented
at a Symposium on Sparse Matrices and Their Applications
which was held at the IBM Thomas J. Watson Research Center,
Yorktown Heights, New York on September 9-10, 1971. The conference
was sponsored by the National Science Foundation, Office of Naval
Research, IBM World Trade Corporation, and the Mathematical
Sciences Department of IBM Research.
Sparse matrix technology is an important computational tool
in a broad spectrum of application areas, and a number of these
areas are represented in this Proceedings. Of course, the mathe-
matical and computational techniques, presented in the context
of a given application, impact many other applications. It is
this cross-fertilization that has been a primary motivation for
this and two previous sparse matrix conferences† [Willoughby (1968A);
Reid (1970A)]. Some fields such as Linear Programming, Power Sys-
tems, and Structural Mechanics were systematically surveyed in the
first two conferences and are not surveyed here. In addition to
the applications themselves, sparse matrix technology involves
Combinatorics, Numerical Analysis, Programming, and Data Manage-
ment.††
†Brackets are used in the introduction to cite references in the
unified bibliography at the end of this Proceedings.
††See [Smith (1968A); McKellar and Coffman (1969A); Buchet (1970A);
Denning (1970A); Mattson et al. (1970A); Moler (1972A)] for a
discussion of various aspects of memory hierarchies.


The major ideas in each paper will be summarized in this
introduction. These ideas will be interspersed with a brief
survey of sparse matrix technology. The papers are ordered
alphabetically within groups. The groups are determined partly
by application area and partly by mathematical character. Details
concerning each paper and related sparse matrix techniques will be
given after the listing of the groups of papers in the order in
which they occur.
The first group consists of the papers by Calahan, Erisman,
Gustavson, and Hachtel. These papers concern problem classes in
the field of Computational Circuit Design. Linear Programming
is a second application area which involves sparse matrix tech-
nology of a very general character.† The papers by†† Hellerman-
Rarick and Tomlin comprise the second group.
The sparse matrix technology associated with the field of
Partial Differential Equations is the subject of the papers by
Evans, George, Guymon-King, and Widlund. Finite element methods
are a very active field of research in this area and the papers by
George and Guymon-King concern the finite element approach.
The papers by Glaser-Saliba and Hoernes form a Special Topics
group. The former paper represents the application of sparse
matrices in the field of Analytical Photogrammetry, which is con-
cerned with the determination of reliable metric information from
photographic images. The second paper concerns Data Base Systems.
The final group of papers are by Cuthill, Rheinboldt-Basili-
Mesztenyi, and Rose-Bunch. These concern the fields of Combina-
torics and Graph Theory.

Computational Circuit Design


In the next few paragraphs some aspects of sparse matrix
technology, which have been motivated by problems from the field
of Computational Circuit Design, will be sketched along with a
discussion of the first group of papers. This field is in some
sense a problem class representative of many applications. Also
it is the most well developed with respect to sophisticated sparse
matrix techniques. It is for these reasons that this application
area is considered first.
Computational Circuit Design is a very broad and highly de-
veloped area, and it is beyond the scope of this introduction to
systematically sketch all the various problem types in this field.
The interested reader should consult the two special issues of
the IEEE Proceedings [IEEE (1967A), (1972A)] and of the Trans-
actions on Circuit Theory [IEEE (1971A)] for pertinent articles and
extensive bibliography.

†In particular, one can have highly irregular sparseness structures
in these first two fields. The matrices are, in general, neither
positive definite symmetric nor diagonally dominant.
††A hyphen is used to connect co-authors.

The algebraic derivation of the sparse linear systems in
classical Electrical Network Theory can be found in the survey
article [Bryant (1967A)]. A novel tableau approach to this deri-
vation has been motivated by recent advances in sparse matrix
technology [Hachtel, Brayton, and Gustavson (1971A)].
One class of problems in computational design [Hachtel and
Rohrer (1967A)] concerns the numerical integration of the initial
value problem

    ẇ = f(t,w,p) ,    (1)

where the vector w(t_0) is specified. The vector, p, of design
parameters is to be systematically altered so as to find a specific
design vector, p_0, which yields "optimal" time behavior for system (1).
The unavailability, until recently, of efficient integration
techniques for stiff systems of ordinary differential equations†
has been a bottleneck in the modeling and computer analysis of
problems in many application areas. This is especially true for
the class of problems described in the previous paragraph. In that
case the efficiency of the integration is a critical factor in the
feasibility of the calculation.
The "stiffness" in system (1) manifpsts itself in the abnormal
size (»1) of the quantity K =£. L'lt, where.;t:. is the Lipschitz con-
stant associated with the w-variation of f, and L'lt is the desired
average sampling interval for the output of system (1). Efficiency
is achieved by using an "essentially" unconditionally stab 1e im-
plicit integration formula for (1) of the form

wm+1 - cxhwm+1 = R (2)

where tm+1 = tm + h and R involves wand w for t<t


-m •
System (1) is, in general, nonlinear in w, and hence (2) is
nonlinear in w_{m+1}. For stiff systems, the usual predictor-
corrector methods converge only when h is intolerably small. In
this situation, system (2) must be solved by a strongly convergent
technique such as Newton's method. Thus, one solves

    (I - αhJ)Δw = R + αh f(t_{m+1}, w^(i)_{m+1}, p) - w^(i)_{m+1}    (3)

for Δw and sets w^(i+1)_{m+1} = w^(i)_{m+1} + Δw, where J = ∂f/∂w = Jacobian matrix.
A simple starting guess in (3) for w_{m+1} is w^(0)_{m+1} = w_m = final itera-
tion vector at t = t_m. One can also use an extrapolation procedure
for determining w^(0)_{m+1}.
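To fix ideas, a minimal sketch of one such integration step follows (Python with NumPy; the right-hand side f, its Jacobian jac, the backward-Euler-like choices α = 1 and R = w_m, and the small test system are illustrative assumptions, with the design parameter p suppressed).

    import numpy as np

    def implicit_step(f, jac, t_next, w_m, h, alpha=1.0, tol=1e-10, max_iter=20):
        # Solve  w - alpha*h*f(t_next, w) = R  for w = w_{m+1} by Newton's method,
        # with the backward-Euler-like choice R = w_m.
        R = w_m
        w = w_m.astype(float).copy()           # starting guess w^(0)_{m+1} = w_m
        for _ in range(max_iter):
            G = w - alpha * h * f(t_next, w) - R
            if np.linalg.norm(G) < tol:
                break
            J = jac(t_next, w)                 # J = df/dw, the Jacobian matrix
            dw = np.linalg.solve(np.eye(len(w)) - alpha * h * J, -G)   # system (3)
            w = w + dw
        return w

    # A small stiff linear test system w' = Aw:
    A = np.array([[-1000.0, 0.0], [1.0, -1.0]])
    f = lambda t, w: A @ w
    jac = lambda t, w: A
    w1 = implicit_step(f, jac, t_next=0.1, w_m=np.array([1.0, 1.0]), h=0.1)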
If the Jacobian matrix is full and O(n^2) elements depend on
the current guess for w, then methods like [Broyden (1969A)] are
necessary to insure that the amount of work at each step is of
order n^2 rather than n^3, where n is the number of components of w.
†See chapter 6 in [Lapidus and Seinfeld (1971A)].
Fortunately, when n is large in Computational Design problems,
the Jacobian is typically sparse.† The sparseness structure can be
highly irregular, but computational efficiency is achieved, in
spite of this generality, by exploiting the fixed sparseness
structure for the Jacobian.
†That is, only a few unknowns occur in each equation.
System (3) is of the form

    Ax = b    (4)

where A = (a_ij), 1 ≤ i,j ≤ n. Associated with (4) and a given
Jacobian matrix for system (1) is a set of index pairs
S = {(μ,ν) | a_μν = 0}. The set S specifies the sparseness structure
of the matrix A. There are a number of other ways to specify the
sparseness structure for A. For example, one can define for a
given sparse matrix class a Boolean adjacency matrix A_s, where
(A_s)_ij = 1 means a_ij ≠ 0. This representation requires n^2 bits
regardless of the sparsity of A, and also is not necessarily the
most efficient representation from a programming point of view.
Threaded index lists with pointers are an important computational
tool for dealing with sparseness structure [Ogbuobiri (1970B);
Zollenkopf (1970A)]. Gustavson systematically discusses this
latter approach, via examples, in his paper in this Proceedings.
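As a rough illustration of these alternatives (a Python sketch; the row-wise column-index lists shown here are only a simplified stand-in for the threaded lists treated in Gustavson's paper):

    import numpy as np

    A = np.array([[4.0, 0.0, 1.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0],
                  [1.0, 0.0, 2.0, 5.0],
                  [0.0, 0.0, 5.0, 6.0]])

    # Boolean adjacency matrix A_s: n^2 bits regardless of how sparse A is.
    A_s = (A != 0)

    # Row-oriented index lists: for each row, only the column indices and values
    # of the nonzero elements; storage is proportional to the number of nonzeros.
    cols = [np.nonzero(A[i])[0].tolist() for i in range(A.shape[0])]
    vals = [A[i, np.nonzero(A[i])[0]].tolist() for i in range(A.shape[0])]

    def row_times_vector(i, x):
        # One row of A times a vector, touching the nonzeros of that row only.
        return sum(v * x[j] for j, v in zip(cols[i], vals[i]))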
A sparse matrix can also be represented by a graph in various
ways [Harary (1970A)]. This topic will be considered in more
detail when the last group of Proceedings papers are being dis-
cussed.
In a given design calculation associated with the initial
value problem (1), the system (4) is generated and solved a large
number of times. If one fixes, a priori, the order in which the
equations and unknowns are processed in Gaussian elimination or
in triangular factorization, then the entire sequence of machine
operations needed to solve (4) is also determined, a priori,
simply from the sparseness structure of A.
At this point, it is convenient to introduce some standard
notation which is associated with the Crout form of triangular
factorization [Gustavson, Liniger, and Willoughby (1970A)]. Let
A = LU where L = (ℓ_ij), ℓ_ij = 0 for j > i (lower triangular),
U = (u_ij), u_ii = 1, u_ij = 0 for j < i (unit upper triangular). It is
also convenient to introduce a composite L\U matrix as C = (c_ij)
where c_ij = ℓ_ij for j ≤ i and c_ij = u_ij for j > i. Each element
of C is generated by a single formula†

    c_ij = (a_ij - Σ_{k=1}^{m-1} c_ik c_kj) d    (5)

where m = min(i,j), d = 1 for i ≥ j, and d = c_ii^{-1} for i < j. If a_ij = 0
and, for 1 ≤ k ≤ m-1, c_ik c_kj = 0, then c_ij is "logically zero." Other-
wise, a reduced formula defines c_ij. In this formula only nonzero
numbers occur.
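A direct, dense transcription of formula (5) (Python/NumPy sketch; a sparse code would evaluate the reduced formula only at the logically nonzero positions):

    import numpy as np

    def crout_composite(A):
        # Build the composite matrix C = L\U of (5): c_ij = l_ij for j <= i,
        # c_ij = u_ij for j > i, diagonal pivoting in the natural order.
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                m = min(i, j)
                s = A[i, j] - sum(C[i, k] * C[k, j] for k in range(m))
                C[i, j] = s if i >= j else s / C[i, i]    # d = 1 or d = c_ii^{-1}
        return C

    A = np.array([[4.0, 1.0, 0.0], [2.0, 5.0, 3.0], [0.0, 1.0, 6.0]])
    C = crout_composite(A)
    L = np.tril(C)
    U = np.triu(C, 1) + np.eye(3)
    assert np.allclose(L @ U, A)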
In 1966, a highly efficient symbolic factorization program,
GNSO (GeNerate SOlve) was created [Gustavson et al (1970A)]. GNSO
generates a linear (loop-free) code SOLVE, which is specifically
tailored to the zero-nonzero structure of A. The SOLVE program
represents a machine language code for computing the reduced formula
for each c_ij ≠ 0. The program SOLVE can be very long and as an
alternative, two programs SFACT and NFACT were created [Chang
(1968A)]. SFACT generates the sparseness structure of C in the
context of Tinney's row Gaussian elimination.†† The program NFACT
uses the sparseness information for C to enhance the speed of
execution of the numerical elimination.
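The principle behind such a symbolic pass can be sketched as follows (Python; this only illustrates the idea and is not Chang's or Gustavson's actual code): formula (5) is run over index sets alone, so the positions of the logically nonzero c_ij are known before any arithmetic is done.

    def symbolic_factorization(nonzeros, n):
        # nonzeros: set of (i,j) pairs with a_ij != 0 (diagonal assumed nonzero).
        # Returns the predicted structure of C = L\U, fill-in included:
        # c_ij is logically nonzero if a_ij != 0 or some product c_ik * c_kj,
        # k < min(i,j), is logically nonzero.
        struct = set(nonzeros) | {(i, i) for i in range(n)}
        for i in range(n):
            for j in range(n):
                if (i, j) in struct:
                    continue
                if any((i, k) in struct and (k, j) in struct for k in range(min(i, j))):
                    struct.add((i, j))
        return struct

    # An "arrowhead" structure with dense first row and column fills in completely:
    n = 4
    arrow = {(0, j) for j in range(n)} | {(i, 0) for i in range(n)}
    assert len(symbolic_factorization(arrow, n)) == n * n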
Gustavson's paper in this Proceedings is a fundamental exposi-
tion of the main programming concepts involved in extensions††† to
his own GNSO program, Chang's programs SFACT and NFACT, and those
of Tinney et al. Row Gaussian elimination, with diagonal pivoting
in the natural order, is treated both for the unsymmetric and
symmetric cases. An ordering program OPTORD is also described.
In the design problem associated with system (1) there is a
nested set of computation loops. Because of this, the elements a_ij
of A have a hierarchy of "variability types" (i.e., ≡ 0; ≡ constant;
or dependent on p, t, or w respectively). The last three variability
types imply increasing frequency of change of the numerical value
of the element a_ij with that variability type. If the inner loop
calculations are not memory-limited, then it is desirable to seg-
ment the calculations. This is done in such a way as to avoid
repeated calculation of quantities which are constant within those
inner loops.
†Σ_{k=α}^{β} s_k = 0 by definition if β < α.
††Tinney and his colleagues have developed an extensive sparse
matrix technology for problems in the field of Power Generation and
Distribution [Sato and Tinney (1963A); Tinney and Walker (1967B);
Tinney (1968A); Ogbuobiri, Tinney, and Walker (1970A); Ogbuobiri (1970B)].
†††FORTRAN subroutines, based on these concepts, are in the IBM
program product SL-MATH, which was announced recently by IBM World
Trade Corporation.

The frequency of change of elements of A can be used in
connection with an ordering algorithm. These matters are discussed
in [Hachtel, Brayton, and Gustavson (1971A)] and also in the papers
by Gustavson and Hachtel in this Proceedings.
In particular, Hachtel considers the case where the vector b
in system (4) is sparse and where only a few components of x are
required as output. The quantities, generated during the Gaussian
elimination pro~ess as well as the forward and backward substitu-
tions, will have an inherited variability type. The forward depen-
dency chain will determine the "data type" of a quantity (e.g.,
a number may have a t-variable type but a w-data type since this
number is needed to generate a w-variable type quantity in some
subsequent calculation).
The determination of variable type and data type designation
for a quantity is a one-time a priori process which can be exploited
in an extension of the GNSO and SFACT calculations. The cost of
these symbolic preconditioning calculations is amortized over the
number of times system (4) is to be generated and solved for a given
specification of A and of the variability type of each nonzero a_ij.
In the paper by Erisman, a sparse matrix program, TRAFFIC,
is described. This program has been designed for a particular
application: frequency domain analysis of linear passive electrical
networks. System (4) is represented in this class of problems as
Y(ω)v = c, where Y(ω) is an n x n complex symmetric admittance
matrix, v is the vector of unknown node to datum voltages, c is
the vector of input currents, ω is the frequency and n is the
number of nodes.
This class of matrices has the additional property that dia-
gonal pivoting in any order is numerically acceptable. Thus, for
any permutation matrix P, M = PY(ω)P^T has the stable triangular
factorization†

    M = U^T D U    (6)

where U is unit upper triangular and D is the (complex) diagonal
matrix of pivots. One also has L = U^T D, and this latter equation
is used in a factorization algorithm, which was developed at the
Bonneville Power Laboratory. Programming details for this Bonne-
ville algorithm are described in Gustavson's paper.
Again there is a fixed sparseness structure for M in (6) for
all values of ω. The matrix P, which orders the equations in
Y(ω)v = c, can be determined, a priori, so as to minimize some
complexity criterion [Rose (1971A)].
Erisman describes and motivates the techniques built into the
program TRAFFIC, and, at the end, an illustrative example for a
large problem (3300 order with 60,000 nonzero elements) is given.

†Each of the papers in the Proceedings has its own set of notations
and these are, in general, different from the notation here in the
introduction. The superscript T refers to the transpose operation.

Calahan's paper concerns the generalized eigenvalue problem

    Ax = λBx    (7)

where A and B arise from modal analysis of electronic circuits


and are sparse. The densely banded and full matrix cases have had
an intensive development [Wilkinson (1965A)]. Calahan
points out, in the beginning of his paper, that the arbitrarily
sparse case has not received adequate attention by researchers,
and indicates some reasons why. He then gives a brief survey of a
number of eigenvalue methods that can be adapted to treat the sparse
case.† At the end of the paper, some statistics are given for the
calculation of all the natural modes of an amplifier circuit. The
order of A and B is 50 and B is symmetric. Muller's method
was used for calculating the eigenvalues. As Calahan points out, the
sparse eigenvalue problem is an important area for extensive
research.

Some Basic Notions


A discussion of certain basic aspects of direct sparse matrix
algorithms will be presented in what follows. These remarks will
be followed by a summary of the papers in the Linear Programming
group.
Many readers may be overly conditioned to high level languages
such as FORTRAN or ALGOL. As a result the computational efficiency
in the processing of arrays tends to be inadequately analyzed.
This is especially true concerning the relations between storage
order and usage of quantities in arrays.†† The casual use of the
transpose operation in matrix manipulations is a notable example of
a possible bottleneck. In a truly random access memory, it is
mainly indexing convenience that determines ordering strategies.
However, most nontrivial calculations involve memory hierarchies,
and here the situation can be quite different.
Highly sophisticated memory hierarchy systems are appearing
and the aim of these systems is to make the functioning of the
hierarchy transparent (i.e., be of no concern) to the user by
means of automatic memory management [Denning (1970A); Mattson et
al. (1970A)]. Some reasonable rules relative to ordering must be
followed, however, if efficiency is to be achieved in matrix
computations. Basically, the main ideas are: (1) When blocks of
information are moved up in the hierarchy they should have a
utilization which is directly related to the size of the block;
(2) Where information resides in the hierarchy should be related
to the effect of its access on the overall efficiency of the processing.
†Of course, just as in the case of solving system (4), a nonsparse
algorithm could be used. However, this often involves an intoler-
able loss of efficiency.
††See, for example, [McKellar and Coffman (1969A); Moler (1972A)]
for a discussion of matrix operations for a paged memory system.
Efficient input-output and memory management are two of the
most critical problems in the design of large scale production
codes. The resolution of these problems often dictates the level
of generality which can be tolerated in the code without seriously
degrading the computational efficiency in the typical production
runs.†
There are a large number of methods which take advantage of
special properties of the coefficient matrix in (4) but if A is
simply a general n x n sparse matrix, then there are three main
types of direct sparse matrix algorithms. These are based res-
pectively on Gaussian elimination, triangular factorization, and
Gauss-Jordan complete elimination. There are methods based on
orthogonal transformations, such as Householder's method [Wilkinson
(1965A), pp. 290-299, 347-353], but these are used mainly in least
squares and eigenvalue calculations and will not be discussed here.
It will be seen in the discussion of the algorithms given below
that care is taken to relate the order of processing to the order
of storage. Unless otherwise indicated it is assumed that pivoting
in the algorithms proceeds down the diagonal in the natural order.
Pivoting-for-size can be incorporated into certain sparse matrix
algorithms by generating permutation lists to keep track of the
sequence of pivot locations. Of course, this approach precludes
the use of many of the preprocessing procedures described in the
first group of papers in this Proceedings.
Some additional notation is introduced to facilitate the dis-
cussion of the algorithms. Let e_k be the kth column of the identity
matrix I = (δ_ij), where δ_ij is the Kronecker delta. A superscript T††
is used to designate row vectors, and thus e_k^T denotes the kth row
of I. Let A = (a_ij), then

    a_.j = Ae_j = jth column of A,    (8a)

    a_i.^T = e_i^T A = ith row of A.    (8b)
The details of handling the sparseness features in each algorithm
(i.e., avoiding the storing and processing of floating point
zeros) are omitted so that the reader can see the algorithm in its
simplest form. The reader should see Gustavson's paper for a care-
ful discussion of the programming details and of methods for deal-
ing with the sparseness in vectors and arrays.
The basic formula for Crout triangular factorization was given
earlier in (5). Each c_ij in the composite matrix C = L\U is gener-
ated recursively. Because of the form of (5), extended precision
inner product accumulation requires very little extra storage.
Another basic vector operation in matrix algorithms is the formation
of a vector by subtracting a multiple of one vector from another
vector. Here, the extended precision accumulation requires more
storage.
†Having too many special purpose codes, on the other hand, tends to
create a high level of human inefficiency.
††As indicated earlier, it is also used to denote the transpose
operation for rectangular arrays.
This second vector operation is used repeatedly in row Gaussian
elimination. In this algorithm one stores and processes the ele-
ments of A row by row in the order a_1.^T, a_2.^T, ..., a_n.^T. The rows†
u_1.^T, u_2.^T, ..., u_n.^T of the unit upper triangular matrix U are suc-
cessively generated as is described below. The first row of U is
given simply by u_1.^T = ρ_1 a_1.^T, where ρ_1 = 1/a_11. Once the vectors
u_1.^T, u_2.^T, ..., u_{k-1}.^T have been formed, then u_k.^T is created as follows.
Let v^T = a_k.^T initially, and for i = 1,2,...,k-1 form††

    ℓ_ki ← v_i ,    (9a)

    v^T ← v^T - v_i u_i.^T .    (9b)

Then u_k.^T = ρ_k v^T where ρ_k = v_k^{-1} = ℓ_kk^{-1}. Note that the resulting
component, v_i, on the left of the assignment symbol in (9b) is zero.
For sparse matrices, (9a) and (9b) are done only for those i where
v_i ≠ 0.
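A dense transcription of (9a)-(9b) (Python/NumPy sketch; a sparse implementation would run the inner loop only over the i with v_i ≠ 0 and store the rows in packed form):

    import numpy as np

    def row_gaussian_elimination(A):
        # Row Gaussian elimination, diagonal pivoting in the natural order.
        # Returns L (multipliers and pivots, lower triangular) and U (unit upper).
        n = A.shape[0]
        L = np.zeros((n, n))
        U = np.zeros((n, n))
        for k in range(n):
            v = A[k].astype(float).copy()      # v^T <- a_k^T
            for i in range(k):
                L[k, i] = v[i]                 # (9a): l_ki <- v_i
                v = v - v[i] * U[i]            # (9b): v^T <- v^T - v_i u_i^T
            L[k, k] = v[k]                     # pivot l_kk
            U[k] = v / v[k]                    # u_k^T = rho_k v^T
        return L, U

    A = np.array([[4.0, 1.0, 0.0], [2.0, 5.0, 3.0], [0.0, 1.0, 6.0]])
    L, U = row_gaussian_elimination(A)
    assert np.allclose(L @ U, A)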
In this elimination or factorization stage, a given row of U
may be needed repeatedly to form subsequent rows of U. In compu-
tations which are memory bound†††, it may be desirable to order
the equations and unknowns so that a given row of U is used only
for a small number of near-by rows of U. Thus one would need
only a reasonable sized locally active storage, and other quantities
could be brought up from back up store as needed or placed there
when generated. This aspect of ordering is an essential feature
of bandwidth minimization and is discussed in Cuthill's paper.
Row Gaussian elimination can be adapted to a form of pivoting-
for-size. One can, for example, first arrange the rows of A in
order of increasing density of nonzeros in each row, and then apply
threshold pivoting [Curtis and Reid (1971A)]. Pivoting-for-size
can then be achieved by generating a column index array J such
that, for row k, the pivot column is J(k). Procedure (9b) is still
applied, but now i = J(ν) for ν = 1,2,...,k-1. This yields a vector
v^T with v_i = 0 for the previous pivot components. As before
u_k.^T = ρ_k v^T, but now ρ_k = v_μ^{-1}, μ = J(k). The pivot component
μ = J(k) satisfies |v_μ| ≥ τ max_ν |v_ν| where the threshold, τ, is in
the range 0 < τ ≤ 1.

†Note that u_n.^T = e_n^T.
††← is the assignment symbol, which plays the role of the = symbol
in FORTRAN.
†††That is, access to data and/or code is the critical factor in
the efficiency of the calculation.

The forward and backward substitutions are presented below for
the case of triangular factorization where the elements of L and
U are stored by rows, or of row Gaussian elimination when J(k) = k.
Forward Substitution: Ly = b,

    y_k = (b_k - Σ_{j=1}^{k-1} ℓ_kj y_j) ρ_k ,  k = 1,2,...,n.    (10a)

Backward Substitution: Ux = y,

    x_k = y_k - Σ_{j=k+1}^{n} u_kj x_j ,  k = n,n-1,...,1.    (10b)
Once a given approximate solution, x̂, has been found, then the
triangular factors L and U can be used in connection with the
method of iterative refinement [Forsythe and Moler (1967B), Chapter
13] to generate a more accurate solution. Here, one forms the
residual vector† r = b - Ax̂, solves AΔx = r for Δx, and then forms
x̂ + Δx as a refined solution.
There are applications, such as the simplex method in Linear
Programming, where both A and A^T occur as the coefficient
matrix. That is, one wants to solve not only system (4) for x
but also A^T z = c for z. Note that the latter system can be written
as z^T A = c^T. Thus, if A = LU then the (row) substitution steps are
given by w^T U = c^T and z^T L = w^T.
Assume as before that L and U are stored by rows. A
special (row) substitution algorithm for solving for z in
z^T L = w^T is given below.†† Let k = n and v^T = w^T initially; then
one has

    Σ_{μ=1}^{k} z_μ ℓ_μ.^T = v^T .    (11)

This is just another way of writing z^T L = w^T, and from the lower
triangular property of L it follows from (11) that

    z_k = ρ_k v_k    (12a)

where ρ_k = ℓ_kk^{-1}. The algorithm then proceeds, as long as k > 1, via
the assignments

    v^T ← v^T - z_k ℓ_k.^T ,    (12b)

    k ← k - 1 .    (12c)
†Extended precision inner product accumulation is important
because of the inherent numerical cancellation.
††A similar algorithm applies to w^T U = c^T.
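A transcription of (11)-(12) (Python/NumPy sketch with dense rows; L is assumed lower triangular with nonzero diagonal):

    import numpy as np

    def solve_zT_L(L, w):
        # Solve z^T L = w^T with L lower triangular stored by rows,
        # working downward from k = n as in (12a)-(12c).
        n = len(w)
        v = w.astype(float).copy()
        z = np.zeros(n)
        for k in range(n - 1, -1, -1):
            z[k] = v[k] / L[k, k]      # (12a): z_k = rho_k v_k
            v = v - z[k] * L[k]        # (12b): v^T <- v^T - z_k l_k^T
        return z

    # Check: z^T L = w^T is the same as L^T z = w.
    L = np.array([[4.0, 0.0, 0.0], [2.0, 4.5, 0.0], [0.0, 1.0, 6.0]])
    w = np.array([1.0, 2.0, 3.0])
    assert np.allclose(L.T @ solve_zT_L(L, w), w)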

Linear Programming
There are two sparse matrix algorithms which are widely used
in the field of Linear Programming; namely, PFI (Product Form of
the Inverse) and EFI (Elimination Form of the Inverse). The
factorization stage is based on Gauss-Jordan elimination in the PFI
algorithm and on Gaussian elimination for the EFI algorithm. These
two algorithms have been compared for sparsity preservation and
efficiency by a number of authors, and Tomlin's paper traces this
work.
Both algorithms are based on column storage and processing.
Moreover, elementary column matrices form a basic operational tool.
Given a column vector t_.k, one forms an elementary column matrix†
T_k which is the identity matrix except that T_k e_k = kth column of
T_k = t_.k. These matrices are a special case of an important class
of elementary matrices of the form E = I + wv^T [Householder (1964A),
pp. 3-4], where (wv^T)_ij = w_i v_j. Note that δ(E) = determinant of
E = 1 + v^T w, and, if δ(E) ≠ 0, then E^{-1} = I - σwv^T where σ = (1 + v^T w)^{-1}.
Since elementary column matrices are such a basic tool, some
properties†† of these matrices are sketched here. The operational
form of T_k^{-1} is given by

    T_k^{-1} = I - ρ_k (t_.k - e_k) e_k^T ,  ρ_k = t_kk^{-1} ,    (13)

that is, the identity matrix with ρ_k in the (k,k) position and
-ρ_k t_jk in the (j,k) positions for j ≠ k.
If c = T_k^{-1} b, then c can be calculated as follows: Let α = ρ_k b_k,
then c_k = α, and, for j ≠ k, c_j = b_j - α t_jk. If b_k = 0, then
α = 0 and c = b. In particular, T_k^{-1} e_j = e_j for j ≠ k, but
T_k^{-1} t_.k = e_k (these are, in fact, the basic characteristics of T_k^{-1}).
Let c^T = b^T T_k^{-1}, then c can be calculated as follows:
c_k = (b_k - Σ_{j≠k} b_j t_jk) ρ_k and c_j = b_j for j ≠ k. Thus one does not need
to form the kth column of T_k^{-1}. This column is often referred to
as an η-vector.
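The two operational rules just stated translate directly into code (Python/NumPy sketch; t_col stands for the nontrivial column t_.k of T_k, and the function names are illustrative only):

    import numpy as np

    def apply_Tk_inv(t_col, k, b):
        # c = T_k^{-1} b per (13): c_k = alpha = rho_k b_k, c_j = b_j - alpha t_jk.
        alpha = b[k] / t_col[k]
        c = b - alpha * t_col
        c[k] = alpha
        return c

    def apply_Tk_inv_from_right(t_col, k, b):
        # c^T = b^T T_k^{-1}: only the k-th component of b changes.
        c = b.astype(float).copy()
        c[k] = (b[k] - (b @ t_col - b[k] * t_col[k])) / t_col[k]
        return c

    # T_k^{-1} t_.k = e_k and T_k^{-1} e_j = e_j for j != k:
    t = np.array([0.5, 2.0, 0.0, 1.5])
    assert np.allclose(apply_Tk_inv(t, 1, t.copy()), np.eye(4)[1])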
It is now an easy matter to present the factorization stage
of both the PFI and EFI algorithms. The PFI method is as follows:
t_.1 = a_.1 and, for 2 ≤ k ≤ n, t_.k = T_{k-1}^{-1} ... T_1^{-1} a_.k. One then has
the identity T_n^{-1} ... T_1^{-1} A = I which yields the following form of
inverse.

    A^{-1} = T_n^{-1} ... T_1^{-1}    (14)

†See the second page of Tomlin's paper for a diagram of such a
matrix.
††The transpose of such a matrix is an elementary row matrix which
has corresponding properties.

This algorithm has an elegant simplicity in its formulation and
use, but it has the sparseness structure of L\U^{-1} rather than the
preferred L\U structure of the other algorithms discussed here
[Brayton, Gustavson, and Willoughby (1970A)].
One can process the columns of A in any order. Moreover,
once t_.k has been formed, any pivot component, not previously
designated, can be used. Let μ = J(k) be chosen for this purpose,
then T_k is defined by T_k e_μ = t_.k (i.e., T_k^{-1} t_.k = e_μ) and T_k e_ν = e_ν
for ν ≠ μ. The nontrivial column of T_k and of T_k^{-1} in this case is
the μth column. Another feature of the PFI method is the simpli-
city of updating the form of the inverse under column modification
[Tewarson (1966A)]. This involves the formation of one more T_k^{-1}
matrix just as in the PFI method itself.
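A dense sketch of the PFI factorization and its use (Python/NumPy; pivoting is taken in the natural order J(k) = k, the helper tk_inv repeats the T_k^{-1} operation from the sketch above so the fragment is self-contained, and all names are illustrative):

    import numpy as np

    def tk_inv(t_col, k, b):
        # c = T_k^{-1} b, as in (13).
        alpha = b[k] / t_col[k]
        c = b - alpha * t_col
        c[k] = alpha
        return c

    def pfi_factor(A):
        # Product Form of the Inverse: column k of T_k is
        # t_.k = T_{k-1}^{-1} ... T_1^{-1} a_.k, so that A^{-1} = T_n^{-1} ... T_1^{-1}.
        n = A.shape[0]
        eta_columns = []
        for k in range(n):
            t = A[:, k].astype(float)
            for j, tj in enumerate(eta_columns):
                t = tk_inv(tj, j, t)
            eta_columns.append(t)
        return eta_columns

    def pfi_solve(eta_columns, b):
        # x = A^{-1} b = T_n^{-1} ... T_1^{-1} b.
        x = b.astype(float)
        for k, t in enumerate(eta_columns):
            x = tk_inv(t, k, x)
        return x

    A = np.array([[4.0, 1.0, 0.0], [2.0, 5.0, 3.0], [0.0, 1.0, 6.0]])
    b = np.array([1.0, 2.0, 3.0])
    assert np.allclose(A @ pfi_solve(pfi_factor(A), b), b)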
The EFI method is based on the identity L_n^{-1} ... L_1^{-1} A = U. It
differs from the PFI method in that if c = L_k^{-1} b then c_j = b_j for
all the pivot components determined by the previously determined
L_μ^{-1}, μ < k. Only the case J(k) = k is considered here. The EFI
algorithm proceeds as follows: First, L_1 = T_1 as in the PFI case.
For 2 ≤ k ≤ n let v = L_{k-1}^{-1} ... L_1^{-1} a_.k; then u_ik = v_i for i < k and
ℓ_ik = v_i for i ≥ k. The elementary column matrix L_k is also lower
triangular and has ℓ_ik, i ≥ k, for its nontrivial elements in the
lower half of the kth column. Note that by a trivial factorization†
U = U_n ... U_2, where U_k is an elementary column matrix which is also
unit upper triangular and has u_ik, i < k, as nontrivial elements.
One then has the identity

    A^{-1} = U_1^{-1} U_2^{-1} ... U_n^{-1} L_n^{-1} ... L_1^{-1} .    (15)

The row oriented version of (15), formed by operating on the
right of A^T by elementary row matrices (L_k^T)^{-1}, is simply another
version of row Gaussian elimination. In Erisman's paper, one
had A symmetric and with diagonal pivoting allowed. Then
L = U^T D where D is the diagonal matrix of pivots; that is,
ℓ_ij = u_ji ℓ_jj for j ≠ i. A Bifactorization algorithm has been
developed to exploit this feature [Zollenkopf (1971A)]. In this
method one simultaneously operates on the left and on the right
of the coefficient matrix to obtain

    L_n^{-1} ... L_1^{-1} A U_1^{-1} ... U_n^{-1} = I .    (16)

Again one has the form of inverse indicated in (15). However, in
this case, U_j^{-1} is an elementary row matrix which is the identity

matrix except for having -ℓ_ij ρ_j as the nontrivial element in the
(j,i) position of the jth row for i > j.
†U_1 = I and can be omitted.
The Bifactorization approach also applies to the case of
diagonal pivoting and symmetric sparseness structure. In this case,
it is the sparseness structure of the elementary row matrix U_j which
is determined from the sparseness structure of L_j.
In the next few paragraphs, the papers by Tomlin and He11erman-
Rarick will be summarized. Tomlin uses the EFI algorithm as the
basic starting point for representing the inverse of the basis.
He discusses a number of methods for updating the elementary tri-
angular factor matrices under column modification, and motivates
the use of an extension of one presented in an IBM report which is
an enlarged earlier version of the paper [Brayton, Gustavson, and
Willoughby (1970A)]. As is the case with many of the papers in this
Proceedings, the ideas in Tomlin's paper are a part of a production
program, which has been performance tested on a set of significant
problems. This program, called UMPIRE, showed substantial improve-
ment over programs in which other updating procedures were used.
For one thing, the growth of nonzero elements in the updated factors
was much slower than with the traditional η-vector approach. This
means that reinversion does not have to occur as often. An increase
of 40% in terms of simplex iterations per unit time was reported
over a standard method for problems up to 6800 rows for a controlled
set of experiments.
The Hellerman-Rarick paper also deals with the simplex method
for Linear Programming problems. The paper concerns the determina-
tion of the set of n pivot locations (J(k),k), 1 ≤ k ≤ n, to produce
a highly sparse set of η-vectors (i.e., T_k^{-1} e_μ, μ = J(k), 1 ≤ k ≤ n)
in the PFI algorithm before column modification. The usual search
for row and/or column singletons provides a partial set of pivots.
The choice of the pivots in the nontrivial part of the matrix
starts with a determination of a maximal assignment [Yaspan (1966A)];
that is, a set of distinct row indices I(k) such that a_μk ≠ 0 for
μ = I(k). The a_μk are tentative pivots. This step is followed
by a procedure which is a matrix reducibility algorithm.†
the nontrivial part of the matrix is reducible, this procedure
determines the reordering which puts the matrix into block lower
triangular form. The diagonal blocks will be irreducible, and
each such block is further reordered so as to minimize the number
of nonzero elements above the diagonal. The various steps in the
pivot procedure are illustrated in a special 16 x 16 example.

Partial Differential Equations


The next group of papers concern the numerical solution of
partial differential equations. The character of the sparse
matrix technology is quite different in the field of Partial Dif-
ferential Equations than it is in the field of Linear Programming.
†The paper by Rose-Bunch has some discussion of reducibility algorithms.

First of all, in the former field the matrices are normally posi-
tive definite symmetric. Also, the coefficient matrix has a
highly structured numerical and sparseness pattern. Since it is
clearly beyond the scope of this introduction to systematically
survey this field, only a few remarks of a general character will
be presented along with a summary of the papers by Evans, George,
Guymon-King, and Widlund. However, the bibliography contains a
number of basic references.†
Finite difference techniques have been a major tool for gener-
ating the coefficient matrix for system (4), but more recently,
finite element methods have been actively studied and developed
[ Zlamal (1968A), (1970A); Felippa and Clough (1970A); Babuska
(1971A); Fix and Larsen (1971A)]. A part of the paper by George
concerns the analysis, developed in his Ph.D. thesis at Stanford
in 1971, of finite element methods from the point of view of
computational efficiency.
The application paper by Guymon-King also deals with the finite
element approach. Their field of application is environmental
pollution; that is, they are dealing with the quantification of the
growth, decay, and movement of man-made and natural substances in
the environment. The particular problem discussed in the paper
is the development of a mathematical model capable of simulating
or predicting the movement and fate of man-made pollutants in large
fresh water lakes or reservoirs. This is clearly a very ambitious
project, but it represents a systematic approach to problem modeling
in an important new area of research.
As is always the case in new application areas, the main
focus of attention is on the problem modeling rather than the effi-
cient solution of well known and properly posed problems. General
purpose algorithms and the computer programs which implement these
algorithms should be aimed at providing a computational tool; that
is, to facilitate the formulation and testing of mathematical
models in new application areas.
There has always been a set of tradeoffs in comparing the
computational complexity of various numerical methods for solving
partial differential equations. There is, first of all, the com-
plexity question associated with the generation, storage, and
fetching of the nonzero coefficients in the coefficient matrix.
In the usual finite difference approach, the order n of the matrix
can be very large (10,000-100,000). However, the matrix has only
a small number of nonzero elements in each row (5-50) and this
number is independent of n. Moreover, in simple geometries, the
generation of the coefficients is straightforward. In sophisticated
finite element methods, the size of the matrix is orders of magni-
tude smaller, but the matrix is denser and the generation of the
nonzero elements is much more complex.
†[Birkhoff (1971A); Young (1971A)] are two recent surveys which
analyze much of the current research in this area.

Just as in the case of programs in the field of Computational
Circuit Design, symbolic and numerical preprocessing can greatly
enhance the efficiency of repetitive inner loops of these calcula-
tions. In planning a computational strategy one should look at the
total problem to determine the loop structure. Moreover, the
question of efficiency for a given loop or nested set of loops must
be related to what percentage of time is devoted to that part of
the calculation.
Once one has the coefficient matrix, a different kind of com-
plexity is involved. For most iterative methods, the computational
complexity per iteration is optimal both with respect to the amount
of arithmetic and also the average access time to the data. In
fact, the complexity per iteration is a small multiple of that
associated with the matrix-vector operation v ← Av. This same
optimal complexity per iteration is associated with the method of
Conjugate Gradients [Fox and Stanton (1968A); Fried (1969A,B);
Reid (1970B)].
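A bare sketch of the Conjugate Gradient iteration for a symmetric positive definite A follows (Python/NumPy; note that A enters only through the product A @ p, which is what keeps the per-iteration complexity a small multiple of that of v ← Av):

    import numpy as np

    def conjugate_gradients(A, b, tol=1e-10, max_iter=1000):
        x = np.zeros_like(b, dtype=float)
        r = b - A @ x
        p = r.copy()
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p                       # the dominant cost per iteration
            alpha = rs_old / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    assert np.allclose(A @ conjugate_gradients(A, b), b)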
The whole question of total efficiency hinges† on the number
of iterations required to achieve an order of magnitude reduction
in the norm of the error vector. Typically, the Poisson equation
on the unit square is used as a model problem. In this case,
Fourier techniques and other powerful analytical tools are available
for studying the character of the spectral radius of the error
propagation operator.
†For iterative methods other than Conjugate Gradients.
It has been shown that this type of analysis can be misleading
when comparing methods [Stone (1968A); Weinstein (1968A); Weinstein,
Stone, and Kwan (1969A)]. A more appropriate model problem for two
dimensional elliptic problems is a general linear self-adjoint
problem with variable coefficients, such as that given by Evans
in his paper. He presents a novel approach to the ordering of grid
points in a block successive over-relaxation scheme. His peripheral
method is asymptotically equivalent to successive two line over-
relaxation, and the latter is known to be a very good general pur-
pose method. The new method is shown to work well with special
regions, and the solution of the Torsion problem for a hollow
square is given as illustration.
Thus far, the discussion of computational efficiency in the
numerical solution of partial differential equations has focused
on iterative methods. These methods should also be compared with
direct methods when the memory hierarchy can handle the increased
data management. This is associated with the fill-in of elements
in the Cholesky decomposition A = G^T G, where G = (g_ij) is upper
triangular. Since U^T D U = A = G^T G one has G = D^{1/2} U where D is
the diagonal matrix such that d_ii = ℓ_ii > 0.
The second half of George's paper contains an exciting new
ordering strategy for matrices which have a Poisson-like sparseness
structure. This structure is characterized by a local neighbor
pattern in a multidimensional grid of points. George's ordering
scheme reduces the multiplication count in the Cholesky factoriza-
tion from the usual† O(N^4) to O(N^3). This result will be discussed
again when the papers involving Graph Theory are being considered.
One can correctly argue that special partial differential
equations such as Poisson's equation on a rectangle are important
in their own right. In fact, there is a widely held view that only
special equations should be considered. In such a circumstance,
the computational efficiency for a given general purpose scheme
must be compared to special techniques which are specifically
tailored to that problem [Hockney (1965A), (1970A)].
The whole subject of special techniques for separable problems
and for problems which are "almost" of this form is a very active
field. Widlund's paper forms an excellent survey of recent
literature. He also discusses the methods for taking advantage
of the special numerical character of the elements of the coefficient
matrix A for these important problem classes. A number of extra
references of this type are included in the bibliography, and the
reader is strongly urged to browse the listings.
There is a class of sparse matrix problems in the field of
Economics that have features similar to the finite difference
type matrix problems. These arise in input-output modeling
[Leontief (1966A); Noble (1968A); Carter and Brody (1970A)]. The
coefficient matrix can, for example, be a block matrix of order M
with each block being a square matrix of order N. In particular,
A = G - H, G = (G_μν) = block diagonal matrix (G_μν = 0 unless μ = ν),
H = (H_μν) = block matrix which is diagonal by blocks (H_μν is a
diagonal matrix for each μ,ν). One could interchange the role
of M and N, and then the sparseness character of G and H would be
interchanged.
In terms of indices one has

    a_ij = g^{μν}_{αβ} - h^{μν}_{αβ}    (17)

where i = (μ-1)N + α, 1 ≤ μ ≤ M, 1 ≤ α ≤ N, and 1 ≤ i ≤ MN. A similar set of
relations applies to the indices j, β, and ν. The sparseness
pattern indicated earlier is characterized by the conditions
g^{μν}_{αβ} = 0 if μ ≠ ν, and h^{μν}_{αβ} = 0 if α ≠ β. There are two sets of
relations involved in (17): (1) relations between geographical
regions and (2) relations between industries. Because of diagonal
dominance, the diagonal elements are the appropriate pivots. Alter-
nating direction type iteration seems a natural approach, but this
may or may not be feasible since there is a great generality in the
numerical character of the nonzero elements.
This class of sparse matrix problems is a generalization of
the classic matrix equation shown below.

†The order of the matrix here is n = N^2.



AX + XB = C. (18)

The matrices† A(M×M), B(N×N), and C(M×N) are given and one seeks a
matrix X(MxN) which satisfies (18). Equation (18) can also be
expressed in tensor product form [Lynch, Rice, and Thomas (1964A,B)]
and is related to an analysis of alternating direction methods.
Direct solution for certain classes of finite difference equations
have been studied via (18) [Bickley and McNamee (1960A); Osborne
(1965A)]. Notice that the assignment, A ← TAT^{-1}, B ← S^{-1}BS,
C ← TCS, and X ← TXS, leaves (18) invariant. Thus, by means of
similarity transformations on A and B, one can precondition the system
for easy solution. For example, if A and B are diagonal then (18)
becomes (a_ii + b_jj)x_ij = c_ij.
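When A and B can be diagonalized, the invariance just noted gives an immediate solution procedure; the following sketch (Python/NumPy) assumes A and B are diagonalizable and that no eigenvalue of A is the negative of an eigenvalue of B.

    import numpy as np

    def solve_AX_plus_XB(A, B, C):
        # Diagonalize: with T = V_A^{-1} and S = V_B, equation (18) becomes
        # (lambda_i + mu_j) x~_ij = c~_ij for the transformed matrices.
        lamA, VA = np.linalg.eig(A)
        lamB, VB = np.linalg.eig(B)
        C_t = np.linalg.solve(VA, C) @ VB              # C~ = T C S
        X_t = C_t / (lamA[:, None] + lamB[None, :])    # componentwise solve
        X = VA @ X_t @ np.linalg.inv(VB)               # X = T^{-1} X~ S^{-1}
        # For real A, B, C the solution is real up to roundoff.
        return X.real if np.isrealobj(A) and np.isrealobj(B) else X

    A = np.array([[3.0, 1.0], [0.0, 2.0]])
    B = np.array([[1.0, 0.0], [2.0, 4.0]])
    C = np.ones((2, 2))
    X = solve_AX_plus_XB(A, B, C)
    assert np.allclose(A @ X + X @ B, C)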

Special Topics
This group consists of the two papers hy Glaser-Saliba and
Hoernes. The Hoernes paper concerns Data Base Systems, a technology
which will have a growing impact on problem modeling and computa-
tion. In particular, many of the coefficient matrices and vectors
used in sparse matrix applications will be generated from data bases.
Hoernes presents a brief introduction to this important field. An
in-depth treatment of this technology would involve a sophisticated
interaction of the fields of Systems Programming and Computer
Hardware. These subjects are beyond the scope of this Proceedings.
The Glaser-Saliba paper involves, besides a description of
analytical photogrammetry, the important topic of matrix partitioning
[Householder (1953A), (1964A)]. Let A = (A_μν) be an M×M partitioned
matrix where A_νν is a square matrix of order k(ν), and
k(1) + k(2) + ... + k(M) = n. Assume that A_11 is nonsingular and that
the first k(1) pivots in Gaussian elimination are chosen from the
first k(1) rows and columns. Then the reduced matrix of order
n - k(1) is given†† in block form for 2 ≤ μ,ν ≤ M by

    A'_μν = A_μν - A_μ1 A_11^{-1} A_1ν .    (19)

Notice that (19) is invariant under any nonsingular transformation
of the first k(1) rows and columns of A.
Let M = 2, k(1) = 1, and k(2) = n-1; then (19) is the outer
product representation of a single pivot step. Also, in this case,
A_11^{-1} = ℓ_11^{-1} = ρ_1, A_21 is a column vector, and A_12 is a row vector.
The outer product approach is used by Gustavson and by Hachtel
relative to symbolic preprocessing and to a priori ordering.
If k(1) >> 1 then formula (19) is to be viewed symbolically,
since one does not usually form the inverse of A_11 explicitly in

this case. However, formula (19) indicates the nature of the
fill-in of nonzero elements in the reduced matrix after k(1) pivot
steps.
†The dimensions are given for each of the arrays in the parenthe-
ses after the symbol for the array.
††To within roundoff errors.
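A minimal sketch of the block pivot step (19) (Python/NumPy; the leading block is eliminated with a solve rather than an explicitly formed inverse, as suggested above):

    import numpy as np

    def block_reduce(A, k1):
        # One block pivot step: return A' = A_22 - A_21 A_11^{-1} A_12, the
        # reduced matrix of order n - k1 described by formula (19).
        A11, A12 = A[:k1, :k1], A[:k1, k1:]
        A21, A22 = A[k1:, :k1], A[k1:, k1:]
        return A22 - A21 @ np.linalg.solve(A11, A12)

    # With k1 = 1 this is the outer product form of a single pivot step:
    A = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]])
    A_reduced = block_reduce(A, 1)   # the 2 x 2 reduced matrix after one pivot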
Consider the case where A is symmetric indefinite and one
wishes to preserve symmetry in the factorization. A stable pro-
cedure exists which is a mixture of scalar and 2x2 block diagonal
pivoting (i.e., for each μ, k(μ) = 1 or k(μ) = 2) [Bunch and
Parlett (1971A); Bunch (1971B)].
A priori partitioning for classes of matrices is usually
limited to cases where diagonal pivoting in any order is permitted.
In many applications such pivoting is feasible; e.g., the normal
matrix in Photogrammetry, the input-output matrix in Economics,
the stiffness matrix in Structural Mechanics, and the finite
difference or finite element matrix for coupled systems of
partial differential equations.
There are a variety of reasons why one deals with partitioned
matrices. One of the early reasons had to do with segmenting prob-
lems so that the subproblems could be successively solved within
the limitations of existing memories. This is still a primary
motivation in some applications, and Glaser-Saliba make this point.
However, with the advent of efficient automatic memory hierarchies
and excellent vector-oriented sparse matrix algorithms, other
methods of segmenting are available which require less clever
insight on the part of the problem poser. This is especially
important in the case of irregular sparseness structures.
In some cases there is a natural partitioning imposed by the
physical nature of the problem. Here the partitioning may be
completely regular and the elements of A, for example, might all
be 6x6 matrices.†
There is a density threshold for the reduced blocks in (19)
where it no longer pays for A'_μν to be considered sparse [Jennings
(1968A)]. In particular, A'_MM's sparseness, if any, will generally
be dominated by the fill associated with repeated application of
(19). If k(M) << n then a full matrix algorithm can be applied to
the pivoting for the last block without degrading the efficiency
of the total algorithm.

Combinatorics and Graph Theory


This last part of the introduction is concerned with a
discussion of combinatorial aspects of sparse matrix technology
and of their relation to graph theory. Also, a summary is given
of the last group of papers by Cuthill, Rheinboldt-Basili-Mesztenyi,
and Rose-Bunch. Some further discussion of partitioning will be
made in connection with the summary of George's ordering algorithm
and of the paper by Rose-Bunch.
A sparse matrix can be represented by a graph in various
ways. The expository article [Harary (1971A)] surveys some uses
of graphs in sparse matrix research, and [Rose (1971A)] has given
a detailed graph theoretic study of the elimination process for
sparse positive definite symmetric systems.
†It defeats the purpose of this approach if one has to deal with
nontrivial sparseness for the 6x6's themselves.
Graph Theory is useful as a representation because it faci-
litates recognition of structures not evident in the array
representation of a sparse matrix; i.e., cycles, vertex separators,
cut sets, strong components, etc. These structures play an impor-
tant part in analyzing and comparing the complexity of algorithms,
especially in the context of a priori "optimal ordering."
Certain properties and structures remain invariant under
classes of transformations. The discovery and classification of
these invariances are a fundamental part of matrix research;
and, where sparseness plays a role in the invariance, graph theory
forms a basic tool. In fact, when diagonal pivoting in any order
is allowed, the graph of the sparseness structure† is invariant
under the class of diagonal preserving reorderings, PAP^T, where
P is a permutation matrix.
In George's ordering algorithm, the matrices A_μμ are block
diagonal†† for 1 ≤ μ ≤ M, and this structure is to remain invariant
under Gaussian reduction in (19). The size of the diagonal blocks
in A_μμ (i.e., the B_αα) is an increasing function of μ; e.g., the
blocks B_αα double in size for μ → μ+1. Since it is assumed that
B'_αβ = 0 for α ≠ β, where A'_μμ = B' = (B'_αβ), this implies the existence
of a set of logical orthogonality††† relations involving the rows
of A_μ1 (note that A_μ1 = A_1μ^T since A is assumed to be positive
definite symmetric in George's algorithm). The reader can under-
stand George's result in detail only by a step-by-step treatment
of a nontrivial example, e.g., that associated with figure 4.1 in
George's paper. One should carry out the steps both by means of
a sequence of undirected graphs [Rose (1971A)] and by use of (19)
with Boolean sparseness matrices. The paper [Feingold and Spohn
(1967A)] also involves the preservation of the block diagonal
structure of A_μμ under Gaussian reduction.
This concept can be generalized to the nonsymmetric case
provided diagonal pivoting in any order is numerically feasible.
The block diagonal property is to be then replaced by the more
general property of being block triangular. Rose-Bunch have
discussed this situation in some detail via 2x2 partitioned
matrices and also directed graphs. They also study the role of
tearing and modification in sparse systems. Here the use of
graphs allows precise statements about the complexity of the
solution.
†The graph has n vertices and there is a directed branch from
vertex i to vertex j for i ≠ j whenever a_ij ≠ 0. In the case of
symmetric sparseness structure, one deals with an undirected graph.
††That is, if A_μμ = B = (B_αβ) then B_αβ = 0 for α ≠ β. Here one
is dealing with a third level of partitioning.
†††A vector u is logically orthogonal to another vector v of the
same dimension if and only if for each i either u_i = 0 or v_i = 0.
Thus, the inner product Σ u_i v_i is logically zero.
If a matrix is block triangular, then only the diagonal blocks
need be factored, and the solution is determined via block substi-
tution. Rose-Bunch point out that, if A is irreducible, so are
all the matrices obtained from A via Gaussian reduction. However,
A_11 can still be highly reducible, and they show how one can take
advantage of the reducibility of A_11. Again, one can create a
modified algorithm in which only the diagonal blocks of A_11 are
factored† in the formation of Ã_22. For example, let Ã_12 = A_11^{-1} A_12,
so that A_11 Ã_12 = A_12. Now, Ã_12 is formed via a set of block substi-
tutions, and Ã_22 = A_22 - A_21 Ã_12.


Cuthill's paper is a survey of ordering algorithms associated
with stiffness matrices in Structural Mechanics. The matrices
are positive definite symmetric and of very large order. It is
assumed that access to information from relatively slow auxiliary
storage is the primary bottleneck in the calculation. The survey
concerns mainly bandwidth minimization and its generalization to
"pipe" or band-like matrices [Jennings (1966A)]. Two comparison
tables are presented for the application of various algorithms††
to two examples.
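The flavor of the bandwidth-oriented orderings surveyed by Cuthill can be conveyed by a stripped-down Cuthill-McKee sketch (Python; it assumes a connected graph with symmetric structure given as an adjacency list and a caller-chosen starting vertex, and it omits the refinements compared in the paper):

    from collections import deque

    def cuthill_mckee(adj, start):
        # Breadth-first numbering, visiting unnumbered neighbors in order of
        # increasing degree; tends to cluster the nonzeros near the diagonal.
        order, seen, queue = [], {start}, deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in sorted((u for u in adj[v] if u not in seen),
                            key=lambda u: len(adj[u])):
                seen.add(u)
                queue.append(u)
        return order

    def reverse_cuthill_mckee(adj, start):
        return list(reversed(cuthill_mckee(adj, start)))

    # A 5-vertex chain numbered badly; the reverse ordering recovers a
    # small-bandwidth numbering.
    adj = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1], 4: [1, 2]}
    new_order = reverse_cuthill_mckee(adj, 2)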
The paper by Rheinboldt-Basili-Mesztenyi is a description of
a graph theoretic programming language, GRAAL, and its use in
identifying appropriate structures in a graph. GRAAL makes it easy
to describe and implement graph algorithms as they arise in appli-
cations. After giving a brief description of the language, four
subroutines are given in GRAAL. The first two subroutines deter-
mine the strong components of a directed graph in node form (i.e.,
a matrix reducibility algorithm for A where a_ii ≠ 0); the third
constructs the acyclic condensation graph associated with the strong
components; and the fourth provides a topological sort of the nodes
of a graph.
It is well known that there is a wide spectrum to the orders
of complexity for various graph algorithms; for example, finding
all the cycles of a directed graph can have an exponential com-
plexity. The study of complexity in this field is an active
research area, and a language such as GRAAL can provide important
insights by making it easy to formulate and test conjectures by
means of examples.
†After A_11 has been reordered to be block triangular.
††The Cuthill-McKee and reverse Cuthill-McKee algorithms are
included in the comparison.
