
ALGORITHM 583
LSQR: Sparse Linear Equations and Least Squares Problems

CHRISTOPHER C. PAIGE
McGill University, Canada
and
MICHAEL A. SAUNDERS
Stanford University

Categories and Subject Descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra--linear
systems (direct and iterative methods); G.3 [Mathematics of Computing]: Probability and Statistics--statistical computing, statistical software; G.m [Mathematics of Computing]: Miscellaneous--FORTRAN program units
General Terms: Algorithms
Additional Key Words and Phrases: Analysis of variance, conjugate-gradient method, least squares,
linear equations, regression, sparse matrix

1. INTRODUCTION
LSQR finds a solution x to the following problems:

   Unsymmetric equations:   solve  Ax = b                               (1.1)

   Linear least squares:    minimize  || Ax - b ||_2                    (1.2)

   Damped least squares:    minimize  || ( A  ) x - ( b ) ||            (1.3)
                                      || ( λI )     ( 0 ) ||_2

where A is a matrix with m rows and n columns, b is an m-vector, λ is a scalar,
and the given data A, b, λ are real. The matrix A will normally be large and
sparse. It is defined by means of a user-written subroutine APROD, whose
essential function is to compute products of the form Ax and A^T y for given
vectors x and y.

Received 4 June 1980; revised 23 September 1981; accepted 28 February 1982.


This work was supported by Natural Sciences and Engineering Research Council of Canada Grant
A8652; by the New Zealand Department of Scientific and Industrial Research; and by U.S. National
Science Foundation Grants MCS-7926009 and ECS-8012974, the Department of Energy under
Contract AM03-76SF00326, PA No. DE-AT03-76ER72018, the Office of Naval Research under
Contract N00014-75-C-0267, and the Army Research Office under Contract DAAG29-79-C-0110.
Authors' addresses: C. C. Paige, School of Computer Science, McGill University, Montreal, Quebec,
Canada H3A 2K6; M. A. Saunders, Department of Operations Research, Stanford University,
Stanford, CA 94305.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
© 1982 ACM 0098-3500/82/0600-0195 $00.75
ACM Transactions on Mathematical Software, Vol. 8, No. 2, June 1982, Pages 195-209.

Table I. Comparison of CGLS and LSQR

                     Storage      Work per iteration
   CGLS, λ = 0       2m + 2n          2m + 3n
   CGLS, λ ≠ 0       2m + 2n          2m + 5n
   LSQR, any λ        m + 2n          3m + 5n

Problems (1.1) and (1.2) are treated as special cases of (1.3), which we shall
write as

   min || b̄ - Āx ||_2,    where  Ā = ( A  ),   b̄ = ( b ).              (1.4)
                                      ( λI )         ( 0 )
An earlier successful method for such problems is the conjugate-gradient method
for least squares systems given by Hestenes and Stiefel [3]. (This method is
described as algorithm CGLS in [6, sect. 7.1].) CGLS and LSQR are iterative
methods with similar qualitative properties. Their computational requirements
are summarized in Table I. In addition they require a product Ax and a product
A^T y each iteration.
In order to achieve the storage shown for LSQR, we ask the user to implement
the matrix-vector products in the form

   y ← y + Ax    and    x ← x + A^T y,                                  (1.5)

where ← means that one of the given vectors is overwritten by the expression
shown. (A parameter specifies which expression the user's subroutine APROD
should compute on any given entry.) We see that LSQR has a storage advantage
if the operations (1.5) can be performed with no additional storage beyond that
required to represent A. For least squares applications with many observations
(m >> n), this could be useful.
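To make the calling convention concrete, here is a minimal APROD for a small
dense matrix. This is an illustrative sketch only: the COMMON array A and its
dimensions are invented for the example, and Figures 1 and 2 in Section 4 give
the sparse-matrix versions actually intended for use with LSQR.

      SUBROUTINE APROD( MODE,M,N,X,Y,LENIW,LENRW,IW,RW )
      INTEGER MODE,M,N,LENIW,LENRW
      INTEGER IW(LENIW)
      REAL X(N),Y(M),RW(LENRW)
C
C     A MINIMAL APROD FOR A DENSE MATRIX (ILLUSTRATION ONLY).
C     THE COMMON ARRAY A AND ITS DIMENSIONS ARE INVENTED FOR
C     THIS SKETCH; M .LE. 100 AND N .LE. 50 ARE ASSUMED.
      REAL A
      COMMON A(100,50)
      INTEGER I,J
C
      IF (MODE .NE. 1) GO TO 300
C     MODE = 1 -- SET Y = Y + A*X.
      DO 200 J = 1, N
         DO 100 I = 1, M
            Y(I) = Y(I) + A(I,J)*X(J)
  100    CONTINUE
  200 CONTINUE
      RETURN
C     MODE = 2 -- SET X = X + A(TRANSPOSE)*Y.
  300 DO 500 J = 1, N
         DO 400 I = 1, M
            X(J) = X(J) + A(I,J)*Y(I)
  400    CONTINUE
  500 CONTINUE
      RETURN
      END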
The work shown in Table I is the number of floating-point multiplications per
iteration, excluding the work involved in the products Ax, A^T y. Since CGLS is
somewhat more efficient, we would not discourage using that method whenever
A or Ā is well conditioned. However, LSQR is likely to obtain a more accurate
solution in fewer iterations if Ā is moderately or severely ill-conditioned.
Let r̄_k = b̄ - Āx_k be the residual vector associated with the kth iteration. LSQR
provides estimates of ||x_k||_2, ||r̄_k||_2, ||Ā^T r̄_k||_2, the norm of Ā, the condition number
of Ā, and standard errors for the components of x. The last two items require a
further 2n multiplications per iteration and an additional n-vector of storage.
Subroutine LSQR is written in the PFORT subset of American National
Standard FORTRAN. It contains no machine-dependent constants. Auxiliary
routines required are APROD, NORMLZ, SCOPY, SNRM2, and SSCAL. The
last three correspond to members of the BLAS collection [5].

2. MATHEMATICAL BACKGROUND
Algorithmic details are given in [6], mainly for the case λ = 0. We summarize
these here with λ reintroduced, and show that a given value of λ may be dealt
with at negligible cost. The vector norm ||v||_2 = (v^T v)^(1/2) is used throughout.

LSQR uses an algorithm of Golub and Kahan to reduce A to lower bidiagonal
form. The quantities produced from A and b after k + 1 steps of the
bidiagonalization (procedure Bidiag 1 of [6]) are

   U_{k+1} = [u_1, u_2, ..., u_{k+1}],     V_{k+1} = [v_1, v_2, ..., v_{k+1}],

            ( α_1                   )
            ( β_2   α_2             )
   B_k  =   (       β_3   .         )                                   (2.1)
            (             .    α_k  )
            (               β_{k+1} )

where B_k is lower bidiagonal, with k + 1 rows and k columns.

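For reference, the quantities in (2.1) are generated by the coupled recurrence
of Bidiag 1 (from [6]), in which each α_i and β_i is chosen so that the
corresponding u_i and v_i have unit 2-norm:

   β_1 u_1 = b,                         α_1 v_1 = A^T u_1,
   β_{i+1} u_{i+1} = A v_i - α_i u_i,
   α_{i+1} v_{i+1} = A^T u_{i+1} - β_{i+1} v_i,        i = 1, 2, ..., k.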
The kth approximation to the solution x is then defined to be x_k = V_k y_k, where
y_k solves the subproblem

   min   || ( β_1 e_1 )  -  ( B_k ) y_k ||                              (2.2)
   y_k   || (    0    )     ( λI  )     ||_2
Letting the associated residual vectors be

   t_{k+1} = β_1 e_1 - B_k y_k,
   r_k = b - A x_k,                                                     (2.3)

we find that the relations

   r_k = U_{k+1} t_{k+1},                                               (2.4)
   A^T r_k = λ^2 x_k + α_{k+1} τ_{k+1} v_{k+1}

will hold to machine accuracy, where τ_{k+1} is the last component of t_{k+1}, and we
therefore conclude that (r_k, x_k) will be an acceptable solution of (1.4) if the
computed value of either ||t_{k+1}|| or |α_{k+1} τ_{k+1}| is suitably small.
Björck [1] has previously observed that subproblem (2.2) is the appropriate
generalization of min || B_k y_k - β_1 e_1 || when λ ≠ 0. He also discusses methods for
computing y_k and x_k efficiently for various λ and k.
In LSQR we assume that a single value of λ is given, and to save storage and
work, we do not compute y_k, r_k, or t_{k+1}. The orthogonal factorization

        ( B_k   β_1 e_1 )       ( R_k   f_k )
   Q_k  (               )   =   (           )                           (2.5)
        ( λI      0     )       (  0    q_k )

is computed (Q_k^T Q_k = I; R_k upper bidiagonal, k x k), and this would give
R_k y_k = f_k; but instead we solve R_k^T D_k^T = V_k^T for D_k and form x_k = D_k f_k.
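To see why this saves storage, note (this is the standard LSQR update from [6],
restated in the present notation) that if D_k = [d_1, ..., d_k], if R_k has diagonal
elements ρ_i and superdiagonal elements θ_i (θ_i in column i), and if
f_k = (φ_1, ..., φ_k)^T, then the lower triangular system R_k^T D_k^T = V_k^T gives,
with d_0 = 0,

   d_k = (1/ρ_k)(v_k - θ_k d_{k-1}),        x_k = x_{k-1} + φ_k d_k,

so only the single n-vector d_{k-1} need be carried from one iteration to the
next (in the subroutine, the workspace vector W carries this direction).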
The factorization (2.5) is formed similarly to the case λ = 0 in [6], except that
two rotations are required per step instead of one. For k = 2, the factorization
proceeds according to
   [Display garbled in this scan: the sequence of plane rotations for k = 2,
   which eliminates in turn each damping element λ and each subdiagonal
   element β_{i+1}, producing the upper bidiagonal matrix R_k with diagonal
   elements ρ_i and superdiagonal elements θ_i.]
Note that the first λ is rotated into the diagonal element α_1. This alters the right-
hand side β_1 e_1, producing the first component of q_k. An alternative is to rotate λ
into β_2 (and similarly for later λ), since this does not affect the right-hand side
and it more closely simulates the algorithm that results when LSQR is applied to
Ā and b̄ directly. However, the rotations then have a greater effect on B_k, and in
practice the first option has proved to give marginally more accurate results.
The estimates required to implement the stopping criteria are

   ||r̄_k||^2 = ||r_k||^2 + λ^2 ||x_k||^2 = φ̄_{k+1}^2 + ||q_k||^2,

   ||Ā^T r̄_k|| = ||A^T r_k - λ^2 x_k|| = α_{k+1} |τ_{k+1}|,

where φ̄_{k+1} denotes the last component of the rotated right-hand side in (2.5),
and the final equality follows from (2.4).
This is a simple generalization of the case λ = 0. No additional storage is needed
for q_k, since only its norm is required. In short, although the presence of λ
complicates the algorithm description, it adds essentially nothing to the storage
and work per iteration.

3. REGULARIZATION AND RELATED WORK


Introducing λ as in (1.3) is just one way of "regularizing" the solution x, in the
sense that it can reduce the size of the computed solution and make its components
less sensitive to changes in the data. LSQR is applicable when a value of λ
is known a priori. The value is entered via the subroutine parameter DAMP. A
second method for regularizing x is available through LSQR's parameter ACOND,
which can cause iterations to terminate before ||x_k|| becomes large. A similar
approach has recently been described by Wold et al. [9], who give an illuminating
interpretation of the bidiagonalization as a partial least squares procedure. Their
description will also be useful to those who prefer the notation of multiple
regression.
Methods for choosing λ, and other approaches to regularization, are given in
[1, 2, 4, 8] and elsewhere. For a philosophical discussion, see [7].

4. CODING APROD
The best way to compute y + Ax and x + A^T y depends upon the origin of the
matrix A. We shall illustrate a case that commonly arises, in which A is a sparse
matrix whose nonzero coefficients are stored by rows in a simple list. Let A have
M rows, N columns, and NZ nonzeros. Conceptually we need three arrays,
dimensioned as REAL RA(NZ) and INTEGER JA(NZ), NA(M), where

   RA(L) is the Lth nonzero of A, counting across row 1, then across row 2, and
         so on;
   JA(L) is the column in which the Lth nonzero of A lies;
   NA(I) is the number of nonzero coefficients in the Ith row of A.
These quantities may be used in a straightforward way, as shown in Figure 1 (a
FORTRAN implementation). We assume that they are made available to
APROD through COMMON, and that the actual array dimensions are suitably
large.
Blank or labeled COMMON will often be convenient for transmitting data to
APROD. (Of course, some of the data could be local to APROD.) For greater
generality, the parameter lists for LSQR and APROD include two workspace
arrays IW, RW and their lengths LENIW, LENRW. LSQR does not use these
parameters directly; it just passes them to APROD.
Figure 2 illustrates their use on the same example (sparse A stored by rows).
An auxiliary subroutine APROD1 is needed to make the code readable. A similar
scheme should be used to initialize the workspace parameters prior to calling
LSQR.
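For concreteness, a driver using the workspace scheme of Figure 2 might be set
up as follows. This is a sketch only: the array sizes, the tolerance values, and
the code that loads NA, JA, RA are illustrative, not part of the published
algorithm.

      EXTERNAL APROD
      INTEGER IW(10000)
      REAL RW(9000), U(1000), V(50), W(50), X(50), SE(50)
      INTEGER M,N,LENIW,LENRW,ITNLIM,NOUT,ISTOP
      REAL DAMP,ATOL,BTOL,CONLIM,ANORM,ACOND,RNORM,ARNORM,XNORM
C
C     SET M AND N, COPY THE COUNTS NA INTO IW(1)...IW(M),
C     THE COLUMN INDICES JA INTO IW(M+1)..., THE NONZEROS RA
C     INTO RW, AND THE RHS VECTOR B INTO U.
C     ...
      LENIW  = 10000
      LENRW  = 9000
      DAMP   = 0.0
      ATOL   = 1.0E-6
      BTOL   = 1.0E-6
      CONLIM = 1.0E+5
      ITNLIM = 4*N
      NOUT   = 6
      CALL LSQR( M,N,APROD,DAMP,
     1           LENIW,LENRW,IW,RW,
     2           U,V,W,X,SE,
     3           ATOL,BTOL,CONLIM,ITNLIM,NOUT,
     4           ISTOP,ANORM,ACOND,RNORM,ARNORM,XNORM )
      STOP
      END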
Returning to the example itself, it may often be natural to store A by columns
rather than rows, using analogous data structures. However, we note that in
sparse least squares applications, A may have many more rows than columns
(M >> N). In such cases it is vital to store A by rows as shown, if the machine
being used has a paged (virtual) memory. Random access is then restricted to
arrays of length N rather than M, and page faults will therefore be kept to a
minimum.
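A column-oriented MODE 1 loop makes the contrast clear. The sketch below
assumes hypothetical arrays IA(NZ) of row indices and KA(N) of per-column
nonzero counts (the routine and array names are invented for this illustration);
note that the inner loop scatters into the M-vector Y, which is exactly the
random access that row-wise storage avoids when M >> N.

      SUBROUTINE ACOLMV( M,N,NZ,X,Y,RA,IA,KA )
      INTEGER M,N,NZ,IA(NZ),KA(N),I,J,L,L1,L2
      REAL X(N),Y(M),RA(NZ),XJ
C
C     Y = Y + A*X WITH A STORED BY COLUMNS (A SKETCH).
C     RA HOLDS THE NONZEROS COLUMN BY COLUMN, IA(L) IS THE ROW
C     INDEX OF THE L-TH NONZERO, AND KA(J) IS THE NUMBER OF
C     NONZEROS IN COLUMN J.
C
      L2 = 0
      DO 200 J = 1, N
         XJ = X(J)
         L1 = L2 + 1
         L2 = L2 + KA(J)
         DO 100 L = L1, L2
            I = IA(L)
            Y(I) = Y(I) + RA(L)*XJ
  100    CONTINUE
  200 CONTINUE
      RETURN
      END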
Note also that the arrays RA, JA, NA are adequate for computing both Ax and
A^T y; we do not need to store A by rows and by columns.
Regardless of the application, it will be apparent when coding APROD for the
two values of MODE that the matrix A is effectively being defined twice. Great
care must be taken to avoid coding inconsistent expressions y + A_1 x and x +
A_2^T y, where either A_1 or A_2 is different from the desired A. (If A_1 ≠ A_2, algorithm
LSQR will not converge.) Parameters ANORM, ACOND, and CONLIM provide
a safeguard for such an event.
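A direct check is also possible before calling LSQR (a sketch, not part of the
published code): for any vectors x and y, the two modes must satisfy
y^T (Ax) = x^T (A^T y) up to roundoff, so a large discrepancy in the subroutine
below signals that modes 1 and 2 define different matrices. SDOT is the BLAS
inner product [5]; the routine name CHKAPR is invented for this illustration.

      SUBROUTINE CHKAPR( M,N,X,Y,U,V,LENIW,LENRW,IW,RW,DIFF )
      INTEGER M,N,LENIW,LENRW,I,J
      INTEGER IW(LENIW)
      REAL X(N),Y(M),U(M),V(N),RW(LENRW),DIFF,T1,T2,SDOT
C
C     ADJOINT CONSISTENCY CHECK ON THE USER'S APROD.
C     ON ENTRY, X AND Y HOLD ARBITRARY NONZERO VECTORS.
C     ON EXIT, DIFF = Y(T)*(A*X) - X(T)*(A(T)*Y), WHICH SHOULD
C     BE NEGLIGIBLE IF MODES 1 AND 2 DEFINE THE SAME MATRIX.
C
C     U = A*X  (MODE 1 WITH U INITIALLY ZERO).
      DO 10 I = 1, M
         U(I) = 0.0
   10 CONTINUE
      CALL APROD( 1,M,N,X,U,LENIW,LENRW,IW,RW )
C     V = A(TRANSPOSE)*Y  (MODE 2 WITH V INITIALLY ZERO).
      DO 20 J = 1, N
         V(J) = 0.0
   20 CONTINUE
      CALL APROD( 2,M,N,V,Y,LENIW,LENRW,IW,RW )
      T1   = SDOT( M, Y,1, U,1 )
      T2   = SDOT( N, X,1, V,1 )
      DIFF = T1 - T2
      RETURN
      END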

5. PRECONDITIONING

It is well known that conjugate-gradient methods can be accelerated if a nonsingular
matrix M is available to approximate A in some useful sense. When A is
square and nonsingular, the system Ax = b is equivalent to both of the following
systems:

   (M^{-1} A) x = c,    where Mc = b;                                   (5.1)

   (A M^{-1}) z = b,    where Mx = z.                                   (5.2)

For least squares systems (undamped), only the analogue of (5.2) is applicable:

   min || Ax - b ||_2 = min || (A M^{-1}) z - b ||_2,   where Mx = z.   (5.3)

Fig. 1. Computation of y + Ax and x + A^T y, where A is a sparse matrix stored
compactly by rows. For convenience, the data structure for A is held in COMMON.

      SUBROUTINE APROD( MODE,M,N,X,Y,
     *                  LENIW,LENRW,IW,RW )

      INTEGER MODE,M,N,LENIW,LENRW
      INTEGER IW(LENIW)
      REAL X(N),Y(M),RW(LENRW)
C
C     APROD PERFORMS THE FOLLOWING FUNCTIONS:
C
C     IF MODE = 1, SET Y = Y + A*X
C     IF MODE = 2, SET X = X + A(TRANSPOSE)*Y
C
C     WHERE A IS A MATRIX STORED BY ROWS IN
C     THE ARRAYS RA, JA, NA. IN THIS EXAMPLE,
C     RA, JA, NA ARE STORED IN COMMON.
C
      REAL RA
      INTEGER JA, NA
      COMMON RA(9000), JA(9000), NA(1000)

      INTEGER I,J,L,L1,L2
      REAL SUM, YI, ZERO

      ZERO = 0.0
      L2 = 0
      IF (MODE .NE. 1) GO TO 400
C
C     MODE = 1  --  SET Y = Y + A*X.
C
      DO 200 I = 1, M
         SUM = ZERO
         L1 = L2 + 1
         L2 = L2 + NA(I)
         DO 100 L = L1, L2
            J = JA(L)
            SUM = SUM + RA(L)*X(J)
  100    CONTINUE
         Y(I) = Y(I) + SUM
  200 CONTINUE
      RETURN
C
C     MODE = 2  --  SET X = X + A(TRANSPOSE)*Y.
C
  400 DO 600 I = 1, M
         YI = Y(I)
         L1 = L2 + 1
         L2 = L2 + NA(I)
         DO 500 L = L1, L2
            J = JA(L)
            X(J) = X(J) + RA(L)*YI
  500    CONTINUE
  600 CONTINUE
      RETURN
C
C     END OF APROD
      END

Fig. 2. Same as Figure 1, with the data structure for A held in the workspace
parameters.

      SUBROUTINE APROD( MODE,M,N,X,Y,
     *                  LENIW,LENRW,IW,RW )

      INTEGER MODE,M,N,LENIW,LENRW
      INTEGER IW(LENIW)
      REAL X(N),Y(M),RW(LENRW)
C
C     APROD PERFORMS THE FOLLOWING FUNCTIONS:
C
C     IF MODE = 1, SET Y = Y + A*X
C     IF MODE = 2, SET X = X + A(TRANSPOSE)*Y
C
C     WHERE A IS A MATRIX STORED BY ROWS IN
C     THE ARRAYS RA, JA, NA. IN THIS EXAMPLE,
C     APROD IS AN INTERFACE BETWEEN LSQR AND
C     ANOTHER USER ROUTINE THAT DOES THE WORK.
C     THE WORKSPACE ARRAY RW CONTAINS RA.
C     THE FIRST M COMPONENTS OF IW CONTAIN NA,
C     AND THE REMAINDER OF IW CONTAINS JA.
C     THE DIMENSIONS OF RW AND IW ARE ASSUMED
C     TO BE SUFFICIENTLY LARGE.
C
      INTEGER LENJA,LENRA,LOCJA

      LOCJA = M + 1
      LENJA = LENIW - LOCJA + 1
      LENRA = LENRW
      CALL APROD1( MODE,M,N,X,Y,
     *             LENJA,LENRA,IW,IW(LOCJA),RW )
      RETURN
C
C     END OF APROD
      END


      SUBROUTINE APROD1( MODE,M,N,X,Y,
     *                   LENJA,LENRA,NA,JA,RA )

      INTEGER MODE,M,N,LENJA,LENRA
      INTEGER NA(M),JA(LENJA)
      REAL X(N),Y(M),RA(LENRA)
C
C     APROD1 DOES THE WORK FOR APROD.
C
      INTEGER I,J,L,L1,L2
      REAL SUM,YI,ZERO

      < the same code as in APROD in Figure 1 >

C     END OF APROD1
      END



[Figure 3 (two pages of printed output from LSQR for test problem
P(20, 10, 1, 1); see Section 6) is not reproduced here.]

We note only that subroutine LSQR may be applied without change to systems
(5.1)-(5.3). The effect of M is localized to the user's own subroutine APROD. For
example, when MODE = 1, APROD for the last two systems should compute
y + (AM^{-1})x by first solving Mw = x and then computing y + Aw. Clearly it must
be possible to solve systems involving M and M^T very efficiently.
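As an illustration, the sketch below codes APROD for the right-preconditioned
system (5.3) with a diagonal M. Everything here is invented for the example:
the diagonal of M is assumed to be held in RW(1), ..., RW(N), the scratch vector
W has an arbitrary fixed size, and AUSER stands for a user routine computing
the plain products of Figure 1.

      SUBROUTINE APROD( MODE,M,N,X,Y,LENIW,LENRW,IW,RW )
      INTEGER MODE,M,N,LENIW,LENRW,J
      INTEGER IW(LENIW)
      REAL X(N),Y(M),RW(LENRW)
      REAL W(50), ZERO
C
C     MODE = 1: Y = Y + (A*MINV)*X, VIA M*W = X THEN Y = Y + A*W.
C     MODE = 2: X = X + MINV*(A(TRANSPOSE)*Y), SINCE M IS DIAGONAL.
C
      ZERO = 0.0
      IF (MODE .NE. 1) GO TO 200
      DO 100 J = 1, N
         W(J) = X(J) / RW(J)
  100 CONTINUE
      CALL AUSER( 1,M,N,W,Y )
      RETURN
  200 DO 210 J = 1, N
         W(J) = ZERO
  210 CONTINUE
      CALL AUSER( 2,M,N,W,Y )
      DO 220 J = 1, N
         X(J) = X(J) + W(J) / RW(J)
  220 CONTINUE
      RETURN
      END

Note that LSQR then returns z, where Mx = z, so the true solution must be
recovered afterward as x = M^{-1} z.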

6. OUTPUT
Subroutine LSQR produces printed output on file NOUT, if the parameter
NOUT is positive. This is illustrated in Figure 3, in which the least squares
problem solved is P(20, 10, 1, 1) as defined in [6], with a slight generalization to
include a damping parameter λ = 10^-2. (Single precision was used on an IBM
370/168.) The items printed at the kth iteration are as follows.
ITN            The iteration number k. Results are always printed for the
               first 10 and last 10 iterations. Intermediate results are
               printed if m <= 40 or n <= 40, or if one of the convergence
               conditions is nearly satisfied. Otherwise, information is
               printed every 10th iteration.

X(1)           The value of the first element of the approximate solution x_k.

FUNCTION       The value of the function being minimized, namely
               ||r̄_k|| = (||r_k||^2 + λ^2 ||x_k||^2)^(1/2).

COMPATIBLE     A dimensionless quantity which should converge to zero if
               and only if Ax = b is compatible. It is an estimate of
               ||r̄_k|| / ||b||, which decreases monotonically.

INCOMPATIBLE   A dimensionless quantity which should converge to zero if
               and only if the optimum ||r̄_k|| is nonzero. It is an estimate
               of ||Ā^T r̄_k|| / (||Ā|| ||r̄_k||), which is usually not monotonic.

NORM(ABAR)     A monotonically increasing estimate of ||Ā||_F.

COND(ABAR)     A monotonically increasing estimate of cond(Ā) =
               ||Ā||_F ||Ā^+||_F, the condition number of Ā.

ACKNOWLEDGMENT
The authors are grateful to Richard Hanson for suggestions that prompted
several improvements to the implementation of LSQR.

REFERENCES
1. BJÖRCK, Å. A bidiagonalization algorithm for solving ill-posed systems of linear equations. Rep.
   LITH-MAT-R-80-33, Dep. Mathematics, Linköping Univ., Linköping, Sweden, 1980.
2. ELDÉN, L. Algorithms for the regularization of ill-conditioned least squares problems. BIT 17
   (1977), 134-145.
3. HESTENES, M.R., AND STIEFEL, E. Methods of conjugate gradients for solving linear systems. J.
   Res. N.B.S. 49 (1952), 409-436.
4. LAWSON, C.L., AND HANSON, R.J. Solving Least Squares Problems. Prentice-Hall, Englewood
   Cliffs, N.J., 1974.
5. LAWSON, C.L., HANSON, R.J., KINCAID, D.R., AND KROGH, F.T. Basic linear algebra subprograms
   for Fortran usage. ACM Trans. Math. Softw. 5, 3 (Sept. 1979), 308-323 and (Algorithm) 324-325.
6. PAIGE, C.C., AND SAUNDERS, M.A. LSQR: An algorithm for sparse linear equations and sparse
   least squares. ACM Trans. Math. Softw. 8, 1 (March 1982), 43-71.

7. SMITH, G., AND CAMPBELL, F. A critique of some ridge regression methods. J. Am. Stat. Assoc.
   75, 369 (March 1980), 74-81.
8. VARAH, J.M. A practical examination of some numerical methods for linear discrete ill-posed
   problems. SIAM Rev. 21 (1979), 100-111.
9. WOLD, S., WOLD, H., DUNN, W.J., AND RUHE, A. The collinearity problem in linear and
   nonlinear regression. The partial least squares (PLS) approach to generalized inverses. Rep.
   UMINF-83.80, Univ. Umeå, Umeå, Sweden, 1980.

ALGORITHM
[A part of the listing is printed here. The complete listing is available from the
ACM Algorithms Distribution Service (see page 227 for order form).]

      SUBROUTINE LSQR( M,N,APROD,DAMP,
     1                 LENIW,LENRW,IW,RW,
     2                 U,V,W,X,SE,
     3                 ATOL,BTOL,CONLIM,ITNLIM,NOUT,
     4                 ISTOP,ANORM,ACOND,RNORM,ARNORM,XNORM )

      EXTERNAL APROD
      INTEGER M,N,LENIW,LENRW,ITNLIM,NOUT,ISTOP
      INTEGER IW(LENIW)
      REAL RW(LENRW),U(M),V(N),W(N),X(N),SE(N),
     1     ATOL,BTOL,CONLIM,DAMP,ANORM,ACOND,RNORM,ARNORM,XNORM
C
C     LSQR FINDS A SOLUTION X TO THE FOLLOWING PROBLEMS...
C
C     1. UNSYMMETRIC EQUATIONS --    SOLVE  A*X = B
C
C     2. LINEAR LEAST SQUARES  --    SOLVE  A*X = B
C                                    IN THE LEAST-SQUARES SENSE
C
C     3. DAMPED LEAST SQUARES  --    SOLVE  (    A    )*X = ( B )
C                                           ( DAMP*I  )     ( 0 )
C                                    IN THE LEAST-SQUARES SENSE
C
C     WHERE A IS A MATRIX WITH M ROWS AND N COLUMNS, B IS AN
C     M-VECTOR, AND DAMP IS A SCALAR (ALL QUANTITIES REAL).
C     THE MATRIX A IS INTENDED TO BE LARGE AND SPARSE. IT IS ACCESSED
C     BY MEANS OF SUBROUTINE CALLS OF THE FORM
C
C     CALL APROD( MODE,M,N,X,Y,LENIW,LENRW,IW,RW )
C
C     WHICH MUST PERFORM THE FOLLOWING FUNCTIONS...
C
C                IF MODE = 1, COMPUTE  Y = Y + A*X.
C                IF MODE = 2, COMPUTE  X = X + A(TRANSPOSE)*Y.
C
C     THE VECTORS X AND Y ARE INPUT PARAMETERS IN BOTH CASES.
C     IF MODE = 1, Y SHOULD BE ALTERED WITHOUT CHANGING X.
C     IF MODE = 2, X SHOULD BE ALTERED WITHOUT CHANGING Y.
C     THE PARAMETERS LENIW, LENRW, IW, RW MAY BE USED FOR WORKSPACE
C     AS DESCRIBED BELOW.
C
C     THE RHS VECTOR B IS INPUT VIA U, AND SUBSEQUENTLY OVERWRITTEN.
C

C
C     NOTE. LSQR USES AN ITERATIVE METHOD TO APPROXIMATE THE SOLUTION.
C     THE NUMBER OF ITERATIONS REQUIRED TO REACH A CERTAIN ACCURACY
C     DEPENDS STRONGLY ON THE SCALING OF THE PROBLEM. POOR SCALING OF
C     THE ROWS OR COLUMNS OF A SHOULD THEREFORE BE AVOIDED WHERE
C     POSSIBLE.
C
C     FOR EXAMPLE, IN PROBLEM 1 THE SOLUTION IS UNALTERED BY
C     ROW-SCALING. IF A ROW OF A IS VERY SMALL OR LARGE COMPARED TO
C     THE OTHER ROWS OF A, THE CORRESPONDING ROW OF ( A  B ) SHOULD
C     BE SCALED UP OR DOWN.
C
C     IN PROBLEMS 1 AND 2, THE SOLUTION X IS EASILY RECOVERED
C     FOLLOWING COLUMN-SCALING. IN THE ABSENCE OF BETTER INFORMATION,
C     THE NONZERO COLUMNS OF A SHOULD BE SCALED SO THAT THEY ALL HAVE
C     THE SAME EUCLIDEAN NORM (E.G. 1.0).
C
C     IN PROBLEM 3, THERE IS NO FREEDOM TO RE-SCALE IF DAMP IS
C     NONZERO. HOWEVER, THE VALUE OF DAMP SHOULD BE ASSIGNED ONLY
C     AFTER ATTENTION HAS BEEN PAID TO THE SCALING OF A.
C
C     THE PARAMETER DAMP IS INTENDED TO HELP REGULARIZE
C     ILL-CONDITIONED SYSTEMS, BY PREVENTING THE TRUE SOLUTION FROM
C     BEING VERY LARGE. ANOTHER AID TO REGULARIZATION IS PROVIDED BY
C     THE PARAMETER ACOND, WHICH MAY BE USED TO TERMINATE ITERATIONS
C     BEFORE THE COMPUTED SOLUTION BECOMES VERY LARGE.
C
C
C     NOTATION
C
C     THE FOLLOWING QUANTITIES ARE USED IN DISCUSSING THE SUBROUTINE
C     PARAMETERS...
C
C     ABAR   =  (    A    ),          BBAR  =  ( B )
C               ( DAMP*I  )                    ( 0 )
C
C     R      =  B - A*X,              RBAR  =  BBAR - ABAR*X
C
C     RNORM  =  SQRT( NORM(R)**2 + DAMP**2 * NORM(X)**2 )
C            =  NORM( RBAR )
C
C     RELPR  =  THE RELATIVE PRECISION OF FLOATING-POINT ARITHMETIC
C               ON THE MACHINE BEING USED. FOR EXAMPLE, ON THE IBM 370,
C               RELPR IS ABOUT 1.0E-6 AND 1.0D-16 IN SINGLE AND DOUBLE
C               PRECISION RESPECTIVELY.
C
C     LSQR MINIMIZES THE FUNCTION RNORM WITH RESPECT TO X.
C
C
C     PARAMETERS
C
C     M       INPUT      THE NUMBER OF ROWS IN A.
C
C     N       INPUT      THE NUMBER OF COLUMNS IN A.
C
C     APROD   EXTERNAL   SEE ABOVE.
C
C     DAMP    INPUT      THE DAMPING PARAMETER FOR PROBLEM 3 ABOVE.
C                        (DAMP SHOULD BE 0.0 FOR PROBLEMS 1 AND 2.)

C                        IF THE SYSTEM A*X = B IS INCOMPATIBLE, VALUES
C                        OF DAMP IN THE RANGE 0 TO SQRT(RELPR)*NORM(A)
C                        WILL PROBABLY HAVE A NEGLIGIBLE EFFECT.
C                        LARGER VALUES OF DAMP WILL TEND TO DECREASE
C                        THE NORM OF X AND TO REDUCE THE NUMBER OF
C                        ITERATIONS REQUIRED BY LSQR.
C
C                        THE WORK PER ITERATION AND THE STORAGE NEEDED
C                        BY LSQR ARE THE SAME FOR ALL VALUES OF DAMP.
C
C     LENIW   INPUT      THE LENGTH OF THE WORKSPACE ARRAY IW.
C     LENRW   INPUT      THE LENGTH OF THE WORKSPACE ARRAY RW.
C     IW      WORKSPACE  AN INTEGER ARRAY OF LENGTH LENIW.
C     RW      WORKSPACE  A REAL ARRAY OF LENGTH LENRW.
C
C     NOTE. LSQR DOES NOT EXPLICITLY USE THE PREVIOUS FOUR
C     PARAMETERS, BUT PASSES THEM TO SUBROUTINE APROD FOR
C     POSSIBLE USE AS WORKSPACE. IF APROD DOES NOT NEED
C     IW OR RW, THE VALUES LENIW = 1 OR LENRW = 1 SHOULD
C     BE USED, AND THE ACTUAL PARAMETERS CORRESPONDING TO
C     IW OR RW MAY BE ANY CONVENIENT ARRAY OF SUITABLE TYPE.
C
C     U(M)    INPUT      THE RHS VECTOR B. BEWARE THAT U IS
C                        OVER-WRITTEN BY LSQR.
C
C     V(N)    WORKSPACE
C     W(N)    WORKSPACE
C
C     X(N)    OUTPUT     RETURNS THE COMPUTED SOLUTION X.
C
C     SE(N)   OUTPUT     RETURNS STANDARD ERROR ESTIMATES FOR THE
C                        COMPONENTS OF X. FOR EACH I, SE(I) IS SET
C                        TO THE VALUE  RNORM * SQRT( SIGMA(I,I) / T ),
C                        WHERE SIGMA(I,I) IS AN ESTIMATE OF THE I-TH
C                        DIAGONAL OF THE INVERSE OF ABAR(TRANSPOSE)*ABAR
C                        AND  T = 1      IF  M .LE. N,
C                             T = M - N  IF  M .GT. N  AND  DAMP = 0,
C                             T = M      IF  DAMP .NE. 0.
C
C     ATOL    INPUT      AN ESTIMATE OF THE RELATIVE ERROR IN THE DATA
C                        DEFINING THE MATRIX A. FOR EXAMPLE,
C                        IF A IS ACCURATE TO ABOUT 6 DIGITS, SET
C                        ATOL = 1.0E-6 .
C
C     BTOL    INPUT      AN ESTIMATE OF THE RELATIVE ERROR IN THE DATA
C                        DEFINING THE RHS VECTOR B. FOR EXAMPLE,
C                        IF B IS ACCURATE TO ABOUT 6 DIGITS, SET
C                        BTOL = 1.0E-6 .
C
C     CONLIM  INPUT      AN UPPER LIMIT ON COND(ABAR), THE APPARENT
C                        CONDITION NUMBER OF THE MATRIX ABAR.
C                        ITERATIONS WILL BE TERMINATED IF A COMPUTED
C                        ESTIMATE OF COND(ABAR) EXCEEDS CONLIM.
C                        THIS IS INTENDED TO PREVENT CERTAIN SMALL OR
C                        ZERO SINGULAR VALUES OF A OR ABAR FROM
C                        COMING INTO EFFECT AND CAUSING UNWANTED GROWTH
C                        IN THE COMPUTED SOLUTION.
C
C                        CONLIM AND DAMP MAY BE USED SEPARATELY OR
C                        TOGETHER TO REGULARIZE ILL-CONDITIONED SYSTEMS.

C
C                        NORMALLY, CONLIM SHOULD BE IN THE RANGE
C                        10 TO 1/RELPR.
C                        SUGGESTED VALUE --
C                        CONLIM = 1/(100*RELPR)      FOR COMPATIBLE SYSTEMS,
C                        CONLIM = 1/(10*SQRT(RELPR)) FOR LEAST SQUARES.
C
C     NOTE. IF THE USER IS NOT CONCERNED ABOUT THE PARAMETERS
C     ATOL, BTOL AND CONLIM, ANY OR ALL OF THEM MAY BE SET
C     TO ZERO. THE EFFECT WILL BE THE SAME AS THE VALUES
C     RELPR, RELPR AND 1/RELPR RESPECTIVELY.
C
C     ITNLIM  INPUT      AN UPPER LIMIT ON THE NUMBER OF ITERATIONS.
C                        SUGGESTED VALUE --
C                        ITNLIM = N/2  FOR WELL CONDITIONED SYSTEMS,
C                        ITNLIM = 4*N  OTHERWISE.
C
C     NOUT    INPUT      FILE NUMBER FOR PRINTER. IF POSITIVE,
C                        A SUMMARY WILL BE PRINTED ON FILE NOUT.
C
C     ISTOP   OUTPUT     AN INTEGER GIVING THE REASON FOR TERMINATION...
C
C                        0  X = 0 IS THE EXACT SOLUTION.
C                           NO ITERATIONS WERE PERFORMED.
C
C                        1  THE EQUATIONS A*X = B ARE PROBABLY
C                           COMPATIBLE. NORM(A*X - B) IS SUFFICIENTLY
C                           SMALL, GIVEN THE VALUES OF ATOL AND BTOL.
C
C                        2  THE SYSTEM A*X = B IS PROBABLY NOT
C                           COMPATIBLE. A LEAST-SQUARES SOLUTION HAS
C                           BEEN OBTAINED WHICH IS SUFFICIENTLY ACCURATE,
C                           GIVEN THE VALUE OF ATOL.
C
C                        3  AN ESTIMATE OF COND(ABAR) HAS EXCEEDED
C                           CONLIM. THE SYSTEM A*X = B APPEARS TO BE
C                           ILL-CONDITIONED. OTHERWISE, THERE COULD BE
C                           AN ERROR IN SUBROUTINE APROD.
C
C                        4  THE EQUATIONS A*X = B ARE PROBABLY
C                           COMPATIBLE. NORM(A*X - B) IS AS SMALL AS
C                           SEEMS REASONABLE ON THIS MACHINE.
C
C                        5  THE SYSTEM A*X = B IS PROBABLY NOT
C                           COMPATIBLE. A LEAST-SQUARES SOLUTION HAS
C                           BEEN OBTAINED WHICH IS AS ACCURATE AS SEEMS
C                           REASONABLE ON THIS MACHINE.
C
C                        6  COND(ABAR) SEEMS TO BE SO LARGE THAT THERE IS
C                           NOT MUCH POINT IN DOING FURTHER ITERATIONS,
C                           GIVEN THE PRECISION OF THIS MACHINE.
C                           THERE COULD BE AN ERROR IN SUBROUTINE APROD.
C
C                        7  THE ITERATION LIMIT ITNLIM WAS REACHED.
C
C     ANORM   OUTPUT     AN ESTIMATE OF THE FROBENIUS NORM OF ABAR.
C                        THIS IS THE SQUARE-ROOT OF THE SUM OF SQUARES
C                        OF THE ELEMENTS OF ABAR.
C                        IF DAMP IS SMALL AND IF THE COLUMNS OF A
C                        HAVE ALL BEEN SCALED TO HAVE LENGTH 1.0,
C                        ANORM SHOULD INCREASE TO ROUGHLY SQRT(N).

C                        A RADICALLY DIFFERENT VALUE FOR ANORM MAY
C                        INDICATE AN ERROR IN SUBROUTINE APROD (THERE
C                        MAY BE AN INCONSISTENCY BETWEEN MODES 1 AND 2).
C
C     ACOND   OUTPUT     AN ESTIMATE OF COND(ABAR), THE CONDITION
C                        NUMBER OF ABAR. A VERY HIGH VALUE OF ACOND
C                        MAY AGAIN INDICATE AN ERROR IN APROD.
C
C     RNORM   OUTPUT     AN ESTIMATE OF THE FINAL VALUE OF NORM(RBAR),
C                        THE FUNCTION BEING MINIMIZED (SEE NOTATION
C                        ABOVE). THIS WILL BE SMALL IF A*X = B HAS
C                        A SOLUTION.
C
C     ARNORM  OUTPUT     AN ESTIMATE OF THE FINAL VALUE OF
C                        NORM( ABAR(TRANSPOSE)*RBAR ), THE NORM OF
C                        THE RESIDUAL FOR THE USUAL NORMAL EQUATIONS.
C                        THIS SHOULD BE SMALL IN ALL CASES. (ARNORM
C                        WILL OFTEN BE SMALLER THAN THE TRUE VALUE
C                        COMPUTED FROM THE OUTPUT VECTOR X.)
C
C     XNORM   OUTPUT     AN ESTIMATE OF THE NORM OF THE FINAL
C                        SOLUTION VECTOR X.
C
C
C     SUBROUTINES AND FUNCTIONS USED
C
C     USER       APROD
C     LSQR       NORMLZ
C     BLAS       SCOPY,SNRM2,SSCAL  (SEE LAWSON ET AL. BELOW)
C                (SNRM2 IS USED ONLY IN NORMLZ)
C     FORTRAN    ABS,MOD,SQRT
C
C
C     PRECISION
C
C     THE NUMBER OF ITERATIONS REQUIRED BY LSQR WILL USUALLY DECREASE
C     IF THE COMPUTATION IS PERFORMED IN HIGHER PRECISION. TO CONVERT
C     LSQR AND NORMLZ BETWEEN SINGLE- AND DOUBLE-PRECISION, CHANGE
C     THE WORDS
C                SCOPY, SNRM2, SSCAL
C                ABS, REAL, SQRT
C     TO THE APPROPRIATE BLAS AND FORTRAN EQUIVALENTS.
C
C
C     REFERENCES
C
C     PAIGE, C.C. AND SAUNDERS, M.A.  LSQR: AN ALGORITHM FOR SPARSE
C          LINEAR EQUATIONS AND SPARSE LEAST SQUARES.
C          ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE 8, 1 (MARCH 1982).
C
C     LAWSON, C.L., HANSON, R.J., KINCAID, D.R. AND KROGH, F.T.
C          BASIC LINEAR ALGEBRA SUBPROGRAMS FOR FORTRAN USAGE.
C          ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE 5, 3 (SEPT 1979),
C          308-323 AND 324-325.
C
C
C     LSQR. THIS VERSION DATED 22 FEBRUARY 1982.
C
