Key words. bound constrained optimization, limited memory method, nonlinear optimization, quasi-Newton method, large-scale optimization
*Received by the editors November 29, 1993; accepted for publication August 3, 1994.
Computer Science Department, University of Colorado at Boulder, Boulder, Colorado 80309 (richard@cs.colorado.edu). The work of this author was supported by National Science Foundation grant CCR-9101795, Army Research Office grant DAAL 03-91-G-0151, and Air Force Office of Scientific Research grant AFOSR-90-0109.
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, Illinois 60208 (nocedal@eecs.nwu.edu, ciyou@eecs.nwu.edu). The work of these authors was supported by National Science Foundation grants CCR-9101359 and ASC-9213149, and by Department of Energy grant DE-FG02-87ER25047-A004.
2. Outline of the algorithm. At the beginning of each iteration, the current iterate x_k, the function value f_k, the gradient g_k, and a positive definite limited memory approximation B_k are given. This allows us to form a quadratic model m_k of the objective function at x_k. The algorithm then considers the piecewise linear path

x(t) = P(x_k − t g_k, l, u),

obtained by projecting the steepest descent direction onto the feasible region, where P denotes the projection onto the box defined by the bounds l and u. We then compute the generalized Cauchy point x^c, which is defined as the first local minimizer of the univariate, piecewise quadratic

q_k(t) = m_k(x(t)).

The variables whose value at x^c is at lower or upper bound, comprising the active set A(x^c), are held fixed. We then consider the quadratic problem (2.3), (2.4) of minimizing m_k over the subspace of free variables, and compute an approximate solution \bar x_{k+1}. The line search along the resulting direction, which ensures that the iterates remain in the feasible region, is described in §6. We then evaluate the gradient at x_{k+1}, compute a new limited memory Hessian approximation B_{k+1}, and begin a new iteration.
Because in our algorithm every Hessian approximation B_k is positive definite, the approximate solution \bar x_{k+1} of the quadratic problem (2.3), (2.4) defines a descent direction d_k = \bar x_{k+1} − x_k for the objective function f. To see this, first note that the generalized Cauchy point x^c, which is a minimizer of m_k(x) on the projected steepest descent direction, satisfies m_k(x_k) > m_k(x^c) if the projected gradient is nonzero. Since the point \bar x_{k+1} is on a path from x^c to the minimizer of (2.3), along which m_k decreases, the value of m_k at \bar x_{k+1} is no larger than its value at x^c. Therefore we have

f(x_k) = m_k(x_k) > m_k(x^c) ≥ m_k(\bar x_{k+1}) = f(x_k) + g_k^T d_k + ½ d_k^T B_k d_k.

This inequality implies that g_k^T d_k < 0 if B_k is positive definite and d_k is not zero. In this paper we do not present any convergence analysis or study the possibility of zigzagging. However, given the use of gradient projection in the step computation, we believe that analyses similar to those in [7] and [9] should be possible, and that zigzagging should only be a problem in the degenerate case.
The Hessian approximations B_k used in our algorithm are limited memory BFGS matrices (Nocedal [21] and Byrd, Nocedal, and Schnabel [6]). Even though these matrices do not take advantage of the structure of the problem, they require only a small amount of storage and, as we will show, allow the computation of the generalized Cauchy point and the subspace minimization to be performed in O(n) operations. The new algorithm therefore has computational demands similar to those of the limited memory algorithm (L-BFGS) for unconstrained problems described by Liu and Nocedal [18] and Gilbert and Lemaréchal [14].
In the next three sections we describe in detail the limited memory matrices, the compu-
tation of the Cauchy point, and the minimization of the quadratic problem on a subspace.
3. Limited memory BFGS matrices. In our algorithm, the limited memory BFGS matrices are represented in the compact form described by Byrd, Nocedal, and Schnabel [6]. At every iterate x_k the algorithm stores a small number, say m, of correction pairs {s_i, y_i}, i = k − 1, ..., k − m, where

s_i = x_{i+1} − x_i,    y_i = g_{i+1} − g_i.

These correction pairs contain information about the curvature of the function and, in conjunction with the BFGS formula, define the limited memory iteration matrix B_k. The question is how to best represent these matrices without explicitly forming them.
In [6] it is proposed to use a compact (or outer product) form to define the limited memory matrix B_k in terms of the n × m correction matrices

Y_k = [y_{k−m}, ..., y_{k−1}],    S_k = [s_{k−m}, ..., s_{k−1}].

Specifically, B_k can be written as

(3.2)    B_k = θ I − W_k M_k W_k^T,

where

(3.3)    W_k = [ Y_k   θ S_k ],
(3.4)    M_k = \begin{bmatrix} -D_k & L_k^T \\ L_k & \theta S_k^T S_k \end{bmatrix}^{-1},

and where L_k and D_k are the m × m matrices

(3.5)    (L_k)_{i,j} = \begin{cases} (s_{k-m-1+i})^T (y_{k-m-1+j}) & \text{if } i > j, \\ 0 & \text{otherwise}, \end{cases}

(3.6)    D_k = diag [ s_{k−m}^T y_{k−m}, ..., s_{k−1}^T y_{k−1} ].
(We should point out that (3.2) is a slight rearrangement of [6, (3.5)].) Note that since M_k is a 2m × 2m matrix, and since m is chosen to be a small integer, the cost of computing the inverse in (3.4) is negligible. It is shown in [6] that by using the compact representation (3.2) various computations involving B_k become inexpensive. In particular, the product of B_k times a vector, which occurs often in the algorithm of this paper, can be performed efficiently.
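To make the use of (3.2)-(3.6) concrete, the following NumPy sketch builds W_k and M_k from stored correction pairs and forms the product B_k v in O(mn) operations. It is only an illustration, not the paper's Fortran implementation; the function names lbfgs_matrices and B_times_v, and the convention that the columns of S and Y are stored oldest first, are assumptions of the sketch.

```python
import numpy as np

def lbfgs_matrices(S, Y, theta):
    """Build W and M of the compact representation B = theta*I - W M W^T, cf. (3.2)-(3.6).

    S, Y : n-by-m arrays whose columns are the correction pairs s_i, y_i (oldest first).
    theta: positive scaling parameter.
    """
    SY = S.T @ Y                            # m-by-m matrix of inner products s_i^T y_j
    D = np.diag(np.diag(SY))                # D_k of (3.6)
    L = np.tril(SY, -1)                     # (L_k)_{ij} = s_i^T y_j for i > j, cf. (3.5)
    W = np.hstack([Y, theta * S])           # W_k of (3.3)
    M = np.linalg.inv(np.block([[-D, L.T],
                                [L, theta * (S.T @ S)]]))   # M_k of (3.4)
    return W, M

def B_times_v(v, W, M, theta):
    """Product B v using only matrices of size n-by-2m and 2m-by-2m."""
    return theta * v - W @ (M @ (W.T @ v))
```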
There is a similar representation of the inverse limited memory BFGS matrix H_k that approximates the inverse of the Hessian matrix:

(3.7)    H_k = (1/θ) I + \bar W_k \bar M_k \bar W_k^T,

where \bar W_k and \bar M_k are defined, analogously to (3.3) and (3.4), in terms of S_k, Y_k, θ, and the m × m matrix R_k given by

(3.8)    (R_k)_{i,j} = \begin{cases} (s_{k-m-1+i})^T (y_{k-m-1+j}) & \text{if } i \le j, \\ 0 & \text{otherwise}. \end{cases}

(We note that (3.7) is a slight rearrangement of [6, (3.1)].)
Representations of limited memory matrices analogous to these also exist for other quasi-Newton update formulae, such as SR1 or DFP (see [6]), and in principle could be used for B_k in our algorithm. We consider here only BFGS, since we have considerable computational experience in the unconstrained case indicating that limited memory BFGS performs well.
Since the bounds on the problem may prevent the line search from satisfying (2.6) (see §6), we cannot guarantee that the condition s_k^T y_k > 0 always holds (cf. Dennis and Schnabel [12]). Therefore, to maintain the positive definiteness of the limited memory BFGS matrix, we discard a correction pair {s_k, y_k} if the curvature condition

(3.9)    s_k^T y_k > eps ‖y_k‖²

is not satisfied, where eps is a small positive tolerance (see §6).
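As a small illustration of this safeguard, the sketch below keeps a correction pair only when the curvature test of (3.9) holds, using the tolerance eps = 2.2 × 10^{-16} quoted in step 6 of the L-BFGS-B Algorithm later in the paper; the helper name maybe_store_pair and the column layout of S and Y are assumptions of the sketch.

```python
import numpy as np

def maybe_store_pair(S, Y, s, y, m, eps=2.2e-16):
    """Append (s, y) only if the curvature condition (3.9) holds,
    keeping at most the m most recent correction pairs."""
    if s @ y > eps * (y @ y):
        S = np.column_stack([S, s])[:, -m:]
        Y = np.column_stack([Y, y])[:, -m:]
    return S, Y
```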
4. The generalized Cauchy point. The objective of the procedure described in this section is to find the first local minimizer of the quadratic model along the piecewise linear path obtained by projecting points along the steepest descent direction, x_k − t g_k, onto the feasible region. We define x := x_k and, throughout this section, drop the index k of the outer iteration, so that g, x, and B stand for g_k, x_k, and B_k. We use subscripts to denote the components of a vector; for example, g_i denotes the i-th component of g. Superscripts will be used to represent iterates during the piecewise search for the Cauchy point.
To define the breakpoints in each coordinate direction we compute

(4.1)    t_i = \begin{cases} (x_i - u_i)/g_i & \text{if } g_i < 0, \\ (x_i - l_i)/g_i & \text{if } g_i > 0, \\ \infty & \text{otherwise}, \end{cases}

and sort {t_i, i = 1, ..., n} in increasing order to obtain the ordered set {t^j : t^j ≤ t^{j+1}, j = 1, ..., n}. This is done by means of a heapsort algorithm [1]. We then search along P(x − t g, l, u), a piecewise linear path whose components can be expressed as

x_i(t) = \begin{cases} x_i - t g_i & \text{if } t \le t_i, \\ x_i - t_i g_i & \text{otherwise}. \end{cases}
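A minimal sketch of this breakpoint computation, again in NumPy and with a heap standing in for the heapsort of [1] (the function name breakpoints is an assumption):

```python
import heapq
import numpy as np

def breakpoints(x, g, l, u):
    """Breakpoints t_i of the projected steepest descent path, as in (4.1)."""
    t = np.full(len(x), np.inf)
    neg, pos = g < 0, g > 0
    t[neg] = (x[neg] - u[neg]) / g[neg]
    t[pos] = (x[pos] - l[pos]) / g[pos]
    heap = [(ti, i) for i, ti in enumerate(t) if ti > 0]
    heapq.heapify(heap)     # breakpoints are then retrieved in increasing order with heappop
    return t, heap
```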
Suppose that we are examining the interval [t^{j-1}, t^j]. Let us define the (j−1)st breakpoint as

x^{j-1} = x(t^{j-1}),

so that on [t^{j-1}, t^j]

x(t) = x^{j-1} + Δt d^{j-1},

where Δt = t − t^{j-1} and

(4.2)    d_i^{j-1} = \begin{cases} -g_i & \text{if } t^{j-1} < t_i, \\ 0 & \text{otherwise}. \end{cases}

Using this notation we write the quadratic (2.1), on the line segment [x(t^{j-1}), x(t^j)], as

m(x(t)) = f + g^T (x(t) − x) + ½ (x(t) − x)^T B (x(t) − x)
        = f + g^T (z^{j-1} + Δt d^{j-1}) + ½ (z^{j-1} + Δt d^{j-1})^T B (z^{j-1} + Δt d^{j-1}),

where

(4.3)    z^{j-1} = x^{j-1} − x.

Therefore on the line segment [x(t^{j-1}), x(t^j)], m(x) can be written as a quadratic in Δt,

\hat m(Δt) = f_{j-1} + f'_{j-1} Δt + ½ f''_{j-1} Δt²,

with coefficients

f_{j-1} = f + g^T z^{j-1} + ½ (z^{j-1})^T B z^{j-1},
f'_{j-1} = g^T d^{j-1} + (d^{j-1})^T B z^{j-1},
f''_{j-1} = (d^{j-1})^T B d^{j-1}.

Differentiating \hat m(Δt) and equating to zero, we obtain Δt* = −f'_{j-1}/f''_{j-1}. Since B is positive definite, this defines a minimizer provided t^{j-1} + Δt* lies on [t^{j-1}, t^j). Otherwise the generalized Cauchy point lies at x(t^{j-1}) if f'_{j-1} ≥ 0, and beyond or at x(t^j) if f'_{j-1} < 0.
If the generalized Cauchy point has not been found after exploring the interval [t^{j-1}, t^j], we set

z^j = z^{j-1} + Δt_{j-1} d^{j-1},    Δt_{j-1} = t^j − t^{j-1},

and update the directional derivatives f'_j and f''_j as the search moves to the next interval. Let us assume for the moment that only one variable becomes active at t^j, and let us denote its index by b. Then t_b = t^j, and we zero out the corresponding component of the search direction,

(4.7)    d^j = d^{j-1} + g_b e_b,

where e_b is the b-th unit vector. From these definitions and (4.3) we have

(4.9)    f'_j = g^T d^j + (d^j)^T B z^j
             = f'_{j-1} + Δt_{j-1} f''_{j-1} + g_b² + g_b e_b^T B z^j,

(4.10)   f''_j = (d^j)^T B d^j
              = (d^{j-1})^T B d^{j-1} + 2 g_b e_b^T B d^{j-1} + g_b² e_b^T B e_b
              = f''_{j-1} + 2 g_b e_b^T B d^{j-1} + g_b² e_b^T B e_b.
The only expensive computations in (4.9) and (4.10) are

e_b^T B z^j,    e_b^T B d^{j-1},    e_b^T B e_b,

which can require O(n) operations since B is a dense limited memory matrix. Therefore it would appear that the computation of the generalized Cauchy point could require O(n²) operations, since in the worst case n segments of the piecewise linear path can be examined. This cost would be prohibitive for large problems. However, using the limited memory BFGS formula (3.2) and the definition (4.2), the updating formulae (4.9), (4.10) become

(4.11)   f'_j = f'_{j-1} + Δt_{j-1} f''_{j-1} + g_b² + θ g_b z_b^j − g_b w_b^T M W^T z^j,

(4.12)   f''_j = f''_{j-1} − θ g_b² − 2 g_b w_b^T M W^T d^{j-1} − g_b² w_b^T M w_b,

where w_b^T stands for the b-th row of the matrix W. The only O(n) operations remaining in (4.11) and (4.12) are W^T z^j and W^T d^{j-1}. We note, however, that z^j and d^j are updated at every iteration by a simple computation. Therefore if we maintain the two 2m-vectors

p = W^T d    and    c = W^T z,

every segment of the piecewise search can be processed in O(m²) operations.
Algorithm CP: Computation of the generalized Cauchy point.

Given x, l, u, g, and B = θI − W M W^T:

Initialize:
    compute the breakpoints t_i by (4.1) and set
        d_i := 0 if t_i = 0,   d_i := −g_i otherwise;
    F := {i : t_i > 0};   x^c := x;
    p := W^T d;   c := 0;
    f' := g^T d = −d^T d;   f'' := θ d^T d − p^T M p;
    Δt_min := −f'/f'';   t_old := 0;
    b := argmin {t_i : i ∈ F};   t := t_b;   Δt := t.

Examine segments while Δt_min ≥ Δt:
    x_b^c := u_b if d_b > 0,   x_b^c := l_b if d_b < 0;
    z_b := x_b^c − x_b;
(4.13)  c := c + Δt p                                           (O(m) operations)
    f' := f' + Δt f'' + g_b² + θ g_b z_b − g_b w_b^T M c        (O(m²) operations)
    f'' := f'' − θ g_b² − 2 g_b w_b^T M p − g_b² w_b^T M w_b    (O(m²) operations)
    p := p + g_b w_b;   d_b := 0;
    Δt_min := −f'/f'';   t_old := t;
    F := F \ {b};   b := argmin {t_i : i ∈ F};   t := t_b;   Δt := t − t_old.
end while

Δt_min := max{Δt_min, 0};
t_old := t_old + Δt_min;
x_i^c := x_i + t_old d_i for all i such that t_i ≥ t;
for all i ∈ F with t_i = t, remove i from F;
c := c + Δt_min p.
The last step of this algorithm updates the 2m-vector c so that upon termination

c = W^T (x^c − x_k).

This vector will be used to initialize the subspace minimization when the primal direct method or the conjugate gradient method is used, as will be discussed in the next section.
Our operation counts only take into account multiplications and divisions. Note that there are no O(n) computations inside the loop. If n_int denotes the total number of segments explored, then the total cost of Algorithm CP is (2m + 2)n + O(m²) n_int operations, plus n log n operations, which is the approximate cost of the heapsort algorithm [1].
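For readers who want to see the whole procedure in one place, the following sketch computes the generalized Cauchy point with a dense matrix B, recomputing f' and f'' on each segment instead of using the O(m) updates (4.11), (4.12). It is a reference implementation for checking, not the O(n) Algorithm CP; the function name generalized_cauchy_point is an assumption.

```python
import numpy as np

def generalized_cauchy_point(x, g, l, u, B):
    """First local minimizer of m(x + z) = f + g^T z + 0.5 z^T B z along the
    projected steepest descent path P(x - t*g, l, u), examined breakpoint by
    breakpoint (dense reference version, O(n^2) work per segment)."""
    n = len(x)
    t = np.full(n, np.inf)                     # breakpoints, cf. (4.1)
    d = -np.asarray(g, dtype=float).copy()     # projected steepest descent direction
    for i in range(n):
        if g[i] < 0:
            t[i] = (x[i] - u[i]) / g[i]
        elif g[i] > 0:
            t[i] = (x[i] - l[i]) / g[i]
        if t[i] == 0.0:
            d[i] = 0.0                         # variable already sits at its bound
    xc = np.asarray(x, dtype=float).copy()
    t_old = 0.0
    for j in np.argsort(t):                    # segments in order of increasing breakpoint
        fp = g @ d + d @ (B @ (xc - x))        # f'  on the current segment
        fpp = d @ (B @ d)                      # f'' on the current segment
        if fp >= 0:                            # model not decreasing: Cauchy point found
            return xc
        dt_star = -fp / fpp                    # minimizer of the segment quadratic
        dt = t[j] - t_old
        if dt_star < dt:                       # minimizer lies inside this segment
            return xc + dt_star * d
        xc = xc + dt * d                       # move to the breakpoint ...
        xc[j] = u[j] if d[j] > 0 else l[j]     # ... landing exactly on the bound
        d[j] = 0.0                             # variable j is now held fixed
        t_old = t[j]
    return xc
```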
5. Methods for subspace minimization. Once the Cauchy point x^c has been found, we proceed to approximately minimize the quadratic model m_k over the space of free variables and impose the bounds on the problem. We consider three approaches to minimize the model: a direct primal method based on the Sherman-Morrison-Woodbury formula, a primal iterative method using the conjugate gradient method, and a direct dual method using Lagrange multipliers. Which of these is most appropriate seems problem dependent, and we have experimented numerically with all three. In all these approaches we first work on minimizing m_k, ignoring the bounds, and at an appropriate point truncate the move so as to satisfy the bound constraints.
The following notation will be used throughout this section. The integer t denotes the number of free variables at the Cauchy point x^c; in other words, there are n − t variables at bound at x^c. As in the previous section, F denotes the set of indices corresponding to the free variables, and we note that this set is defined upon completion of the Cauchy point computation. We define Z_k to be the n × t matrix whose columns are unit vectors (i.e., columns of the identity matrix) that span the subspace of the free variables at x^c. Similarly, A_k denotes the n × (n − t) matrix of unit vectors that span the subspace of the variables held at a bound at x^c. Consider points of the form
(5.2)    x = x^c + Z_k \hat d,

where \hat d is a vector of dimension t. Using this notation, for points of the form (5.2) we can write the quadratic (2.1) as

(5.3)    m_k(x) = f_k + g_k^T (x − x^c + x^c − x_k) + ½ (x − x^c + x^c − x_k)^T B_k (x − x^c + x^c − x_k)
               = (g_k + B_k (x^c − x_k))^T (x − x^c) + ½ (x − x^c)^T B_k (x − x^c) + γ
               = \hat d^T \hat r^c + ½ \hat d^T \hat B_k \hat d + γ,

where γ is a constant,

\hat B_k = Z_k^T B_k Z_k

is the reduced Hessian of m_k, and

\hat r^c = Z_k^T (g_k + B_k (x^c − x_k))

is the reduced gradient of m_k at x^c. Using (3.2) and (4.13), we can express this reduced gradient as

(5.4)    \hat r^c = Z_k^T (g_k + θ (x^c − x_k) − W_k M_k c),

which, given that the vector c was saved from the Cauchy point computation, costs (2m + 1)t + O(m²) extra operations. Then the subspace problem (2.3) can be formulated as

(5.5)    min \hat m_k(\hat d) ≡ \hat d^T \hat r^c + ½ \hat d^T \hat B_k \hat d + γ,

subject to the bounds l_i − x_i^c ≤ \hat d_i ≤ u_i − x_i^c on the free variables. We first compute the unconstrained Newton point for this problem,

(5.7)    \hat d^u = −\hat B_k^{-1} \hat r^c,

and then, if necessary, truncate the step toward the feasible region by means of the steplength

(5.8)    α* = max{α : α ≤ 1, l_i − x_i^c ≤ α \hat d_i^u ≤ u_i − x_i^c, i ∈ F}.

Therefore the approximate solution \bar x_{k+1} of the subproblem (2.3), (2.4) is given by

(5.9)    \bar x_{k+1} = x^c + α* Z_k \hat d^u.
It only remains to consider how to perform the computation in (5.7). Since B_k is given by (3.2) and Z_k^T Z_k = I, the reduced matrix \hat B is given by

\hat B = θ I − (Z^T W) M (W^T Z),

where we have dropped the subscripts for simplicity. Applying the Sherman-Morrison-Woodbury formula (see, for example, [22]), we obtain

(5.10)   \hat B^{-1} = (1/θ) I + (1/θ²) Z^T W (I − (1/θ) M W^T Z Z^T W)^{-1} M W^T Z,

so that the unconstrained subspace Newton direction is given by

(5.11)   \hat d^u = −(1/θ) \hat r^c − (1/θ²) Z^T W (I − (1/θ) M W^T Z Z^T W)^{-1} M W^T Z \hat r^c.
Given a set of free variables at x^c that determines the matrix Z, and a limited memory BFGS matrix B defined in terms of θ, W, and M, the following procedure implements the approach just described. Note that since the columns of Z are unit vectors, the operation Z^T v amounts to selecting appropriate elements from v. Here and throughout the paper our operation counts include only multiplications and divisions. Recall that t denotes the number of free variables and that m is the number of corrections stored in the limited memory matrix.

Direct Primal Method.
1. Compute \hat r^c by (5.4)                               ((2m + 1)t + O(m²) operations)
2. v := W^T Z \hat r^c                                     (2mt operations)
3. v := M v                                                (O(m²) operations)
4. Form N := I − (1/θ) M W^T Z Z^T W:
       N := (1/θ) W^T Z Z^T W                              ((2m² + m)t operations)
       N := I − M N                                        (O(m³) operations)
5. v := N^{-1} v                                           (O(m³) operations)
6. \hat d^u := −(1/θ) \hat r^c − (1/θ²) Z^T W v            (2mt + t operations)
7. Find α* satisfying (5.8)                                (t operations)
8. Compute \bar x_{k+1} as in (5.9)                        (t operations)
The total cost of this subspace minimization step based on the Sherman-Morrison-Woodbury formula is

(5.12)   2m²t + 6mt + 4t + O(m³)

operations. This is quite acceptable when t is small, i.e., when there are few free variables. However, in many problems the opposite is true: few constraints are active and t is large. In this case the cost of the direct primal method can be quite large, but the following mechanism can provide significant savings.
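A compact sketch of the direct primal step, written with an explicit boolean mask for the free variables so that applying Z^T amounts to row selection. It follows (5.4)-(5.11) but is only an illustration in NumPy; the name primal_subspace_step and the mask-based interface are assumptions.

```python
import numpy as np

def primal_subspace_step(xk, xc, g, l, u, W, M, theta, free):
    """Approximate subspace minimizer xbar of (5.5), cf. (5.4)-(5.11).

    xk, xc : current iterate and generalized Cauchy point
    W, M, theta : compact representation B = theta*I - W M W^T of (3.2)
    free : boolean mask of the free variables at xc
    """
    c = W.T @ (xc - xk)                               # the 2m-vector saved by Algorithm CP
    r = (g + theta * (xc - xk) - W @ (M @ c))[free]   # reduced gradient (5.4)
    U = W[free, :]                                    # Z^T W: select the free rows of W
    # Sherman-Morrison-Woodbury inverse of the reduced Hessian, cf. (5.10)
    N = np.eye(U.shape[1]) - (M @ (U.T @ U)) / theta
    v = np.linalg.solve(N, M @ (U.T @ r))
    du = -(r / theta + (U @ v) / theta ** 2)          # unconstrained Newton step (5.11)
    # largest step in [0, 1] that keeps the free variables within their bounds, (5.8)
    alpha = 1.0
    for di, xci, li, ui in zip(du, xc[free], l[free], u[free]):
        if di > 0:
            alpha = min(alpha, (ui - xci) / di)
        elif di < 0:
            alpha = min(alpha, (li - xci) / di)
    xbar = xc.copy()
    xbar[free] = xc[free] + alpha * du                # (5.9); bound variables stay at xc
    return xbar
```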
A second possibility is the primal iterative approach, in which the conjugate gradient method is applied to the subspace problem (5.5); each conjugate gradient iteration works with a few vectors of length t and requires O(t) multiplications, and the move is truncated so as to satisfy the bound constraints.
In the dual approach we work with the step d = x − x_k and restrict x_k + d to lie on the subspace of free variables at x^c by imposing the condition

A_k^T d = A_k^T (x^c − x_k) ≡ b_k.

(Recall that A_k is the matrix of constraint gradients.) Using this notation we formulate the subspace problem as

(5.15)   min g_k^T d + ½ d^T B_k d

(5.16)   subject to A_k^T d = b_k,

(5.17)            l ≤ x_k + d ≤ u.

We first solve this problem without the bound constraint (5.17). The optimality conditions for (5.15), (5.16) are

(5.18)   g_k + B_k d* + A_k λ* = 0,

(5.19)   A_k^T d* = b_k.

From (5.18) we have d* = −H_k (g_k + A_k λ*), and substituting this into (5.19) gives

(5.20)   (A_k^T H_k A_k) λ* = −A_k^T H_k g_k − b_k.
Since the columns of A_k are unit vectors and A_k has full column rank, we see that A_k^T H_k A_k is a principal submatrix of H_k. Thus (5.20) determines λ*, and hence d* is given by

d* = −H_k (g_k + A_k λ*).

(In the special case where there are no active constraints, we simply obtain B_k d* = −g_k.) If the vector x_k + d* violates the bounds (5.17), we backtrack to the feasible region along the line joining this infeasible point and the generalized Cauchy point x^c.
The linear system (5.20) can be solved by the Sherman-Morrison-Woodbury formula. Using the inverse limited memory BFGS matrix (3.7), and recalling the identity A_k^T A_k = I, we obtain

A_k^T H_k A_k = (1/θ) I + (A_k^T \bar W)(\bar M \bar W^T A_k).

(We have again omitted the subscripts of \bar M, \bar W, and θ for simplicity.) Applying the Sherman-Morrison-Woodbury formula we obtain

(A_k^T H_k A_k)^{-1} = θ I − θ² A_k^T \bar W (I + θ \bar M \bar W^T A_k A_k^T \bar W)^{-1} \bar M \bar W^T A_k.
Given g_k, a set of active variables at x^c that determines the matrix of constraint gradients A_k, and an inverse limited memory BFGS matrix H_k, the following procedure implements the dual approach just described. Let us recall that t denotes the number of free variables, and let us define t_a = n − t, so that t_a denotes the number of active constraints at x^c. As before, the operation counts given below include only multiplications and divisions, and m denotes the number of corrections stored in the limited memory matrix.

Dual Method.
If t_a = 0, compute d* = −H_k g_k = −(1/θ) g_k − \bar W \bar M \bar W^T g_k as follows:
      w := \bar W^T g_k                                  (0 operations)
      w := \bar M w                                      (O(m²) operations)
      d* := −(1/θ) g_k − \bar W w                        ((2m + 1)n operations)
If t_a > 0, compute
1. u := −A_k^T H_k g_k − b = −(1/θ) A_k^T g_k − A_k^T \bar W \bar M \bar W^T g_k − b:
      b := A_k^T (x^c − x_k)                             (0 operations)
      w := \bar W^T g_k                                  (0 operations)
      w := \bar M w                                      (O(m²) operations)
      u := A_k^T \bar W w                                (2m t_a operations)
      u := −(1/θ) A_k^T g_k − u − b                      (t_a operations)
2. λ* := (A_k^T H_k A_k)^{-1} u = θ u − θ² A_k^T \bar W (I + θ \bar M \bar W^T A_k A_k^T \bar W)^{-1} \bar M \bar W^T A_k u:
      w := \bar W^T A_k u                                (2m t_a operations)
      Form \bar N := θ \bar M \bar W^T A_k A_k^T \bar W  ((2m² + m) t_a operations)
      \bar N := I + \bar N                               (O(m³) operations)
      w := \bar N^{-1} \bar M w                          (O(m³) operations)
      λ* := −θ² A_k^T \bar W w                           (2m t_a operations)
      λ* := θ u + λ*                                     (t_a operations)
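The same computation in dense form, for checking: the sketch below solves (5.20) for λ*, recovers d* from (5.18), and backtracks toward x^c when the bounds (5.17) are violated. It inverts a dense B rather than using the compact inverse (3.7), so it does not have the O(mn) cost of the procedure above; the name dual_subspace_step and the mask-based interface are assumptions.

```python
import numpy as np

def dual_subspace_step(xk, xc, g, l, u, B, active):
    """Dense reference version of the dual subspace step, cf. (5.15)-(5.20).

    'active' is a boolean mask of the variables held at a bound at xc.
    """
    n = len(xk)
    H = np.linalg.inv(B)                        # stands in for the compact inverse (3.7)
    A = np.eye(n)[:, active]                    # columns = gradients of the active bounds
    if A.shape[1] > 0:
        b = A.T @ (xc - xk)                     # right-hand side of A^T d = b
        lam = np.linalg.solve(A.T @ H @ A, -(A.T @ (H @ g) + b))   # (5.20)
        d = -H @ (g + A @ lam)                  # from the optimality condition (5.18)
    else:
        d = -H @ g                              # no active constraints at xc
    # backtrack from the (possibly infeasible) point xk + d toward xc
    step = (xk + d) - xc
    beta = 1.0
    for i in range(n):
        if step[i] > 0:
            beta = min(beta, (u[i] - xc[i]) / step[i])
        elif step[i] < 0:
            beta = min(beta, (l[i] - xc[i]) / step[i])
    return xc + beta * step
```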
(Note from (2.2) that P(x_k − g_k, l, u) − x_k is the projected gradient.) The algorithm we tested is given as follows.

L-BFGS-B Algorithm.
Choose a starting point x_0 and an integer m that determines the number of limited memory corrections stored. Define the initial limited memory matrix to be the identity, and set k := 0.
1. If the convergence test (6.1) is satisfied, stop.
2. Compute the Cauchy point by Algorithm CP.
3. Compute a search direction d_k by either the direct primal method, the conjugate gradient method, or the dual method.
4. Perform a line search along d_k, subject to the bounds on the problem, to compute a steplength λ_k, and set x_{k+1} = x_k + λ_k d_k. The line search starts with the unit steplength, satisfies (2.5) with α = 10^{-4}, and attempts to satisfy (2.6) with β = 0.9.
5. Compute ∇f(x_{k+1}).
6. If y_k satisfies (3.9) with eps = 2.2 × 10^{-16}, add s_k and y_k to S_k and Y_k. If more than m updates are stored, delete the oldest column from S_k and Y_k.
7. Update S_k^T S_k, Y_k^T Y_k, L_k, and R_k, and set θ = y_k^T y_k / y_k^T s_k.
8. Set k := k + 1 and go to 1.
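As an aside on using this method today: the Fortran code described in [23] survives in several software environments, and SciPy's optimizer wraps a descendant of it. A call such as the one below exercises the same algorithm, with bounds and with the number of stored corrections m set through the maxcor option; this usage example is a modern convenience and not part of the original paper.

```python
import numpy as np
from scipy.optimize import minimize

def f_and_g(x):
    """Simple bound constrained test function: f(x) = 0.5 * ||x - 2||^2."""
    return 0.5 * np.sum((x - 2.0) ** 2), x - 2.0

res = minimize(f_and_g, x0=np.zeros(5), jac=True, method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * 5,
               options={"maxcor": 5})      # maxcor plays the role of m above
print(res.x)                               # all components end up at the upper bound 1.0
```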
The line search was performed by means of the routine of Moré and Thuente [19], which attempts to enforce the Wolfe conditions (2.5) and (2.6) by a sequence of polynomial interpolations. Since steplengths greater than one may be tried, we prevent the routine from generating infeasible points by defining the maximum steplength for the routine as the step to the closest bound along the current search direction. This approach implies that, if the objective function is bounded below, the line search will generate a point x_{k+1} that satisfies (2.5) and either satisfies (2.6) or hits a bound that was not active at x_k. We have observed that this line search performs well in practice. The first implementation of our algorithm used a backtracking line search, but we found that this approach has a drawback. On several problems the backtracking line search generated steplengths for which the updating condition (3.9) did not hold, and the BFGS update had to be skipped. This resulted in very poor performance on some problems. In contrast, when using the new line search, the update is very rarely skipped in our tests, and the performance of the code was markedly better. To explore this further we compared the early version of our code, using the backtracking line search, with the L-BFGS code for unconstrained optimization [18] (which uses the routine of Moré and Thuente) on unconstrained problems, and found that L-BFGS was superior on a significant number of problems. Our results suggest that backtracking line searches can significantly degrade the performance of BFGS, and that satisfying the Wolfe conditions as often as possible is important in practice.
Our code is written in double precision FORTRAN 77. For more details on how to
update the limited memory matrices in step 7 see [6]. When testing the routine SBMIN of
LANCELOT [10] we tried three options: BFGS, SR1 and exact Hessians. In all these cases
we used the default settings of LANCELOT.
The test problems were selected from the CUTE collection [4] (which contains some
problems from the MINPACK-2 collection [2]). All bound constrained problems in CUTE
were tested, but some problems were discarded for one of the following reasons: (a) the number
of variables was less than 4; (b) the various algorithms converged to a different solution point;
(c) too many instances of essentially the same problem are given in CUTE; in this case we
selected a representative sample.
The results of our numerical tests are given in Tables 1-3. All computations were performed on a Sun SPARCstation 2 with a 40-MHz CPU and 32 MB of memory. We record the number of variables n, the number of function evaluations, and the total run time. The limited memory code always computes the function and gradient together, so that nfg denotes the total number of function and gradient evaluations. However, for LANCELOT the number of function evaluations is not necessarily the same as the number of gradient evaluations, and in the tables we only record the number (nf) of function evaluations. The notation F1 indicates that the solution was not obtained after 999 function evaluations, and F2 indicates that the run was terminated because the next iterate generated by the line search was too close to the current iterate to be recognized as distinct. This occurred only in some cases when the limited memory method was quite near the solution but was unable to meet the stopping criterion. In Table 1, nact denotes the number of active bounds at the solution; if this number is zero it does not mean that bounds were not encountered during the solution process.
TABLE 1
Test results of the new limited memory method (L-BFGS-B) using the primal method for subspace minimization.

The differences in the number of function evaluations required by the direct primal and dual methods are due to rounding errors and are relatively small. Their computing times are also quite similar, due to the use of the form described at the end of §5.3. Our computational experience suggests to us that the conjugate gradient method for subspace minimization is the least effective approach; it tends to take more time and function evaluations. Even though Table 3 appears to indicate that the cg option results in fewer failures, tests with different values of m resulted, overall, in three more failures for the cg method than for the primal method. The limited memory method is sometimes unable to locate the solution accurately, and this can result in an excessive number of function evaluations (F1) or failure to make further progress (F2). The reasons for this are not clear to us, but are being investigated.
The tests described here are not intended to establish the superiority of LANCELOT or of the new limited memory algorithm, since these methods are designed for solving different types of problems. LANCELOT is tailored for sparse or partially separable problems, whereas the limited memory method is well suited for unstructured or dense problems. We use LANCELOT simply as a benchmark, and for this reason ran it only with its default settings and did not experiment with its various options to find the one that would give the best results on these problems. However, a few observations on the two methods can be made.

TABLE 2
Results of the new limited memory method, using the primal method for subspace minimization, for various values of m.

The results in Table 1 indicate that the limited memory method usually requires more function evaluations than the SR1 option of LANCELOT. (The BFGS option of LANCELOT is clearly inferior to the SR1 option.) However, in terms of computing time the limited memory method is often the most efficient; this can be seen by examining the problems with at least 50 or 100 variables (which are the only ones for which times are meaningful). To our surprise, we observe in Table 3 that even when using exact Hessians, LANCELOT requires more time than the limited memory method on many of the large problems. This is in spite of the fact that the objective function in all these problems has a significant degree of the kind of partial separability that LANCELOT is designed to exploit. On the other hand, LANCELOT using exact Hessians requires a much smaller number of function evaluations than the limited memory method.
TABLE 3
Results of the new limited memory method using three methods for subspace minimization (primal, dual, and cg).

Taking everything together, the new algorithm (L-BFGS-B) has most of the efficiency of the unconstrained limited memory algorithm (L-BFGS) [18], together with the capability of handling bounds, at the cost of a significantly more complex code. Like the unconstrained method, the bound constrained limited memory algorithm's main advantages are its low computational cost per iteration, its modest storage requirements, and its ability to solve problems in which the Hessian matrices are large, unstructured, dense, and unavailable. It is less likely to be competitive, in terms of function evaluations, when an exact Hessian is available, or when significant advantage can be taken of structure. However, this study appears to indicate that, in terms of computing time, the new limited memory code is very competitive with all versions of LANCELOT on many problems.
A code implementing the new algorithm is described in [23] and can be obtained by
anonymous ftp to eecs.nwu.edu. After logging in go to the directory pub/lbfgs.
REFERENCES
[1] A. V. AHO, J. E. HOPCROFT, AND J. D. ULLMAN, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA, 1974.
[2] B. M. AVERICK AND J. J. MORÉ, User Guide for the MINPACK-2 Test Problem Collection, Argonne National Laboratory, Mathematics and Computer Science Division Report ANL/MCS-TM-157, Argonne, IL, 1991.
[3] D. P. BERTSEKAS, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control Optim., 20 (1982), pp. 221-246.
[4] I. BONGARTZ, A. R. CONN, N. I. M. GOULD, AND PH. L. TOINT, CUTE: Constrained and Unconstrained Testing Environment, Research Report, IBM T. J. Watson Research Center, Yorktown Heights, NY, 1993.
[5] J. V. BURKE AND J. J. MORÉ, On the identification of active constraints, SIAM J. Numer. Anal., 25 (1988), pp. 1197-1211.
[6] R. H. BYRD, J. NOCEDAL, AND R. B. SCHNABEL, Representations of quasi-Newton matrices and their use in limited memory methods, Math. Programming, 63 (1994), pp. 129-156.
[7] P. H. CALAMAI AND J. J. MORÉ, Projected gradient methods for linearly constrained problems, Math. Programming, 39 (1987), pp. 93-116.
[8] A. R. CONN, N. I. M. GOULD, AND PH. L. TOINT, Testing a class of methods for solving minimization problems with simple bounds on the variables, Math. Comput., 50 (1988), pp. 399-430.
[9] ———, Global convergence of a class of trust region algorithms for optimization with simple bounds, SIAM J. Numer. Anal., 25 (1988), pp. 433-460.
[10] ———, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer Ser. Comput. Math. 17, Springer-Verlag, New York, Berlin, 1992.
[11] A. R. CONN AND J. MORÉ, private communication, 1993.
[12] J. E. DENNIS, JR. AND R. B. SCHNABEL, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[13] Harwell Subroutine Library, Release 10, Advanced Computing Department, AEA Industrial Technology, Harwell Laboratory, Oxfordshire, UK, 1990.
[14] J. C. GILBERT AND C. LEMARÉCHAL, Some numerical experiments with variable storage quasi-Newton algorithms, Math. Programming, 45 (1989), pp. 407-436.
[15] P. E. GILL, W. MURRAY, AND M. H. WRIGHT, Practical Optimization, Academic Press, London, 1981.
[16] A. A. GOLDSTEIN, Convex programming in Hilbert space, Bull. Amer. Math. Soc., 70 (1964), pp. 709-710.
[17] E. S. LEVITIN AND B. T. POLYAK, Constrained minimization problems, USSR Comput. Math. Math. Phys., 6 (1966), pp. 1-50.
[18] D. C. LIU AND J. NOCEDAL, On the limited memory BFGS method for large scale optimization, Math. Programming, 45 (1989), pp. 503-528.
[19] J. J. MORÉ AND D. J. THUENTE, On Line Search Algorithms with Guaranteed Sufficient Decrease, Mathematics and Computer Science Division Preprint MCS-P153-0590, Argonne National Laboratory, Argonne, IL, 1990.
[20] J. J. MORÉ AND G. TORALDO, Algorithms for bound constrained quadratic programming problems, Numer. Math., 55 (1989), pp. 377-400.
[21] J. NOCEDAL, Updating quasi-Newton matrices with limited storage, Math. Comput., 35 (1980), pp. 773-782.
[22] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[23] C. ZHU, R. H. BYRD, P. LU, AND J. NOCEDAL, L-BFGS-B: Fortran Subroutines for Large-Scale Bound Constrained Optimization, Report NAM-11, EECS Department, Northwestern University, 1994.