A FAST ALGORITHM FOR NONLINEARLY CONSTRAINED

OPTIMIZATION CALCULATIONS

M.J.D. Powell

1. Introduction
An algorithm for solving the general constrained optimization problem is presented that combines the advantages of variable metric methods for unconstrained optimization calculations with the fast convergence of Newton's method for solving nonlinear equations. It is based on the work of Biggs (1975) and Han (1975, 1976). The given method is very similar to the one suggested in Section 5 of Powell (1976). The main progress that has been made since that paper was written is that through calculation and analysis the understanding of that method has increased.

The purpose of the algorithm is to calculate the least value of a real function F(x), where x is a vector of n real variables, subject to the constraints

    c_i(x) = 0,    i = 1, 2, ..., m',
    c_i(x) ≥ 0,    i = m'+1, m'+2, ..., m,                                 (1.1)

on the value of x. We suppose that the objective and constraint functions are differentiable and that first derivatives can be calculated.

We let g(x) be the gradient vector

    g(x) = ∇F(x),                                                          (1.2)

and we let N be the matrix whose columns are the normals, ∇c_i, of the "active constraints".
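For concreteness, the sketch below shows one way the problem data of (1.1) and (1.2) might be held in code. The two-variable objective and the single equality constraint are hypothetical and serve only to illustrate the layout: callables for F and the c_i, callables for their gradients, and the matrix N assembled from the normals of whichever constraints are treated as active.

```python
import numpy as np

# Hypothetical toy instance of problem (1.1): minimize F(x) = x1^2 + x2^2
# subject to the single equality constraint c1(x) = x1 + x2 - 1 = 0 (m' = m = 1).
F     = lambda x: x[0]**2 + x[1]**2
gradF = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])     # g(x) of equation (1.2)
c     = [lambda x: x[0] + x[1] - 1.0]                    # constraint values c_i(x)
gradc = [lambda x: np.array([1.0, 1.0])]                 # constraint normals, grad c_i

def normal_matrix(x, active):
    """Matrix N whose columns are the normals of the active constraints."""
    return np.column_stack([gradc[i](x) for i in active])

x0 = np.array([2.0, 0.0])
N  = normal_matrix(x0, active=[0])     # the equality constraint is always active
```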

The given algorithm is a "variable metric method for constrained optimization". The meaning of this term is explained in Section 2. Methods of this type require a positive definite matrix of dimension n to be revised as the calculation proceeds, and they require some step-length controls to force convergence from poor starting approximations. Suitable techniques are described in Sections 3 and 4, and thus the recommended algorithm is defined. It is applied to some well-known examples in Section 5 and the numerical results are excellent: the algorithm seems to require less than half of the amount of work that is done by the best of the other published algorithms for constrained optimization. A theoretical analysis of some of the convergence properties of our method is reported elsewhere (Powell, 1977).

2. Variable metric methods for constrained optimization

Variable metric methods have been used successfully for many years for unconstrained optimization calculations. A good survey of their properties in the unconstrained case is given by Dennis and Moré (1977). We follow Han (1976) in seeking extensions to this approach in order to take account of constraints on the variables.

Each iteration begins with a starting point x, at which the gradient (1.2) is calculated, and with a positive definite matrix, B say, that defines the current metric; B is often set to the unit matrix initially. The vector d that minimizes the quadratic function

    Q(d) = F(x) + d^T g + ½ d^T B d                                        (2.1)

is calculated. It is used as a search direction in the space of the variables, x being replaced by the vector

    x* = x + α d,                                                          (2.2)

where α is a positive step-length multiplier whose value depends on the form of the function of one variable

    φ(α) = F(x + α d).                                                     (2.3)

The matrix B is revised, using the gradients g and g* = ∇F(x*). Then another iteration is begun.

When there are just n equality constraints on x and when the matrix of constraint normals N = (∇c_1, ∇c_2, ..., ∇c_n) has full rank, an obvious method of defining the search direction d is to satisfy the equations

    N^T d + c = 0,                                                         (2.4)

where the components of c are the constraint values c_i(x) (i = 1, 2, ..., m'). This is just Newton's method for solving the equations

    c_i(x) = 0,    i = 1, 2, ..., m',                                      (2.5)

if we let α equal one in the definition (2.2). When m' is less than n, the advantages of Newton's method and of the variable metric algorithms for unconstrained optimization are combined by calculating d to minimize the function (2.1) subject to the linear conditions

    ∇c_i^T d + c_i = 0,    i = 1, 2, ..., m',
    ∇c_i^T d + c_i ≥ 0,    i = m'+1, m'+2, ..., m,                         (2.6)

which is a quadratic programming problem. Because the calculation of d depends on the metric that is defined by B, we say that an algorithm that calculates d in this way is a "variable metric method for constrained optimization".
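To make the search-direction calculation concrete, here is a minimal numpy sketch for the special case in which every constraint is an equality, so that the quadratic programming problem of (2.1) and (2.6) reduces to a single linear (KKT) system. The general case with inequality constraints needs a quadratic programming subroutine, such as the Fletcher (1970) routine mentioned later; the function name and interface below are illustrative assumptions, not part of the original Fortran program.

```python
import numpy as np

def equality_qp_direction(B, g, A, c):
    """Search direction for the equality-constrained case of (2.1) and (2.6):

        minimize    g^T d + 0.5 d^T B d
        subject to  A d + c = 0,

    where the rows of A are the constraint normals grad c_i(x)^T and c holds
    the constraint values c_i(x).  The first order (KKT) conditions give one
    linear system in d and the Lagrange parameter estimates lambda; that
    lambda is the vector used in Section 3 to revise B."""
    n, mp = B.shape[0], A.shape[0]
    K = np.block([[B, -A.T],
                  [A, np.zeros((mp, mp))]])
    rhs = np.concatenate([-g, -c])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]          # d and lambda
```

When there are n independent equality constraints, the second block row forces A d = -c whatever B is, so the sketch returns the Newton step of equation (2.4).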

One of the main conclusions of this paper is that it is satisfactory to keep the matrix B positive definite. This point is easy to accept when there are no constraints, because then it is usual for the second derivative matrix of F(x) to be positive definite at the required minimum, so B can be regarded as a second derivative approximation. However, when there are constraints, it may not be possible to identify B in this way. For example, consider the simple problem of calculating the least value of the function of one variable F(x) = -x², subject to the equation x³ = 1. In this case any positive definite matrix B is satisfactory, although F(x) has negative curvature. We see that it is unnecessary to add a penalty term to F(x) in order to make the second derivative of the objective function positive, which is done in the augmented Lagrangian method (Fletcher, 1975). One of the important features of the given algorithm is that there are no penalty terms. Our numerical results and the convergence theory in Powell (1977) show that a superlinear rate of convergence is usually obtained even when the true second derivative matrix of the "Lagrangian function" is indefinite. The reason for mentioning the Lagrangian function is given in the next section.

Keeping B positive definite not only helps the quadratic programming calculation that provides d, but it also allows a basic feature of variable metric methods to be retained. It is that, except for the initial choice of B, the given method is invariant under linear transformations of the variables. Thus the main steps of the algorithm do not require the user to scale the variables and the constraint functions carefully before starting the calculation. However, some preliminary scaling may be helpful to the subroutines for matrix calculations that are used by the algorithm. For the results reported in Section 5 the search directions d were calculated by the quadratic programming subroutine due to Fletcher (1970), but this subroutine is not entirely satisfactory because it does not take advantage of the fact that B is positive definite.

Another feature of the quadratic programming calculation that ought to be mentioned, even though it is not relevant to the examples of Section 5, is that it may happen that the linear conditions (2.6) on d are inconsistent. If this case occurs it does not necessarily imply that the nonlinear constraints (1.1) are inconsistent. Therefore we introduce an extra variable, ξ say, into the quadratic programming calculation and we replace the constraints (2.6) by the conditions

    ∇c_i^T d + c_i ξ = 0,      i = 1, 2, ..., m',
    ∇c_i^T d + c_i ξ_i ≥ 0,    i = m'+1, m'+2, ..., m,                     (2.7)

where ξ_i has the value

    ξ_i = 1,    c_i ≥ 0,
    ξ_i = ξ,    c_i < 0.                                                   (2.8)

Thus we modify only the constraints that are not satisfied at the starting point of the iteration. We make ξ as large as possible subject to the condition 0 ≤ ξ ≤ 1, and any freedom that remains in d is used to minimize the quadratic function (2.1). Usually ξ = 1, in which case the calculation is the same as before, and any positive value of ξ allows a helpful correction to x of the form (2.2). However, it may happen that the only feasible solution of the conditions (2.7) occurs when ξ = 0 and the components of d are zero. In this case no small change to x makes a first order improvement to the violations of the nonlinear constraints, so the algorithm finishes because it is assumed that the constraints are inconsistent.

3. The revision of B

In order to achieve superlinear convergence the matrix B has to include some second derivative information, which is gained by the method that is used for revising B. A remark in Section 2 suggests that second derivatives of the Lagrangian function are more relevant than second derivatives of F(x). This is certainly the case, but the point is missed by some published algorithms for constrained optimization.

The importance of the Lagrangian function is easy to see when all the constraints are equalities and m' < n. In this case the required vector x satisfies the equality constraints and the equation

    ∇F(x) - Σ_{i=1}^{m'} λ_i ∇c_i(x) = 0,                                  (3.1)

where λ_i (i = 1, 2, ..., m') are the Lagrange parameters. These conditions provide m'+n equations in m'+n unknowns, so one way of viewing the convergence rate of our algorithm is to compare it with Newton's method for solving the equations. We note that the second derivative information that is required by Newton's method is contained in the second derivative matrix with respect to x of the function

    L(x, λ) = F(x) - λ^T c(x).                                             (3.2)

The point that is important to the present discussion is that, except in special cases, second derivatives of F(x) alone are not helpful, for example when all the constraints are linear.

Therefore our method for revising B depends on estimates of the Lagrange parameters. It is suitable to let λ be the vector of Lagrange parameters at the solution of the quadratic programming problem that defines d, so it can be calculated quite easily and it changes from iteration to iteration. Thus the components of λ that correspond to inactive inequality constraints become zero automatically.

Experience with variable metric methods for unconstrained optimization suggests that B should be replaced by a matrix, B* say, that depends on B and on the difference in gradients

    γ = ∇_x L(x*, λ) - ∇_x L(x, λ),                                        (3.3)

where L(x, λ) is the function (3.2) and where

    δ = x* - x                                                             (3.4)

is the change in the variables. We may use γ in place of the usual difference in gradients in several of the formulae that are applied by unconstrained optimization algorithms for revising B.

When there are no constraints it is possible to choose the step-length α in equation (2.2) so that the scalar product δ^T γ is positive (Powell, 1976), and the methods that are used for revising B suppose that this has been done. However, when there are constraints, it can happen that δ^T γ is negative for all non-zero values of α. In this case the usual methods for revising B would fail to make the matrix B* positive definite. Therefore we replace γ by the vector of the form

    η = θ γ + (1 - θ) B δ,    0 ≤ θ ≤ 1,                                   (3.5)

that is closest to γ subject to the condition

    δ^T η ≥ 0.2 δ^T B δ.                                                   (3.6)

Thus θ has the value

    θ = 1                                        if δ^T γ ≥ 0.2 δ^T B δ,
    θ = 0.8 δ^T B δ / (δ^T B δ - δ^T γ)          otherwise,                (3.7)

where the factor 0.2 was chosen empirically. We prefer the BFGS formula

    B* = B - (B δ δ^T B) / (δ^T B δ) + (η η^T) / (δ^T η)                   (3.8)

because it is very successful in unconstrained calculations, because positive definiteness of B and condition (3.6) imply that B* is positive definite, and because it gives invariance under changes of scale of the variables.
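The sketch below spells out the revision of B that is defined by equations (3.3) to (3.8): form the change δ in the variables and the change γ in the gradient of the Lagrangian (3.2), damp γ to η by the choice (3.7) of θ, and apply the BFGS formula (3.8). The routine names and calling conventions are illustrative assumptions only, and the user is assumed to supply the gradient routines.

```python
import numpy as np

def lagrangian_gradient(gradF, gradc, lam, x):
    """Gradient with respect to x of the Lagrangian function (3.2),
    L(x, lambda) = F(x) - lambda^T c(x)."""
    g = np.array(gradF(x), dtype=float)
    for lam_i, gc in zip(lam, gradc):
        g -= lam_i * np.array(gc(x), dtype=float)
    return g

def revise_B(B, delta, gamma):
    """One application of equations (3.5)-(3.8).

    delta is the change (3.4) in the variables and gamma is the change (3.3)
    in the Lagrangian gradient, e.g.
        gamma = lagrangian_gradient(gradF, gradc, lam, x_new)
              - lagrangian_gradient(gradF, gradc, lam, x_old).
    gamma is replaced by eta = theta*gamma + (1 - theta)*B*delta, with theta
    chosen by (3.7) so that delta^T eta >= 0.2 delta^T B delta, which keeps
    the updated matrix positive definite."""
    Bd  = B @ delta
    dBd = delta @ Bd
    dg  = delta @ gamma
    if dg >= 0.2 * dBd:
        theta = 1.0                              # ordinary BFGS step
    else:
        theta = 0.8 * dBd / (dBd - dg)           # equation (3.7)
    eta = theta * gamma + (1.0 - theta) * Bd     # equation (3.5)
    # BFGS formula (3.8)
    return B - np.outer(Bd, Bd) / dBd + np.outer(eta, eta) / (delta @ eta)
```

With this choice of θ the denominator δ^T η in (3.8) is at least 0.2 δ^T B δ, so the update cannot destroy positive definiteness however awkward γ is.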

Powell (1977) shows that this method of revising B can give superlinear convergence even when the second derivative matrix of the Lagrangian function, G say, is indefinite. The method of proof is based on a comparison between projections of B and G, the projection being into the space that is the intersection of the tangent hyperplanes of the active constraints. It suggests that on some iterations it might be better to leave B unchanged, but this idea was investigated and was found not to be worthwhile. One reason is that the modification (3.5) includes a part, proportional to Bδ, that cuts across the tangent hyperplanes of the active constraints. Another reason comes from the fact that, if the curvature of the quadratic function (2.1) is small, then the solution of the quadratic programming problem that defines d is usually at a vertex, so a scale-invariant implementation of the idea leaves B unchanged when its diagonal elements are small, which gives a bias against correcting B when G is positive definite.

4. The step-length parameter

The step-length parameter α in equation (2.2) is extremely important because it is used to force convergence from poor starting approximations. However, the choice of step-length is complicated by the fact that not only do we wish to reduce the objective function, but also we have to satisfy the constraints. This need led to penalty function methods that, instead of minimizing F(x), minimize a function of the form

    Φ(x) = F(x) + P[c(x)],                                                 (4.1)

where P[c(x)] is zero when the constraints (1.1) are satisfied and is positive otherwise. Because algorithms for minimizing functions of several variables were applied directly to Φ(x), extensions were made in order that Φ(x) became differentiable, the most successful technique of this kind being the augmented Lagrangian method. However, Han (1975) shows that there is no need for differentiability if the only use of Φ(x) is to help the choice of the step-length parameter. Therefore we follow his advice and use an objective function of the form

    Φ(x, μ) = F(x) + Σ_{i=1}^{m'} μ_i |c_i(x)| + Σ_{i=m'+1}^{m} μ_i |min[0, c_i(x)]|,      (4.2)

where the components of the vector μ are defined later. The step-length is chosen by requiring that the value of x* in equation (2.2) satisfies the condition

    Φ(x*, μ) < Φ(x, μ).                                                    (4.3)

This condition can be obtained if the function of one variable

    φ(α) = Φ(x + α d, μ),                                                  (4.4)

which reduces to expression (2.3) when there are no constraints, decreases initially when α is made positive. Han (1975) proves that this happens if B is positive definite and if the inequalities

    μ_i ≥ |λ_i|,    i = 1, 2, ..., m,                                      (4.5)

hold, where, as in Section 3, λ is the vector of Lagrange parameters at the solution of the quadratic programming problem that defines d. He also shows that, if μ satisfies condition (4.5) on every iteration, then convergence to the required vector of variables can be obtained from remote starting approximations. Therefore he suggests that μ be a sufficiently large constant vector.

However, Powell (1976) notes that a constant vector μ that satisfies condition (4.5) on every iteration may be inefficient, because it can happen that on most iterations μ is much larger than necessary, in which case too much weight is given to satisfying the constraints on the variables. This situation occurs when the initial choice of B is too large, because there is a contribution to λ that is proportional to B. Therefore Powell (1976) suggests letting μ be equal to |λ| on each iteration. However, further numerical experimentation shows that it can be advantageous to include positive contributions in the function (4.2) from some of the inequality constraints that are inactive at the solution of the quadratic programming problem that gives λ. Therefore the following value of μ is used in the algorithm that we recommend. On the first iteration we let μ_i = |λ_i| (i = 1, 2, ..., m), and on the other iterations we apply the formula

    μ_i = max[ |λ_i|, ½(μ̄_i + |λ_i|) ],    i = 1, 2, ..., m,               (4.6)

where μ̄_i is the value of μ_i that was used on the previous iteration.

Because μ changes on each iteration, Han's (1975) global convergence theorems do not apply. Therefore, as was the case when variable metric algorithms for unconstrained optimization were first proposed, we cannot guarantee the success of the given method. However, the numerical calculations that have been tried suggest that the algorithm does converge satisfactorily from poor starting approximations. The present Fortran program includes the following trap to catch some cyclic behaviour of the iterations. We note the minimum value of Φ(x, μ) that occurs during each sequence of iterations on which μ remains constant, and an error return is made if there is a run of five iterations where μ remains constant and this minimum value does not decrease. In the numerical calculations that have been tried this error return has never occurred.

The procedure for choosing α is as follows. It depends on a sequence of quadratic approximations to the function (4.4). We build a sequence α_k (k = 0, 1, 2, ...) until it gives a suitable value of α. The first term in the sequence is α_0 = 1 and, for k ≥ 1, the value of α_k depends on the quadratic approximation φ_k(α) to φ(α) that is defined by the equations

    φ_k(0) = φ(0),    φ_k'(0) = φ'(0),    φ_k(α_{k-1}) = φ(α_{k-1}).       (4.7)

We let α_k be the greater of 0.1 α_{k-1} and the value of α that minimizes φ_k(α). For each term in the sequence we test the condition

    φ(α_k) ≤ φ(0) + 0.1 α_k φ'(0),                                         (4.8)

and we set the step-length α = α_k as soon as this inequality is satisfied. Methods of this type are used frequently in algorithms for unconstrained optimization.
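As a sketch of how this section might be implemented, the routines below evaluate the line-search function (4.2), update the weights μ by formula (4.6), and choose the step-length by the rule of equations (4.7) and (4.8). The argument dphi0 stands for the number used in place of φ'(0), whose recommended definition is discussed in the next paragraph, and all names are illustrative assumptions rather than the interfaces of the original Fortran program.

```python
import numpy as np

def merit(F, c, m_eq, mu, x):
    """Line-search objective (4.2):
    Phi(x, mu) = F(x) + sum_i mu_i |c_i(x)|           over the equalities
                      + sum_i mu_i |min(0, c_i(x))|   over the inequalities."""
    phi = F(x)
    for i, ci in enumerate(c):
        v = ci(x)
        phi += mu[i] * (abs(v) if i < m_eq else abs(min(0.0, v)))
    return phi

def update_mu(mu_prev, lam, first_iteration):
    """Formula (4.6): mu_i = |lambda_i| on the first iteration and
    mu_i = max(|lambda_i|, 0.5*(mu_prev_i + |lambda_i|)) afterwards."""
    lam_abs = np.abs(np.asarray(lam, dtype=float))
    if first_iteration:
        return lam_abs
    return np.maximum(lam_abs, 0.5 * (np.asarray(mu_prev, dtype=float) + lam_abs))

def step_length(phi, dphi0, max_trials=10):
    """Step-length rule of equations (4.7) and (4.8).

    phi    -- function of one variable, phi(alpha) = Phi(x + alpha d, mu)
    dphi0  -- the (negative) number used in place of phi'(0)"""
    phi0, alpha = phi(0.0), 1.0
    for _ in range(max_trials):
        phi_a = phi(alpha)
        if phi_a <= phi0 + 0.1 * alpha * dphi0:      # acceptance test (4.8)
            return alpha
        # Minimizer of the quadratic (4.7) through phi(0), dphi0 and phi(alpha),
        # safeguarded so that the next trial is at least 0.1 of the last one.
        denom = phi_a - phi0 - alpha * dphi0
        quad_min = 0.5 * (-dphi0) * alpha**2 / denom if denom > 0.0 else 0.1 * alpha
        alpha = max(0.1 * alpha, quad_min)
    return alpha
```

On most iterations the test (4.8) is passed at once with α = 1, which is the behaviour reported for the examples of Section 5.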

. . is not always equal to ~'(O). Throughout Thus the algorithm reduces the problem in five unknowns and only two of the equations the calculation a step-length of one is used so Newton's method is being applied to solve the equations. where the nonlinear constraints define surfaces here on experience with three of Colville's Office Parcel problem Colville's (Rosenbrock.153 However~ it should be noted that. five variables and sixteen constraints. that he recommends. J. It is the easiest of the examples because six five constraints are active at the solution.. m) were all linear. What is surprising not that our algorithm is fast but that the algorithms in this case is reported by Colville are so slow.I 3 . Numerical Results The given algorithm has been applied to several test problems. discontinuity EO. the value of ~ if a derivative = O. The numerical results of the next section show that on nearly every iteration the step-length 5. is easy to compute because the gradients ~ and Vci ) information about if the discontinuity to the difference We is expected to occur in ~ ( ~ I~ for then the gradient ~'(O) may give misleading on the interval discontinuities This (i = i. particularly In all cases we set ~ occur if the functions F(x) and difference ci(~) E~(1) occurs near ..~ has the value one. including some that include ripples. Colville's first problem is more interesting because.. at the solution. 2. m) are known at the starting point of the iteration. to the solution of five equations are nonlinear. because of the derivative in the function define /~ differently for O < ~ < ~(~) (4.2).. We report with the Post and with a problem suggested by Powell(1969). (1968) test problems. . given the starting point it is not obvious which of the fifteen constraints Our algorithm identifies these constraints are active successfully on the .~ (O)J that would (i = i. which are identified on every iteration by the quadratic programming calculation that defines ~. 2..1960) third problem includes of them being nonlinear.

2). throughout ~i difference between our algorithm and calculation. which is identified on the second in the second and third variables Similarly calculation and seven linear constraints. freedom. Thus fast convergence There are eleven active constraints. second problem there are fifteen variables Using his infeasible and twenty constraints.2. the calculation. starting point we find the final set of active constraints the tenth iteration. In Colville's is obtained. but they keep the constraint violations which is analogous (i=1. for example). There are four. they are all linear. A comparison of the present algorithm with some other methods Table i. to reduce the number of degrees of freedom in the Similar savings are made by reduced gradient methods 1974. i0) is used. while the number of variables Hence at an early stage the problem is reduced to the minimization of a function of only one variable.. m) in expression to choosing small large values of (4. It is that we is shown in The given figures are the number of function and gradient evaluations . These remarks emphasize an important methods that minimize a function of n variables on every iteration.154 second iteration. is only five. are using active constraints minimization (see Sargent. on is rapid. again the minimization Because of symmetry the problem is really reduced to the minimization so again the rate of convergence in Powell's problem. part of the calculation The nonlinearity of the constraints Hence has to take up only one degree of makes this calculation more testing than the post office parcel problem. The post office parcel problem has three variables but only one constraint of a function of one variable. nonlinear... to is active at the solution. eight of them being Hence in this example the algorithm does have to combine Newton's method for satisfying the constraint conditions with a minimization take up the remaining freedom in the variables. and three nonlinear there is symmetry between the last two variables. iteration when the standard starting point (i0. . i0. which has five variables equality constraints.

A comparison of the present algorithm with some other methods is shown in Table 1. The given figures are the number of function and gradient evaluations that are required to solve each constrained minimization problem, except that the figures in brackets are the number of iterations. The five test problems have been mentioned already. The initial values of x for the Colville problems are the feasible starting points that he suggests, while on the last two problems the starting points are (10, 10, 10) and (-2, 2, 2, -1, -1) respectively. The first three columns of figures are taken from Colville (1968), Biggs (1972) and Fletcher (1975). In the case of our algorithm we suppose that the solution is found when all the components of x are correct to five significant decimal digits.

Colville's (1968) report compares most of the algorithms for constrained minimization calculations that were available in 1968. For each of his test problems he gives results for at least ten methods, and we quote the smallest of his figures, even though the three given numbers are obtained by three different algorithms.

The results due to Biggs (1972) were calculated by his REQP Fortran program, which is similar to the method that we prefer. Two important differences are the way he approaches constraint boundaries and his use of the objective function F(x) instead of the Lagrangian function (3.2) in order to revise the matrix that we call B. He now uses the Lagrangian function to revise B (Biggs, 1975), but this change influences the numerical results only when some of the active constraints are nonlinear.

Fletcher (1975) studies three versions of the augmented Lagrangian method. He gives figures for each version and, as in the case of the Colville results, our table shows the figures that are most favourable to Fletcher's work on each test problem.

It is incorrect to infer from the table that some of Colville's algorithms are superior to the augmented Lagrangian method, because most of the early algorithms are rather inconsistent, both in the amount of work and in the final accuracy. We do, however, claim that the table shows that the algorithm described in this paper is a very good method of solving constrained optimization calculations with nonlinear constraints. It can be programmed in an afternoon if one has a quadratic programming subroutine available to calculate d and λ.

Such a program is usually satisfactory for small calculations and for calculations where most of the computer time is spent on function and gradient evaluations. In other cases, however, the matrix calculations of the algorithm may dominate the running time of the program, and then the quadratic programming part should be solved by an algorithm that takes advantage of structure, such as the form of the change to B that occurs on each iteration. It is usually unnecessary to use the form (2.7) of the linear approximations to the constraints instead of the form (2.6). Also, the device that is described in the paragraph that follows equation (4.6) seems to be unnecessary.

                                 TABLE 1
                         Comparison of Algorithms

    PROBLEM         COLVILLE     BIGGS     FLETCHER     PRESENT
    Colville 1         13          8        39  (4)      6  (4)
    Colville 2        112         47       149  (3)     17 (16)
    Colville 3         23         10        64  (5)      3  (2)
    POP                --         11        30  (4)      7  (5)
    Powell             --         --        37  (5)      7  (6)

References

Biggs, M.C. (1972) "Constrained minimization using recursive equality quadratic programming", in Numerical methods for nonlinear optimization, ed. F.A. Lootsma, Academic Press (London).

Biggs, M.C. (1975) "Constrained minimization using recursive quadratic programming: some alternative subproblem formulations", in Towards global optimization, eds. L.C.W. Dixon and G.P. Szegö, North-Holland Publishing Co. (Amsterdam).

Colville, A.R. (1968) "A comparative study on nonlinear programming codes", Report No. 320-2949, IBM New York Scientific Center.

Dennis, J.E. and Moré, J.J. (1977) "Quasi-Newton methods, motivation and theory", SIAM Review, Vol. 19, pp. 46-89.

Fletcher, R. (1970) "A Fortran subroutine for quadratic programming", Report No. R6370, A.E.R.E. Harwell.

Fletcher, R. (1975) "An ideal penalty function for constrained optimization", J. Inst. Maths. Applics., Vol. 15, pp. 319-342.

Han, S-P. (1975) "A globally convergent method for nonlinear programming", Report No. 75-257, Dept. of Computer Science, Cornell University.

Han, S-P. (1976) "Superlinearly convergent variable metric algorithms for general nonlinear programming problems", Mathematical Programming, Vol. 11, pp. 263-282.

Powell, M.J.D. (1969) "A method for nonlinear constraints in minimization problems", in Optimization, ed. R. Fletcher, Academic Press (London).

Powell, M.J.D. (1976) "Algorithms for nonlinear constraints that use Lagrangian functions", presented at the Ninth International Symposium on Mathematical Programming, Budapest.

Powell, M.J.D. (1977) "The convergence of variable metric methods for nonlinearly constrained optimization calculations", presented at Nonlinear Programming Symposium 3, Madison, Wisconsin.

Rosenbrock, H.H. (1960) "An automatic method for finding the greatest or the least value of a function", Computer Journal, Vol. 3, pp. 175-184.

Sargent, R.W.H. (1974) "Reduced-gradient and projection methods for nonlinear programming", in Numerical methods for constrained optimization, eds. P.E. Gill and W. Murray, Academic Press (London).