
**Algorithms for Constrained Optimization**

Methods for solving a constrained optimization problem in n variables and m constraints

can be divided roughly into four categories that depend on the dimension of the space in

which the accompanying algorithm works. Primal methods work in n – m space, penalty

methods work in n space, dual and cutting plane methods work in m space, and

Lagrangian methods work in n + m space. Each of these approaches is founded on

different aspects of NLP theory. Nevertheless, there are strong interconnections between

them, both in the final form of implementation and in performance. The rates of

convergence of most practical algorithms are determined by the structure of the Hessian

of the Lagrangian, much like the structure of the Hessian of the objective function

determines the rates of convergence for most unconstrained methods.

In this appendix, we present several procedures for solving problem (1). The first

is the now classical penalty approach developed by Fiacco and McCormick [1968] and is

perhaps the simplest to implement. The second is Zoutendijk's feasible direction method.

Other primal approaches discussed in the literature include the gradient projection

method and the generalized reduced gradient (GRG) method, which is a simplex-like

algorithm. There are many commercial codes that implement these and related

techniques. Sequential linear programming and sequential quadratic programming (SQP),

for example, are two Lagrangian approaches that have proven to be quite effective. SQP

is highlighted at the end of this appendix.

A.1 Penalty and Barrier Methods

The methods that we describe presently attempt to approximate a constrained

optimization problem with an unconstrained one and then apply standard search

techniques to obtain solutions. The approximation is accomplished in the case of penalty

methods by adding a term to the objective function that prescribes a high cost for

violation of the constraints. In the case of barrier methods, a term is added that favors

points in the interior of the feasible region over those near the boundary. For a problem

with n variables and m constraints, both approaches work directly in the n-dimensional

space of the variables. The discussion that follows emphasizes penalty methods, recognizing that barrier methods embody the same principles.

Consider the problem

Minimize{f(x) : x ∈ S}    (23)

where f is a continuous function on ℜ^n and S is a constraint set in ℜ^n. In most applications S is defined explicitly by a number of functional constraints, but in this section the more general description in (23) can be handled. The idea of a penalty function method is to replace problem (23) by an unconstrained approximation of the form

Minimize{f(x) + cP(x)}    (24)

where c is a positive constant and P is a function on ℜ^n satisfying (i) P(x) is continuous, (ii) P(x) ≥ 0 for all x ∈ ℜ^n, and (iii) P(x) = 0 if and only if x ∈ S.

Example 16

Suppose S is defined by a number of inequality constraints: S = {x : g_i(x) ≤ 0, i = 1,..., m}. A very useful penalty function in this case is

P(x) = (1/2) ∑_{i=1}^{m} (max{0, g_i(x)})^2    (25)

which gives a quadratic augmented objective function denoted by

θ(c, x) ≡ f(x) + cP(x).

Here, each unsatisfied constraint influences x by assessing a penalty equal

to the square of the violation. These influences are summed and

multiplied by c, the penalty parameter. Of course, this influence is

counterbalanced by f(x). Therefore, if the magnitude of the penalty term is

small relative to the magnitude of f(x), minimization of θ(c, x) will almost

certainly not result in an x that would be feasible to the original problem.

However, if the value of c is made suitably large, the penalty term will

exact such a heavy cost for any constraint violation that the minimization

of the augmented objective function will yield a feasible solution.
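To make the construction concrete, here is a minimal sketch of the quadratic penalty (25) and the augmented objective θ(c, x), assuming NumPy is available; the objective and the single constraint used below are illustrative stand-ins, not part of the text.

```python
import numpy as np

def quadratic_penalty(x, g_list):
    """P(x) = 1/2 * sum_i max(0, g_i(x))^2 for inequality constraints g_i(x) <= 0."""
    return 0.5 * sum(max(0.0, g(x))**2 for g in g_list)

def augmented_objective(x, f, g_list, c):
    """theta(c, x) = f(x) + c * P(x): infeasible points are charged, feasible ones are not."""
    return f(x) + c * quadratic_penalty(x, g_list)

# Illustrative data: one constraint x1 + x2 - 7 <= 0 and a quadratic objective
f = lambda x: (x[0] - 6)**2 + (x[1] - 7)**2
g_list = [lambda x: x[0] + x[1] - 7]
x_infeasible = np.array([6.0, 7.0])
for c in (1, 10, 100):
    print(c, augmented_objective(x_infeasible, f, g_list, c))  # penalty grows with c
```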

The function cP(x) is illustrated in Fig. 19 for the one-dimensional case with g_1(x) = b – x and g_2(x) = x – a. For large c it is clear that the

minimum point of problem (24) will be in a region where P is small. Thus

for increasing c it is expected that the corresponding solution points will

approach the feasible region S and, subject to being close, will minimize f.

Ideally then, as c → ∞ the solution point of the penalty problem will

converge to a solution of the constrained problem.

Figure 19. Illustration of the penalty function cP(x) for c = 1, 10, and 100

Example 17

To clarify these ideas and to gain some understanding of how to select the penalty parameter c, let us consider the following problem.

Minimize f(x) = (x_1 – 6)^2 + (x_2 – 7)^2

subject to g_1(x) = –3x_1 – 2x_2 + 6 ≤ 0

g_2(x) = –x_1 + x_2 – 3 ≤ 0

g_3(x) = x_1 + x_2 – 7 ≤ 0

g_4(x) = (2/3)x_1 – x_2 – 4/3 ≤ 0

The feasible region is shown graphically in Fig. 20 along with several

isovalue contours for the objective function. The problem is a quadratic

program and the isovalue contours are concentric circles centered at (6,7),

the unconstrained minimum of f(x).

Figure 20. Feasible region for Example 17

Using the quadratic penalty function (25), the augmented objective

function is

θ(c, x) = (x_1 – 6)^2 + (x_2 – 7)^2 + c[(max{0, –3x_1 – 2x_2 + 6})^2 + (max{0, –x_1 + x_2 – 3})^2 + (max{0, x_1 + x_2 – 7})^2 + (max{0, (2/3)x_1 – x_2 – 4/3})^2].

The first step in the solution process is to select a starting point. A

good rule of thumb is to start at an infeasible point. By design then, we

will see that every trial point, except the last one, will be infeasible

(exterior to the feasible region).

A reasonable place to start is at the unconstrained minimum, so we set x^0 = (6, 7). Since only constraint 3 is violated at this point, we have

θ(c, x) = (x_1 – 6)^2 + (x_2 – 7)^2 + c(max{0, x_1 + x_2 – 7})^2.

Assuming that in the neighborhood of x^0 the “max” operator returns the constraint, the gradient with respect to x is

∇_x θ(c, x) = [ 2x_1 – 12 + 2c(x_1 + x_2 – 7),  2x_2 – 14 + 2c(x_1 + x_2 – 7) ]^T.

Setting the elements of ∇_x θ(c, x) to zero and solving yields the stationary point

x_1^*(c) = 6(1 + c)/(1 + 2c)  and  x_2^*(c) = 7 – 6c/(1 + 2c)

as a function of c. For any positive value of c, θ(c, x) is a strictly convex function (the Hessian of θ(c, x) is positive definite for all c > 0), so x_1^*(c) and x_2^*(c) determine a global minimum. It turns out for this example that the minima will continue to satisfy all but the third constraint for all positive values of c. If we take the limit of x_1^*(c) and x_2^*(c) as c → ∞, we obtain x_1^* = 3 and x_2^* = 4, the constrained global minimum for the original problem.
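A quick numerical check of this result is sketched below, assuming NumPy and SciPy are available; the penalty is written without the 1/2 factor of (25), matching the expansion used in this example, and the value c = 100 is an arbitrary illustration.

```python
import numpy as np
from scipy.optimize import minimize

def theta(x, c):
    """Augmented objective for Example 17 (penalty written without the 1/2 factor)."""
    f = (x[0] - 6)**2 + (x[1] - 7)**2
    g = [-3*x[0] - 2*x[1] + 6, -x[0] + x[1] - 3,
         x[0] + x[1] - 7, (2/3)*x[0] - x[1] - 4/3]
    return f + c * sum(max(0.0, gi)**2 for gi in g)

c = 100.0
xc = minimize(theta, np.array([6.0, 7.0]), args=(c,), method="Nelder-Mead").x
closed_form = np.array([6*(1 + c)/(1 + 2*c), 7 - 6*c/(1 + 2*c)])
print(np.round(xc, 3), np.round(closed_form, 3))   # both are near the limit (3, 4)
```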

Selecting the Penalty Parameter

Because the above approach seems to work so well, it is natural to

conjecture that all we have to do is set c to a very large number and then

optimize the resulting augmented objective function θ(c, x) to obtain the

solution to the original problem. Unfortunately, this conjecture is not

correct. First, “large” depends on the particular model. It is almost

always impossible to tell how large c must be to provide a solution to the

problem without creating numerical difficulties in the computations.

Second, in a very real sense, the problem is dynamically changing with the

relative position of the current value of x and the subset of the constraints

that are violated.

The third reason why the conjecture is not correct is associated

with the fact that large values of c create enormously steep valleys at the

constraint boundaries. Steep valleys will often present formidable if not

insurmountable convergence difficulties for all preferred search methods

unless the algorithm starts at a point extremely close to the minimum

being sought.

Fortunately, there is a direct and sound strategy that will overcome

each of the difficulties mentioned above. All that needs to be done is to

start with a relatively small value of c and an infeasible (exterior) point.

This will assure that no steep valleys are present in the initial optimization

of θ(c, x). Subsequently, we will solve a sequence of unconstrained

problems with monotonically increasing values of c chosen so that the

solution to each new problem is “close” to the previous one. This will

preclude any major difficulties in finding the minimum of θ(c, x) from one

iteration to the next.


Algorithm

To implement this strategy, let {c_k}, k = 1, 2,..., be a sequence tending to infinity such that c_k > 0 and c_{k+1} > c_k. Now for each k we solve the problem

Minimize{θ(c_k, x) : x ∈ ℜ^n}    (26)

to obtain x^k, the optimum. It is assumed that problem (26) has a solution for all positive values of c_k. This will be true, for example, if θ(c, x) increases without bound as ||x|| → ∞.

A simple implementation, known as the sequential unconstrained minimization technique (SUMT), is given below.

Initialization Step: Select a growth parameter β > 1, a stopping parameter ε > 0, and an initial value of the penalty parameter c_0. Choose a starting point x^0 that violates at least one constraint and formulate the augmented objective function θ(c_0, x). Let k = 1.

Iterative Step: Starting from x^{k–1}, use an unconstrained search technique to find the point that minimizes θ(c_{k–1}, x). Call it x^k and determine which constraints are violated at this point.

Stopping Rule: If the distance between x^{k–1} and x^k is smaller than ε (i.e., ||x^{k–1} – x^k|| < ε) or the difference between two successive objective function values is smaller than ε (i.e., |f(x^{k–1}) – f(x^k)| < ε), stop with x^k an estimate of the optimal solution. Otherwise, put c_k ← βc_{k–1}, formulate the new θ(c_k, x) based on which constraints are violated at x^k, put k ← k+1 and return to the iterative step.
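A compact sketch of the SUMT loop just described, assuming NumPy and SciPy are available; the unconstrained searches use Nelder-Mead, and the default parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def sumt(f, g_list, x0, c0=0.5, beta=2.0, eps=1e-4, max_iter=50):
    """Exterior-point SUMT with the quadratic penalty (25)."""
    def theta(x, c):
        return f(x) + c * 0.5 * sum(max(0.0, g(x))**2 for g in g_list)

    x, c = np.asarray(x0, dtype=float), c0
    for _ in range(max_iter):
        x_new = minimize(theta, x, args=(c,), method="Nelder-Mead").x
        # Stopping rule: small step or small change in the original objective
        if np.linalg.norm(x_new - x) < eps or abs(f(x_new) - f(x)) < eps:
            return x_new
        x, c = x_new, beta * c          # grow the penalty parameter
    return x

# Example 17 data: the iterates approach x* = (3, 4)
f = lambda x: (x[0] - 6)**2 + (x[1] - 7)**2
g = [lambda x: -3*x[0] - 2*x[1] + 6, lambda x: -x[0] + x[1] - 3,
     lambda x: x[0] + x[1] - 7, lambda x: 2*x[0]/3 - x[1] - 4/3]
print(np.round(sumt(f, g, x0=[6.0, 7.0]), 3))
```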

Applying the algorithm to Example 17 with β = 2 and c_0 = 0.5 for eight iterations yields the sequence of solutions given in Table A1. The iterates x^k are seen to approach the true minimum point x* = (3, 4).


Table A1. Sequence of Solutions Using the Penalty Method

  k     c      x_1     x_2     g_3
  0     ––     6.00    7.00    6.00
  1     0.5    4.50    5.50    3.00
  2     1      4.00    5.00    2.00
  3     2      3.60    4.60    1.20
  4     4      3.33    4.33    0.66
  5     8      3.18    4.18    0.35
  6     16     3.09    4.09    0.18
  7     32     3.05    4.05    0.09
  8     64     3.02    4.02    0.04

Implementation Issues

Much of the success of SUMT depends on the approach used to solve the

intermediate problems, which in turn depends on their complexity. One

thing that should be done prior to attempting to solve a nonlinear program

using a penalty function method is to scale the constraints so that the

penalty generated by each is about the same magnitude. This scaling

operation is intended to ensure that no subset of the constraints has an

undue influence on the search process. If some constraints are dominant,

the algorithm will steer towards a solution that satisfies those constraints

at the expense of searching for the minimum.

In a like manner, the initial value of the penalty parameter should

be fixed so that the magnitude of the penalty term is not much smaller than

the magnitude of the objective function. If an imbalance exists, the influence

of the objective function could direct the algorithm to head towards an

unbounded minimum even in the presence of unsatisfied constraints. In

either case, convergence may be exceedingly slow.

Convergence

Although SUMT has an intuitive appeal, it is still necessary to prove that it

will converge to the optimum of the original problem (23). The following

lemma is the first component of the proof, and gives a set of inequalities

that follows directly from the definition of x^k and the inequality c_{k+1} > c_k.

Lemma 1:

θ(c_k, x^k) ≤ θ(c_{k+1}, x^{k+1})

P(x^k) ≥ P(x^{k+1})

f(x^k) ≤ f(x^{k+1})

Proof:

θ(c_{k+1}, x^{k+1}) = f(x^{k+1}) + c_{k+1}P(x^{k+1}) ≥ f(x^{k+1}) + c_k P(x^{k+1}) ≥ f(x^k) + c_k P(x^k) = θ(c_k, x^k),

which proves the first inequality. From this we also have

f(x^k) + c_k P(x^k) ≤ f(x^{k+1}) + c_k P(x^{k+1})

f(x^{k+1}) + c_{k+1}P(x^{k+1}) ≤ f(x^k) + c_{k+1}P(x^k)

Adding these two expressions and rearranging terms yields (c_{k+1} – c_k)P(x^{k+1}) ≤ (c_{k+1} – c_k)P(x^k), which proves the second inequality. Finally, f(x^{k+1}) + c_k P(x^{k+1}) ≥ f(x^k) + c_k P(x^k), which, in conjunction with the second inequality, leads to the third.

Lemma 2: Let x* be a solution to problem (23). Then for each k

f(x*) ≥ θ(c_k, x^k) ≥ f(x^k)

Proof: f(x*) = f(x*) + c_k P(x*) ≥ f(x^k) + c_k P(x^k) ≥ f(x^k).

Global convergence of the penalty method, or more precisely

verification that any limit point of the sequence is a solution, follows

easily from the two lemmas above.

Theorem 10: Let {x^k} be a sequence generated by the penalty method, where x^k is the global minimum of θ(c_{k–1}, x) at the iterative step of the algorithm. Then any limit point of the sequence is a solution to (23).

Proof: see Luenberger [1984].

Barrier Function

It should be apparent that the quadratic penalty function in (25) is only one

of an endless set of possibilities. While the "sum of the squared

violations" is probably the best type of penalty function for an exterior

method, it is not at all suitable if one desires to conduct a search through a


sequence of feasible or interior points. Let us consider the following

augmented objective function

B(r, x) = f(x) + r ∑_{i=1}^{m} [ –1/g_i(x) ]    (27)

where r > 0 is the barrier parameter. This function is valid only for interior points such that all constraints are strictly satisfied: g_i(x) < 0 for all i.

Equation (27) indicates that the closer one gets to a constraint

boundary, the larger B(r, x) becomes. Indeed, B(r, x) is not defined at points precisely on the boundary. Hence B(r, x) is often called a barrier function

and, in a sense, is opposite to the kind of exterior penalty function

introduced in (25).

As discussed in Chapter 4, the basic idea of interior point methods

is to start with a feasible point and a relatively large value of the

parameter r. This will prevent the algorithm from approaching the

boundary of the feasible region. At each subsequent iteration, the value of

r is monotonically decreased in such a way that the resultant problem is

relatively easy to solve if the optimal solution of its immediate

predecessor is used as the starting point. Mathematically, the sequence of

solutions {x^k} can be shown to converge to a local minimum in much the

same way that the exterior penalty method was shown to converge earlier.

Example 18

To clarify these ideas consider the problem

Minimize{f(x) = 2x_1^2 + 9x_2 subject to x_1 + x_2 ≥ 4}.

Using (27) the augmented objective function is

B(r, x) = 2x_1^2 + 9x_2 + r[ –1/(–x_1 – x_2 + 4) ]

with gradient

∇_x B(r, x) = [ 4x_1 – r(–x_1 – x_2 + 4)^{–2},  9 – r(–x_1 – x_2 + 4)^{–2} ]^T.

Setting the gradient vector to zero and solving for the stationary point

yields

x_1(r) = 2.25 and x_2(r) = 0.333√r + 1.75 for all r > 0.

In the limit as r approaches 0, these values become x_1 = 2.25 and x_2 = 1.75 with f(x) = 25.875, which is the optimal solution to the original problem.

Figure 21 depicts the feasible region and several isovalue contours of f(x). Also shown is the locus of optimal solutions for B(r, x) starting with r_0 = 20 and decreasing to 0.

Figure 21. Graphical illustration of the barrier search procedure (locus of B(r, x) minima as r → 0)
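A sketch of this barrier search for Example 18, assuming NumPy and SciPy are available; the interior starting point and the halving schedule for r are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def barrier(x, r):
    """B(r, x) for Example 18; defined only where x1 + x2 > 4."""
    g = -x[0] - x[1] + 4.0                     # constraint in g(x) <= 0 form
    if g >= 0:
        return np.inf                          # reject non-interior trial points
    return 2*x[0]**2 + 9*x[1] - r / g

x, r = np.array([3.0, 3.0]), 20.0              # interior starting point, r0 = 20
while r > 1e-6:
    x = minimize(barrier, x, args=(r,), method="Nelder-Mead").x
    r /= 2.0                                   # decrease the barrier parameter
print(np.round(x, 3))                          # approaches (2.25, 1.75)
```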

A Mixed Barrier-Penalty Function

Equality constraints, though not discussed up until now, can be handled

efficiently with a penalty function. Let us consider the following problem.

Minimize f(x)

subject to h_i(x) = 0, i = 1,…, p

g_i(x) ≤ 0, i = 1,…, m

The most common approach to implementing the sequential unconstrained minimization technique for this model is to form the logarithmic-quadratic loss function

LQ(r_k, x) = f(x) – r_k ∑_{i=1}^{m} ln(–g_i(x)) + (1/r_k) ∑_{i=1}^{p} h_i(x)^2.

The algorithm finds the unconstrained minimizer of LQ(r_k, x) over the set {x : g_i(x) < 0, i = 1,…, m} for a sequence of scalar parameters {r_k} strictly decreasing to zero. All convergence, duality, and convexity results for LQ(r_k, x) are similar to those for the pure penalty and barrier functions.
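As a sketch of how the logarithmic-quadratic loss might be evaluated in code (assuming NumPy; f, g_list, and h_list are hypothetical callables for the objective and the two constraint families):

```python
import numpy as np

def lq_loss(x, f, g_list, h_list, r):
    """LQ(r, x): log barrier on the inequalities plus a quadratic penalty on the equalities."""
    g_vals = np.array([g(x) for g in g_list])
    if np.any(g_vals >= 0):                    # the log term requires g_i(x) < 0
        return np.inf
    return f(x) - r * np.sum(np.log(-g_vals)) + (1.0 / r) * sum(h(x)**2 for h in h_list)
```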

Note that for a given r, the stationarity condition is

∇f(x(r)) – ∑_{i=1}^{m} [ r/(–g_i(x(r))) ] ∇g_i(x(r)) + ∑_{i=1}^{p} [ 2h_i(x(r))/r ] ∇h_i(x(r)) = 0

Fiacco and McCormick were able to show that as r → 0, r/(–g_i(x(r))) → μ_i*, the optimal Lagrange multiplier for the ith inequality constraint, and 2h_i(x(r))/r → λ_i*, the optimal Lagrange multiplier for the ith equality constraint. This result is suggested by the fractions in the above summations.

Summary

Penalty and barrier methods are among the most powerful classes of algorithms available for attacking general nonlinear optimization

problems. This statement is supported by the fact that these techniques

will converge to at least a local minimum in most cases, regardless of the

convexity characteristics of the objective function and constraints. They

work well even in the presence of cusps and similar anomalies that can

stymie other approaches.

Of the two classes, the exterior methods must be considered

preferable. The primary reasons are as follows.

1. Interior methods cannot deal with equality constraints without

cumbersome modifications to the basic approach.

2. Interior methods demand a feasible starting point. Finding such a

point often presents formidable difficulties in and of itself.

3. Interior methods require that the search never leave the feasible

region. This significantly increases the computational effort

associated with the line search segment of the algorithm.

Although penalty and barrier methods met with great initial

success, their slow rates of convergence due to ill-conditioning of the

associated Hessian led researchers to pursue other approaches. With the

advent of interior point methods for linear programming, algorithm

designers have taken a fresh look at penalty methods and have been able

to achieve much greater efficiency than previously thought possible (e.g.,

see Nash and Sofer [1993]).


A.2 Primal Methods

In solving a nonlinear program, primal methods work on the original problem directly by

searching the feasible region for an optimal solution. Each point generated in the process

is feasible and the value of the objective function constantly decreases. These methods

have three significant advantages: (1) if they terminate before confirming optimality

(which is very often the case with all procedures), the current point is feasible; (2) if they

generate a convergent sequence, it can usually be shown that the limit point of that

sequence must be at least a local minimum; (3) they do not rely on special structure, such

as convexity, so they are quite general. Notable disadvantages are that they require a

phase 1 procedure to obtain an initial feasible point and that they are all plagued,

particularly when the problem constraints are nonlinear, with computational difficulties

arising from the need to remain within the feasible region from one iteration to the next.

The convergence rates of primal methods are competitive with those of other procedures,

and for problems with linear constraints, they are often among the most efficient.

Primal methods, often called feasible direction methods, embody the same

philosophy as the techniques of unconstrained minimization but are designed to deal with

inequality constraints. Briefly, the idea is to pick a starting point satisfying the

constraints and to find a direction such that (i) a small move in that direction remains

feasible, and (ii) the objective function improves. One then moves a finite distance in the

determined direction, obtaining a new and better point. The process is repeated until no

direction satisfying both (i) and (ii) can be found. In general, the terminal point is a

constrained local (but not necessarily global) minimum of the problem. A direction

satisfying both (i) and (ii) is called a usable feasible direction. There are many ways of

choosing such directions, hence many different primal methods. We now present a

popular one based on linear programming.

Zoutendijk's Method

Once again, we consider problem (23) with constraint set S = {x : g_i(x) ≤ 0, i = 1,…, m}. Assume that a starting point x^0 ∈ S is available. The problem is to choose a vector d whose direction is both usable and feasible. Let g_i(x^0) = 0, i ∈ I, where the indices in I correspond to the binding constraints at x^0. For a feasible direction d, a small move along this vector beginning at the point x^0 makes no binding constraint positive, i.e.,

(d/dt) g_i(x^0 + td)|_{t=0} = ∇g_i(x^0)^T d ≤ 0,  i ∈ I

For a minimization objective, a usable feasible vector has the additional

property that


(d/dt) f(x^0 + td)|_{t=0} = ∇f(x^0)^T d < 0

Therefore, the function initially decreases along the vector. In searching

for a "best" vector d along which to move, one could choose that feasible

vector minimizing ∇f(x

0

)

T

d. If some of the binding constraints were

nonlinear, however, this could lead to certain difficulties. In particular,

starting at x^0 the feasible direction d^0 that minimizes ∇f(x^0)^T d is the projection of –∇f(x^0) onto the tangent plane generated by the binding constraints at x^0. Because the constraint surface is curved, movement along d^0 for any finite distance violates the constraint. Thus a recovery

move must be made to return to the feasible region. Repetitions of the

procedure lead to inefficient zigzagging. As a consequence, when looking

for a locally best direction it is wise to choose one that, in addition to

decreasing f, also moves away from the boundaries of the nonlinear

constraints. The expectation is that this will avoid zigzagging. Such a

direction is the solution of the following problem.

Minimize σ    (28a)

subject to ∇g_i(x^0)^T d – θ_i σ ≤ 0,  i ∈ I    (28b)

∇f(x^0)^T d – σ ≤ 0    (28c)

d^T d = 1    (28d)

where 0 ≤ θ_i ≤ 1 is selected by the user. If all θ_i = 1, then any vector (d, σ) satisfying (28b)–(28c) with σ < 0 is a usable feasible direction. The vector with minimum σ value is a best direction, which simultaneously makes ∇f(x^0)^T d and ∇g_i(x^0)^T d as negative as possible; i.e., it steers away from the nonlinear constraint boundaries. Other values of θ_i enable one to emphasize certain constraint boundaries relative to others. Equation (28d) is a normalization requirement ensuring that σ is finite. If it were not included and a vector (d, σ) existed satisfying Eqs. (28b)–(28c) with σ negative, then σ could be made to approach –∞, since (28b)–(28c) are homogeneous. Other normalizations, such as |d_j| ≤ 1 for all j, are also possible.

Because the vectors ∇f and ∇g_i are evaluated at a fixed point x^0,

the above direction-finding problem is almost linear, the only nonlinearity

being (28d). Zoutendijk showed that this constraint can be handled by a

modified version of the simplex method so problem (28) may be solved

with reasonable efficiency. Note that if some of the constraints in the

original nonlinear program (1) were given as equalities, the algorithm

would have to be modified slightly.
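With the box normalization |d_j| ≤ 1 mentioned above, the direction-finding problem is an ordinary linear program. A sketch of one direction-finding step follows, assuming NumPy and SciPy; grad_f and grad_g_binding are hypothetical gradient arrays evaluated at the current point, and all weights θ_i are set to 1.

```python
import numpy as np
from scipy.optimize import linprog

def find_direction(grad_f, grad_g_binding):
    """One Zoutendijk direction-finding LP with all weights theta_i = 1 and |d_j| <= 1.

    Decision variables are (d_1, ..., d_n, sigma); minimize sigma subject to
    grad_g_i^T d - sigma <= 0 for each binding i and grad_f^T d - sigma <= 0.
    """
    n = len(grad_f)
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # objective: minimize sigma
    rows = [np.asarray(g) for g in grad_g_binding] + [np.asarray(grad_f)]
    A_ub = np.hstack([np.vstack(rows), -np.ones((len(rows), 1))])
    b_ub = np.zeros(len(rows))
    bounds = [(-1.0, 1.0)] * n + [(None, None)]   # |d_j| <= 1, sigma free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    d, sigma = res.x[:n], res.x[-1]
    return d, sigma                               # sigma >= 0 signals termination

# Hypothetical gradients at the current point
d, sigma = find_direction(np.array([2.0, -1.0]), [np.array([1.0, 1.0])])
print(d, sigma)
```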


Of course, once a direction has been determined, the step size must

still be found. This problem may be dealt with in almost the same manner

as in the unconstrained case. It is still desirable to minimize the objective

function along the vector d, but now no constraint may be violated. Thus t

is determined to minimize f(x^k + td^k) subject to the constraint x^k + td^k ∈ S.

Any of the techniques discussed in Section 10.6 can be used. A new point

is thus determined and the direction-finding problem is re-solved. If at

some point the minimum σ ≥ 0, then there is no feasible direction satisfying ∇f(x^0)^T d < 0 and the procedure terminates. The final point will

generally be a local minimum of the problem. Zoutendijk showed that for

convex programs the procedure converges to the global minimum.
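A crude sketch of this constrained step-size calculation, assuming NumPy and SciPy; the shrinking loop that caps the step is a simplification that is adequate when the feasible segment along d is an interval, as it is for convex S.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def constrained_line_search(x, d, f, g_list, t_max=1.0, shrink=0.8):
    """Pick t to minimize f(x + t d) while keeping every g_i(x + t d) <= 0."""
    # Crude feasibility cap: shrink t_max until the trial endpoint is feasible.
    while t_max > 1e-12 and any(g(x + t_max * d) > 0 for g in g_list):
        t_max *= shrink
    res = minimize_scalar(lambda t: f(x + t * d), bounds=(0.0, t_max), method="bounded")
    return x + res.x * d
```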


A.3 Sequential Quadratic Programming

Successive linear programming (SLP) methods solve a sequence of linear approximations

to the original nonlinear program. In this respect they are similar to Zoutendijk's method

but they do not require that feasibility be maintained at each iteration. Recall that if f(x)

is a nonlinear function and x^c is the current value for x, then the first order Taylor series expansion of f(x) around x^c is

f(x) = f(x^c + ∆x) ≅ f(x^c) + ∇f(x^c)(∆x)    (29)

where ∆x is the direction of movement. Given initial values for the variables, in SLP all

nonlinear functions are replaced by their linear approximations as in Eq. (29). The

variables in the resulting LP are the ∆x_j's representing changes from the current values. It is common to place upper and lower bounds on each ∆x_j, given that the linear approximation is reasonably accurate only in some neighborhood of the initial point.

The resulting linear program is solved and if the new point provides an

improvement it becomes the incumbent and the process is repeated. If the new point does

not yield an improvement, the step bounds may need to be reduced or we may be close

enough to an optimum to stop. Successive points generated by this procedure need not be

feasible even if the initial point is. However, the amount of infeasibility generally is

reduced as the iterations proceed.
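A minimal sketch of one SLP iteration with these step bounds, assuming NumPy and SciPy; the gradient, constraint Jacobian, and bound value below are hypothetical illustrations.

```python
import numpy as np
from scipy.optimize import linprog

def slp_step(grad_f, jac_g, g_vals, step_bound=0.5):
    """One SLP step: minimize grad_f^T dx subject to g(x^c) + jac_g dx <= 0
    and the step bounds |dx_j| <= step_bound."""
    n = len(grad_f)
    bounds = [(-step_bound, step_bound)] * n
    res = linprog(grad_f, A_ub=jac_g, b_ub=-np.asarray(g_vals), bounds=bounds)
    return res.x                                   # the move dx from the current point

# Hypothetical linearization data at the current point x^c
dx = slp_step(np.array([1.0, -2.0]), np.array([[1.0, 1.0]]), np.array([-0.5]))
print(dx)
```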

Successive quadratic programming (SQP) methods solve a sequence of quadratic

programming approximations to the original nonlinear program (Fan et al. [1988]). By

definition, QPs have a quadratic objective function, linear constraints, and bounds on the

variables. A number of efficient procedures are available for solving them. As in SLP,

the linear constraints are first order approximations of the actual constraints about the

current point. The quadratic objective function used, however, is not just the second

order Taylor series approximation to the original objective function but a variation based

on the Lagrangian.

We will derive the procedure for the equality constrained version of a

nonlinear program.

Minimize f(x)    (30a)

subject to h_i(x) = 0, i = 1,...,m    (30b)

The Lagrangian for this problem is L(x, λ) = f(x) – λ^T h(x). Recall that the first order necessary conditions for the point x^c to be a local minimum of Problem (30) are that there exist Lagrange multipliers λ^c such that

∇L_x(x^c, λ^c) = ∇f(x^c) – (λ^c)^T ∇h(x^c) = 0    (31)

and h(x^c) = 0


Applying Newton's method to solve the system of equations (31) requires their linearization at the current point, yielding the linear system

[ ∇²_x L(x^c, λ^c)   –∇h(x^c)^T ] [ ∆x ]   [ –∇_x L(x^c, λ^c) ]
[ –∇h(x^c)                0     ] [ ∆λ ] = [  h(x^c)          ]    (32)

It is easy to show that if (∆x, ∆λ) satisfies Eq. (32), then (∆x, λ^c + ∆λ) will satisfy the necessary conditions for the optimality of the following QP.

Minimize ∇f(x^c)(∆x) + (1/2) ∆x^T ∇²_x L(x^c, λ^c) ∆x    (33a)

subject to h(x^c) + ∇h(x^c)∆x = 0    (33b)

On the other hand, if ∆x* = 0 is the solution to this problem, we can show that x^c satisfies the necessary conditions (31) for a local minimum of the original problem. First, since ∆x* = 0, h(x^c) = 0 and x^c is feasible to Problem (30). Now, because ∆x* solves Problem (33), there exists a λ* such that the gradient of the Lagrangian function for (33) evaluated at ∆x* = 0 is also equal to zero; i.e., ∇f(x^c) – (λ*)^T ∇h(x^c) = 0. The Lagrange multipliers λ* can serve as the Lagrange multipliers for the original problem and hence the necessary conditions (31) are satisfied by (x^c, λ*).

The extension to inequality constraints is straightforward; they are

linearized and included in the Lagrangian when computing the Hessian

matrix, L, of the Lagrangian. Linear constraints and variable bounds

contained in the original problem are included directly in the constraint

region of Eq. (33b). Of course, the matrix L need not be positive definite,

even at the optimal solution of the NLP, so the QP may not have a

minimum. Fortunately, positive definite approximations of L can be used

so the QP will have an optimal solution if it is feasible. Such

approximations can be obtained by a slight modification of the popular

BFGS updating formula used in unconstrained minimization. This

formula requires only the gradient of the Lagrangian function so second

derivatives of the problem functions need not be computed.
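To make the equality-constrained step concrete, here is a sketch of one iteration that forms and solves the Newton/KKT system (32) directly, assuming NumPy; the problem data are hypothetical, and hess_L stands for the Hessian of the Lagrangian or a positive definite approximation of it.

```python
import numpy as np

def sqp_step(grad_f, hess_L, h_vals, jac_h, lam):
    """Solve the Newton/KKT system (32) for (dx, dlam) at the current (x^c, lambda^c)."""
    n, m = len(grad_f), len(h_vals)
    grad_L = grad_f - jac_h.T @ lam               # gradient of the Lagrangian
    K = np.block([[hess_L, -jac_h.T],
                  [-jac_h, np.zeros((m, m))]])
    rhs = np.concatenate([-grad_L, h_vals])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], lam + sol[n:]                 # step dx and updated multipliers

# Hypothetical data for a 2-variable problem with one equality constraint
dx, lam_new = sqp_step(grad_f=np.array([1.0, 2.0]),
                       hess_L=np.eye(2),          # Hessian of L (or a BFGS approximation)
                       h_vals=np.array([0.3]),
                       jac_h=np.array([[1.0, -1.0]]),
                       lam=np.array([0.0]))
print(dx, lam_new)
```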

Because the QP can be derived from Newton’s method applied to

the necessary conditions for the optimum of the NLP, if one simply

accepts the solution of the QP as defining the next point, the algorithm

behaves like Newton’s method; i.e., it converges rapidly near an optimum

but may not converge from a poor initial point. If ∆x is viewed as a search

direction, the convergence properties can be improved. However, since

both objective function improvement and reduction of the constraint

infeasibilities need to be taken into account, the function to be

minimized in the line search process must incorporate both. Two

possibilities that have been suggested are the exact penalty function and


the Lagrangian function. The Lagrangian is suitable for the following

reasons:

1. On the tangent plane to the active constraints, it has a minimum at

the optimal solution to the NLP.

2. It initially decreases along the direction ∆x.

If the penalty weight is large enough, the exact penalty function also has

property (2) and is minimized at the optimal solution of the NLP.
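One simple way to use ∆x as a search direction, sketched below, is to backtrack on an exact (l1) penalty merit function; this assumes NumPy, and the penalty weight and the simple decrease test are illustrative simplifications.

```python
import numpy as np

def l1_merit(x, f, h_list, mu):
    """Exact (l1) penalty merit function: f(x) + mu * sum_i |h_i(x)|."""
    return f(x) + mu * sum(abs(h(x)) for h in h_list)

def backtrack(x, dx, f, h_list, mu=10.0, shrink=0.5, max_trials=30):
    """Shrink the SQP step until the merit function decreases."""
    m0, t = l1_merit(x, f, h_list, mu), 1.0
    for _ in range(max_trials):
        if l1_merit(x + t * dx, f, h_list, mu) < m0:
            return x + t * dx
        t *= shrink
    return x                                   # no acceptable step found
```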

Relative advantages and disadvantages:

Table A2, taken from Lasdon et al. [1996], summarizes the relative merits

of SLP, SQP, and GRG algorithms, focusing on their application to

problems with many nonlinear equality constraints. One feature appears

as both an advantage and a disadvantage –– whether or not the algorithm

can violate the nonlinear constraints of the problem by relatively large

amounts during the solution process.

SLP and SQP usually generate points yielding large violations of

the constraints. This can cause difficulties, especially in models with log

or fractional power expressions, since negative arguments for these

functions are possible. Such problems have been documented in reference

to complex chemical process examples in which SLP and some exterior

penalty-type algorithms failed, whereas an implementation of the GRG

method succeeded and was quite efficient. On the other hand, algorithms

that do not attempt to satisfy the equalities at each step can be faster than

those that do. The fact that SLP and SQP satisfy all linear constraints at

each iteration should ease the aforementioned difficulties but does not eliminate them.

There are situations in which the optimization process must be

interrupted before the algorithm has reached optimality and the current

point must be used or discarded. Such cases are common in on-line

process control where temporal constraints force immediate decisions. In

these situations, maintaining feasibility during the optimization process

may be a requirement for the optimizer inasmuch as constraint violations

make a solution unusable. Clearly, all three algorithms have advantages

that will dictate their use in certain situations. For large problems, SLP

software is used most widely because it is relatively easy to implement

given a good LP system. Nevertheless, large-scale versions of GRG and

SQP have become increasingly popular.


Table A2. Relative Merits of SLP, SQP, and GRG Algorithms

SLP
    Relative advantages:
    • Easy to implement
    • Widely used in practice
    • Rapid convergence when the optimum is at a vertex
    • Can handle very large problems
    • Does not attempt to satisfy equalities at each iteration
    • Can benefit from improvements in LP solvers
    Relative disadvantages:
    • May converge slowly on problems with nonvertex optima
    • Will usually violate nonlinear constraints until convergence, often by large amounts

SQP
    Relative advantages:
    • Usually requires the fewest function and gradient evaluations of all three algorithms (by far)
    • Does not attempt to satisfy equalities at each iteration
    Relative disadvantages:
    • Will usually violate nonlinear constraints until convergence, often by large amounts
    • Harder than SLP to implement
    • Requires a good QP solver

GRG
    Relative advantages:
    • Probably the most robust of all three methods
    • Versatile; especially good for unconstrained or linearly constrained problems but also works well for nonlinear constraints
    • Can utilize existing process simulators employing Newton's method
    • Once it reaches a feasible solution it remains feasible and can be stopped at any stage with an improved solution
    Relative disadvantages:
    • Hardest to implement
    • Needs to satisfy equalities at each step of the algorithm


A.4 Exercises

33. Solve the problem given below with an exterior penalty function method, and then

repeat the calculations using a barrier function method.

Minimize x_1^2 + 4x_2^2 – 8x_1 – 16x_2

subject to x_1 + x_2 ≤ 5

0 ≤ x_1 ≤ 3, x_2 ≥ 0

34. Perform 5 iterations of the sequential unconstrained minimization technique using

the logarithmic-quadratic loss function on the problem below. Let x^0 = (0, 0), r_0 = 2 and put r_{k+1} ← r_k/2 after each iteration.

Minimize x_1^2 + 2x_2^2

subject to 4x_1 + x_2 ≤ 6

x_1 + x_2 = 3

x_1 ≥ 0, x_2 ≥ 0

35. Repeat the preceding exercise using Zoutendijk’s procedure. Use the normalization

–1 ≤ d_j ≤ 1, j = 1, 2, to permit solution by linear programming.

36. Consider the following separable nonlinear program.

Minimize 5x_1^2 – 10x_1 – 10x_2 log_10 x_2

subject to x_1^2 + 2x_2^2 ≤ 4, x_1 ≥ 0, x_2 ≥ 0

a. Approximate the separable functions with piecewise linear functions and solve the resultant model using linear programming. If necessary assume 0 log_10 0 = 0.

b. Solve the original problem using a penalty function approach.

c. Perform at least 4 iterations of Zoutendijk’s procedure.

37. Solve the following problem using Zoutendijk's procedure. Start with x^0 = (0, 3/4).


Minimize 2x_1^2 + 2x_2^2 – 2x_1x_2 – 4x_1 – 6x_2

subject to x_1 + 5x_2 ≤ 5

2x_1^2 – x_2 ≤ 0

x_1 ≥ 0, x_2 ≥ 0

38. Solve the relaxation of the redundancy problem when the budget for components is

$500 (the value of C). Use the data in the table below.

Maximize ∏_{j=1}^{n} [ 1 – (1 – r_j)^{1+x_j} ]

subject to ∑_{j=1}^{n} c_j x_j ≤ C,  x_j ≥ 0, j = 1,…, n

  Item, j               1      2      3      4
  Reliability, r_j      0.9    0.8    0.95   0.75
  Cost per item, c_j    100    50     40     200

39. Consider the following quadratic program.

Minimize f(x) = 2x_1^2 + 20x_2^2 + 43x_3^2 + 12x_1x_2 – 16x_1x_3 – 56x_2x_3 + 8x_1 + 20x_2 + 6x_3

subject to 3x_1 + 2x_2 + 5x_3 ≤ 35

x_1 + 2x_2 + 3x_3 ≥ 5

–x_1 + 2x_2 – 5x_3 ≤ 3

5x_1 – 3x_2 + 2x_3 ≤ 30

x_1 ≥ 0, x_2 ≥ 0, x_3 ≥ 0

a. Write out the KKT conditions then set up the appropriate linear programming

model and solve with a restricted basis entry rule. Is the solution a global

optimum? Explain.

b. Use an NLP code to find the optimum.


40. Use an NLP code to solve the problem in the preceding exercise but this time

maximize rather than minimize. Because the maximization problem may have local

solutions, try different starting points. Confirm that you have found the global

maximum by solving the problem by hand.


Bibliography

Bazaraa, M.S., H.D. Sherali and C. M. Shetty, Nonlinear Programming: Theory and

Algorithms, Second Edition, John Wiley & Sons, New York, 1993.

Fan, Y., S. Sarkar and L. Lasdon, "Experiments with Successive Quadratic Programming

Algorithms," Journal of Optimization Theory and Applications, Vol. 56, No. 3, pp. 359-

383, 1988.

Fiacco, A.V. and G.P. McCormick, Nonlinear Programming: Sequential Unconstrained

Minimization Techniques, John Wiley & Sons, New York, 1968.

Fletcher, R., Practical Methods of Optimization, Second Edition, John Wiley & Sons,

New York, 1987.

Horst, R. and H. Tuy, Global Optimization: Deterministic Approaches, Third Edition,

Springer-Verlag, Berlin, 1995.

Lasdon, L., J. Plummer and A. Warren, "Nonlinear Programming," in M. Avriel and B.

Golany (eds.), Mathematical Programming for Industrial Engineers, Chapter 6, pp. 385-

485, Marcel Dekker, New York, 1996.

Luenberger, D.G., Linear and Nonlinear Programming, Second Edition, Addison-Wesley, Reading, MA, 1984.

Nash, S.G. and A. Sofer, “A Barrier Method for Large-Scale Constrained Optimization,”

ORSA Journal on Computing, Vol. 5, No. 1, pp. 40-53, 1993.

Nash, S.G. and A. Sofer, Linear and Nonlinear Programming, McGraw Hill, New York,

1996.

Zoutendijk, G., Methods of Feasible Directions, Elsevier, Amsterdam, 1960.

2

Algorithms for Constrained Optimization constraints, but in this section the more general description in (23) can be handled. The idea of a penalty function method is to replace problem (23) by an unconstrained approximation of the form Minimize{f(x) + cP(x)} (24)

where c is a positive constant and P is a function on ℜn satisfying (i) P(x) is continuous, (ii) P(x) ≥ 0 for all x ∈ ℜn, and (iii) P(x) = 0 if and only if x ∈ S. Example 16 Suppose S is defined by a number of inequality constraints: S = {x : gi(x) ≤ 0, i = 1,..., m}. A very useful penalty function in this case is P(x) =

1 2

∑ (max{0,g (x)}

i =1 i

m

2

(25)

which gives a quadratic augmented objective function denoted by (c,x) ≡ f(x) + cP(x). Here, each unsatisfied constraint influences x by assessing a penalty equal to the square of the violation. These influences are summed and multiplied by c, the penalty parameter. Of course, this influence is counterbalanced by f(x). Therefore, if the magnitude of the penalty term is small relative to the magnitude of f(x), minimization of (c,x) will almost certainly not result in an x that would be feasible to the original problem. However, if the value of c is made suitably large, the penalty term will exact such a heavy cost for any constraint violation that the minimization of the augmented objective function will yield a feasible solution. The function cP(x) is illustrated in Fig. 19 for the one-dimensional case with g1(x) = b – x and g2(x) = x – a. For large c it is clear that the minimum point of problem (24) will be in a region where P is small. Thus for increasing c it is expected that the corresponding solution points will approach the feasible region S and, subject to being close, will minimize f. Ideally then, as c → ∞ the solution point of the penalty problem will converge to a solution of the constrained problem.

Penalty and Barrier Methods cP(x) c=1 c=1 3 c = 10 c = 10 c = 100 c = 100 a b x Figure 19. 20 along with several isovalue contours for the objective function. Illustration of penalty function Example 17 To clarify these ideas and to get some understanding on how to select the penalty parameter c. The problem is a quadratic program and the isovalue contours are concentric circles centered at (6. let us consider the following problem. Minimize f(x) = (x1 – 6)2 + (x2 – 7)2 subject to g1(x) = –3x1 – 2x2 + 6 ≤ 0 g2(x) = –x1 + x2 – 3 ≤ 0 g3(x) = x1 + x2 – 7 ≤ 0 g4(x) = 3 x1 – x2 – 3 ≤ 0 The feasible region is shown graphically in Fig.7). the unconstrained minimum of f(x). 2 4 .

3 x1 – x2 – 3 })2). Since only constraint 3 is violated at this point.4 Algorithms for Constrained Optimization x2 5 4 3 2 1 0 0 1 2 3 4 5 6 x1 Figure 20. x1 + x2 – 7})2. 2 1 2 Setting the elements of ∇x (c. the gradient with respect to x is 2x1 – 12 + 2c(x1 + x2 – 7) ∇x (c. Assuming that in the neighborhood of x0 the “max” operator returns the constraint. Feasible region for Example 17 Using the quadratic penalty function (25). will be infeasible (exterior to the feasible region). –3x1 – 2x2 + 6})2 + (max{0. The first step in the solution process is to select a starting point.x) = (x1 – 6)2 + (x2 – 7)2 + c(max{0.x) = 2x – 14 + 2c(x + x – 7) . A good rule of thumb is to start at an infeasible point.x) = (x1 – 6)2 + (x2 – 7)2 + c((max{0. we have (c.x) to zero and solving yields the stationary point 2 4 . x1 + x2 – 7})2 + (max{0. we will see that every trial point. except the last one. By design then. A reasonable place to start is at the unconstrained minimum so we set x0 = (6. the augmented objective function is (c. –x1 + x2 – 3})2 + (max{0.7).

the problem is dynamically changing with the relative position of the current value of x and the subset of the constraints that are violated. it is natural to conjecture that all we have to do is set c to a very large number and then optimize the resulting augmented objective function (c. Second.Penalty and Barrier Methods 6(1 + c) 6c * * x1(c) = 1 + 2c and x2(c) = 7 – 1 + 2c as a function of c.x). This will assure that no steep valleys are present in the initial optimization of (c.x) is a strictly convex * function (the Hessian of (c. This will preclude any major difficulties in finding the minimum of (c. It turns out for this example that the minima will continue to satisfy all but the third constraint for all * * positive values of c. we * * obtain x1 = 3 and x2 = 4. Steep valleys will often present formidable if not insurmountable convergence difficulties for all preferred search methods unless the algorithm starts at a point extremely close to the minimum being sought. Fortunately. Subsequently. All that needs to be done is to start with a relatively small value of c and an infeasible (exterior) point. It is almost always impossible to tell how large c must be to provide a solution to the problem without creating numerical difficulties in the computations. The third reason why the conjecture is not correct is associated with the fact that large values of c create enormously steep valleys at the constraint boundaries.x) from one iteration to the next. there is a direct and sound strategy that will overcome each of the difficulties mentioned above.x) is positive definite for all c > 0). the constrained global minimum for the original problem. “large” depends on the particular model. If we take the limit of x1(c) and x2(c) as c à ∞. so x1(c) * and x2(c) determine a global minimum. . First. Unfortunately. 5 Selecting the Penalty Parameter Because the above approach seems to work so well. we will solve a sequence of unconstrained problems with monotonically increasing values of c chosen so that the solution to each new problem is “close” to the previous one. For any positive value of c. in a very real sense.x) to obtain the solution to the original problem. this conjecture is not correct. (c.

x). .. use an unconstrained search technique to find the point that minimizes (ck–1. the optimum It is assumed that problem (26) has a solution for all positive values of ck. a stopping parameter > 0. Otherwise. Now for each k we solve the problem Minimize{ (ck. for example..x) based on which constraints are violated at xk. and an initial value of the penalty parameter c0. |f(xk–1) – f(xk)| < ).e.4). if (c. Initialization Step: Select a growth parameter > 1. be a sequence tending to infinity such that ck > 0 and ck+1 > ck.. This will be true. stop with xk an estimate of the optimal solution.x) increases without bounds as ||x|| → ∞. k = 1. A simple implementation known as the sequential unconstrained minimization technique (SUMT).2.6 Algorithm Algorithms for Constrained Optimization To implement this strategy.. put ck ← ck–1. put k ← k+1 and return to the iterative step.x) : x ∈ ℜn} (26) to obtain xk. formulate the new (ck. Iterative Step: Starting from xk–1. Applying the algorithm to Example 17 with = 2 and c0 = 0. The iterates xk are seen to approach the true minimum point x* = (3. Choose a starting point x0 that violates at least one constraint and formulate the augmented objective function (c0.e. let {ck}. is given below.. Let k = 1. Stopping Rule: If the distance between xk–1 and xk is smaller than (i. ||xk–1 – xk|| < ) or the difference between two successive objective functions values is smaller than (i.5 for eight iterations yields the sequence of solutions given in Table A1.x). Call it xk and determine which constraints are violated at this point.

convergence may be exceedingly slow. This scaling operation is intended to ensure that no subset of the constraints has an undue influence on the search process.5 1 2 4 8 16 32 64 x1 6.02 g3 6. the initial value of the penalty parameter should be fixed so that the magnitude of the penalty term is not much smaller than the magnitude of objective function.00 3.33 4.05 4. xk+1) . it is still necessary to prove that it will converge to the optimum of the original problem (23).00 3.02 x2 7.09 3. In a like manner.05 3. In either case.20 0. Convergence Although SUMT has an intuitive appeal. Sequence of Solutions Using the Penalty Method k 0 1 2 3 4 5 6 7 8 Implementation Issues c –– 0.50 4. One thing that should be done prior to attempting to solve a nonlinear program using a penalty function method is to scale the constraints so that the penalty generated by each is about the same magnitude. the influence of the objective function could direct the algorithm to head towards an unbounded minimum even in the presence of unsatisfied constraints.09 4.33 3.18 0.00 4.00 1.00 4. If some constraints are dominant. the algorithm will steer towards a solution that satisfies those constraints at the expense of searching for the minimum.Penalty and Barrier Methods Table A1. and gives a set of inequalities that follows directly from the definition of xk and the inequality ck+1 > ck.50 5.66 0.18 4.00 5.35 0. xk) ≤ (ck+1.18 3. which in turn depends on their complexity. Lemma 1: (ck.09 0.04 7 Much of the success of SUMT depends on the approach used to solve the intermediate problems.00 2.60 4.60 3. The following lemma is the first component of the proof. If an imbalance exists.

Theorem 10: Let {xk} be a sequence generated by the penalty method. Global convergence of the penalty method. Lemma 2: Let x* be a solution to problem (23). xk) ≥ f(xk) Proof: f(x*) = f(x*) + ckP(x*) ≥ f(xk) + ckP(xk) ≥ f(xk).x) at the iterative step of the algorithm. follows easily from the two lemmas above. Barrier Function It should be apparent that the quadratic penalty function in (25) is only one of an endless set of possibilities. xk). Finally.8 Algorithms for Constrained Optimization P(xk) ≥ P(xk+1) f(xk) ≤ f(xk+1) Proof: (ck+1. Then any limit point of the sequence is a solution to (23). in conjunction with the second inequality. f(xk+1) + ckP(xk+1) ≥ f(xk) + ckP(xk) which. or more precisely verification that any limit point of the sequence is a solution. Proof: see Luenberger [1984]. leads to the third. Then for each k f(x*) ≥ (ck. it is not at all suitable if one desires to conduct a search through a . xk+1) = f(xk+1) + ck+1P(xk+1) ≥ f(xk+1) + ckP(xk+1) ≥ f(xk) + ckP(xk) = (ck. where xk is the global minimum of (ck–1. From this we also have f(xk) + ckP(xk) ≤ f(xk+1) + ckP(xk+1) f(xk+1) + ck+1P(xk+1) ≤ f(xk) + ck+1P(xk) Adding these two expressions and rearranging terms yields (ck+1 – ck)P(xk+1) ≤ (ck+1 – ck)P(xk) which proves the second inequality. which proves the first inequality. While the "sum of the squared violations" is probably the best type of penalty function for an exterior method.

x) is often called a barrier function and.x) becomes. the sequence of solutions {xk} can be shown to converge to a local minimum in much the same way that the exterior penalty method was shown to converge earlier.x) = –2 . Indeed. This function is valid only for interior points such that all constraints are strictly satisfied: gi(x) < 0 for all i. Hence B(r. in a sense. 9 – r(–x1 – x2 + 4) Setting the gradient vector to zero and solving for the stationary point yields 2 . the larger B(r. As discussed in Chapter 4.x) = 2x1 + 9x2 + r –x – x + 4. Equation (27) indicates that the closer one gets to a constraint boundary. is opposite to the kind of exterior penalty function introduced in (25).x) = f(x) + r ∑ g (x) i=1 i 9 (27) where r > 0 is the barrier parameter. points precisely on any boundary are not defined. the value of r is monotonically decreased in such a way that the resultant problem is relatively easy to solve if the optimal solution of its immediate predecessor is used as the starting point. Let us consider the following augmented objective function m –1 B(r. At each subsequent iteration. Example 18 To clarify these ideas consider the problem Minimize{f(x) = 2x1 + 9x2 subject to x1 + x2 ≥ 4}. Using (27) the augmented objective function is –1 2 B(r. the basic idea of interior point methods is to start with a feasible point and a relatively large value of the parameter r. This will prevent the algorithm from approaching the boundary of the feasible region. 1 2 with gradient 4x – r(–x – x + 4)–2 1 2 1 ∇xB(r. Mathematically.Penalty and Barrier Methods sequence of feasible or interior points.

….x) starting with r0 = 20 and decreasing to 0.875 which is the optimal solution to the original problem. i=1 p The algorithm finds the unconstrained minimizer of LQ(rk. though not discussed up until now. Figure 21 depicts the feasible region and several isovalue contours of f(x). Let us consider the following problem. In the limit as r approaches 0.25 and x2(r) = 0. p gi(x) ≤ 0. Also shown is the locus of optimal solutions for B(r.x) = f(x) – rk ∑ i=1 m 1 ln( – gi(x)) + r k ∑ hi(x)2 .75 with f(x) = 25. i = 1. m} for a sequence of scalar parameters {rk} strictly .25 and x2 = 1. Minimize f(x) subject to hi(x) = 0. r→0 Figure 21.333 r + 1.75 for all r > 0.10 Algorithms for Constrained Optimization x1(r) = 2. these values become x1 = 2. i = 1. Graphical illustration of the barrier search procedure A Mixed Barrier-Penalty Function Equality constraints.x) over the set {x : gi(x) < 0. m The most common approach to implementing the sequential unconstrained minimization technique for this model is to form the logarithmicquadratic loss function LQ(rk. can be handled efficiently with a penalty function. i = 1.….….

regardless of the convexity characteristics of the objective function and constraints.g. Finding such a point often presents formidable difficulties in and of itself. Interior methods demand a feasible starting point. 3. 1. . Although penalty and barrier methods met with great initial success.x) are similar to those for the pure penalty and barrier functions. and 2hi(x(r))/r → *. Interior methods require that the search never leave the feasible region. With the advent of interior point methods for linear programming. Summary Penalty and barrier methods are among the most powerful class of algorithms available for attacking general nonlinear optimization problems. All convergence. 2. r / gi(x(r)) → *. and convexity results for LQ(rk. their slow rates of convergence due to ill-conditioning of the associated Hessian led researchers to pursue other approaches. This result is suggested by the fractions in the above summations. the optimal Lagrange multiplier for the ith equality i constraint. This significantly increases the computational effort associated with the line search segment of the algorithm. algorithm designers have taken a fresh look at penalty methods and have been able to achieve much greater efficiency than previously thought possible (e. the exterior methods must be considered preferable. Interior methods cannot deal with equality constraints without cumbersome modifications to the basic approach. This statement is supported by the fact that these techniques will converge to at least a local minimum in most cases. duality. see Nash and Sofer [1993]). the stationarity condition is r ∇f(x(r)) – ∑ ∇gi(x(r)) + i =1 −gi (x(r)) m 11 ∑ 2hi (x(r )) ∇hi(x(r)) = 0 r i =1 p Fiacco and McCormick were able to show that as r → 0.. Of the two classes. i the optimal Lagrange multiplier for the ith inequality constraint. The primary reasons are as follows. They work well even in the presence of cusps and similar anomalies that can stymie other approaches.Penalty and Barrier Methods decreasing to zero. Note that for a given r.

Zoutendijk's Method Once again. primal methods work on the original problem directly by searching the feasible region for an optimal solution. such as convexity. m}. Each point generated in the process is feasible and the value of the objective function constantly decreases. We now present a popular one based on linear programming. One then moves a finite distance in the determined direction. embody the same philosophy as the techniques of unconstrained minimization but are designed to deal with inequality constraints. obtaining a new and better point.2 Primal Methods In solving a nonlinear program. i = 1. These methods have three significant advantages: (1) if they terminate before confirming optimality (which is very often the case with all procedures). where the indices in I correspond to the binding constraints at x0. A direction satisfying both (i) and (ii) is called a usable feasible direction.e. Briefly. In general. Assume that a starting point x0 ∈ S is available. hence many different primal methods.12 Algorithms for Constrained Optimization A. the idea is to pick a starting point satisfying the constraints and to find a direction such that (i) a small move in that direction remains feasible.. i∈I For a minimization objective. For feasible direction d. The convergence rates of primal methods are competitive with those of other procedures. we consider problem (23) with constraint set is S = {x : gi(x) ≤ 0. i. There are many ways of choosing such directions. a usable feasible vector has the additional property that . often called feasible direction methods. they are often among the most efficient. particularly when the problem constraints are nonlinear.…. (2) if they generate a convergent sequence. (3) they do not rely on special structure. the terminal point is a constrained local (but not necessarily global) minimum of the problem. so they are quite general. The process is repeated until no direction satisfying both (i) and (ii) can be found. The problem is to choose a vector d whose direction is both usable and feasible. Primal methods. it can usually be shown that the limit point of that sequence must be at least a local minimum. with computational difficulties arising from the need to remain within the feasible region from one iteration to the next. and (ii) the objective function improves. the current point is feasible. i ∈ I. Notable disadvantages are that they require a phase 1 procedure to obtain an initial feasible point and that they are all plagued. a small move along this vector beginning at the point x0 makes no binding constraints negative. Let gi(x0) = 0. d 0 0 T dt gi(x +td)t=0 = ∇gi(x ) d ≤ 0. and for problems with linear constraints.

then any vector (d. ) existed satisfying Eqs. such as |dj| ≤ 1 for all j.e. If it were not included and a vector (d. Thus a recovery move must be made to return to the feasible region. Minimize subject to ∇gi(x0)Td – i ≤ 0. Other values of i enable one to emphasize certain constraint boundaries relative to others. the only nonlinearity being (28d). Equation (28d) is a normalization requirement ensuring that is finite. Such a direction is the solution of the following problem. Because the constraint surface is curved. this could lead to certain difficulties. the function initially decreases along the vector. Repetitions of the procedure lead to inefficient zigzagging. the algorithm would have to be modified slightly. movement along d0 for any finite distance violates the constraint. Zoutendijk showed that this constraint can be handled by a modified version of the simplex method so problem (28) may be solved with reasonable efficiency. steers away from the nonlinear constraint boundaries. one could choose that feasible vector minimizing ∇f(x0)Td. also moves away from the boundaries of the nonlinear constraints. starting at x0 the feasible direction d0 that minimizes ∇f(x0)Td is the projection of –∇f(x0) onto the tangent plane generated by the binding constraints at x0. The expectation is that this will avoid zigzagging. If all i = 1. since (28b) . Note that if some of the constraints in the original nonlinear program (1) were given as equalities. when looking for a locally best direction it is wise to choose one that.. That with minimum value is a best direction which simultaneously makes ∇f(x0)Td and ∇gi(x0)Td as negative as possible. If some of the binding constraints were nonlinear. i. the above direction-finding problem is almost linear. In searching for a "best" vector d along which to move. In particular. As a consequence.(28c) with < 0 is a usable feasible direction. are also possible. in addition to decreasing f. then could be made to approach –∞.(28c) are not homogeneous. (28b) . Because the vectors ∇f and ∇gi are evaluated at a fixed point x0. . ) satisfying (28b) .(28c) with negative.Primal Methods d f(x0+td) = ∇f(x0)Td < 0 dt t=0 13 Therefore. i ∈ I ∇f(x0)Td – d Td = 1 ≤ 0 (28a) (28b) (28c) (28d) where 0 ≤ i ≤ 1 is selected by the user. Other normalizations. however.

Of course, once a direction has been determined, the step size must still be found. This problem may be dealt with in almost the same manner as in the unconstrained case. It is still desirable to minimize the objective function along the vector d, but now no constraint may be violated. Thus t is determined to minimize f(xk + tdk) subject to the constraint xk + tdk ∈ S. Any of the techniques discussed in Section 10.6 can be used. A new point is thus determined and the direction-finding problem is re-solved. If at some point the minimum value of σ in (28) is ≥ 0, then there is no feasible direction satisfying ∇f(x0)Td < 0 and the procedure terminates. The final point will generally be a local minimum of the problem. Zoutendijk showed that for convex programs the procedure converges to the global minimum.
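As a rough illustration of this step-size computation, the following sketch brackets the largest feasible step along d and then minimizes f on the resulting interval, using SciPy's bounded scalar minimizer in place of the one-dimensional searches of Section 10.6. The constraint representation as callables, the bisection tolerance, and the example data are assumptions, not part of the original development.

```python
# Sketch of the step-size step: minimize f(x_k + t*d_k) subject to x_k + t*d_k in S,
# where S = {x : g_i(x) <= 0 for all i}. The maximum feasible step is bracketed first,
# then a bounded one-dimensional minimization is performed on [0, t_max].
import numpy as np
from scipy.optimize import minimize_scalar

def feasible(x, gs, tol=1e-10):
    return all(g(x) <= tol for g in gs)

def max_feasible_step(x, d, gs, t_hi=1.0, growth=2.0, bisections=60):
    # Grow t_hi until the point leaves the feasible region (or a large cap is hit) ...
    while feasible(x + t_hi * d, gs) and t_hi < 1e8:
        t_hi *= growth
    t_lo = 0.0
    # ... then bisect to locate the boundary of the feasible segment along d.
    for _ in range(bisections):
        t_mid = 0.5 * (t_lo + t_hi)
        if feasible(x + t_mid * d, gs):
            t_lo = t_mid
        else:
            t_hi = t_mid
    return t_lo

def constrained_line_search(f, x, d, gs):
    t_max = max_feasible_step(x, d, gs)
    res = minimize_scalar(lambda t: f(x + t * d), bounds=(0.0, t_max), method="bounded")
    return x + res.x * d

# Illustrative example (assumed data): a quadratic objective, direction d = (1, 1),
# and the constraints x1 + x2 <= 2, x1 >= 0, x2 >= 0, starting from x0 = (0, 0).
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2
gs = [lambda x: x[0] + x[1] - 2.0, lambda x: -x[0], lambda x: -x[1]]
x_new = constrained_line_search(f, np.array([0.0, 0.0]), np.array([1.0, 1.0]), gs)
print(x_new)
```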

A.3 Sequential Quadratic Programming

Successive linear programming (SLP) methods solve a sequence of linear approximations to the original nonlinear program. In this respect they are similar to Zoutendijk's method, but they do not require that feasibility be maintained at each iteration. Recall that if f(x) is a nonlinear function and xc is the current value for x, then the first order Taylor series expansion of f(x) around xc is

f(x) = f(xc + ∆x) ≅ f(xc) + ∇f(xc)(∆x)   (29)

where ∆x is the direction of movement. In SLP, all nonlinear functions are replaced by their linear approximations as in Eq. (29). The variables in the resulting LP are the ∆xj's representing changes from the current values. It is common to place upper and lower bounds on each ∆xj, given that the linear approximation is reasonably accurate only in some neighborhood of the initial point. Given initial values for the variables, the resulting linear program is solved, and if the new point provides an improvement it becomes the incumbent and the process is repeated. If the new point does not yield an improvement, the step bounds may need to be reduced or we may be close enough to an optimum to stop. Successive points generated by this procedure need not be feasible even if the initial point is. However, the amount of infeasibility generally is reduced as the iterations proceed.

Successive quadratic programming (SQP) methods solve a sequence of quadratic programming approximations to the original nonlinear program (Fan et al. [1988]). By definition, QPs have a quadratic objective function, linear constraints, and bounds on the variables. A number of efficient procedures are available for solving them. As in SLP, the linear constraints are first order approximations of the actual constraints about the current point. The quadratic objective function used, however, is not just the second order Taylor series approximation to the original objective function but a variation based on the Lagrangian. We will derive the procedure for the equality constrained version of a nonlinear program.

Minimize f(x)   (30a)
subject to hi(x) = 0,  i = 1,…, m   (30b)

The Lagrangian for this problem is L(x, λ) = f(x) – λTh(x). Recall that the first order necessary conditions for the point xc to be a local minimum of Problem (30) are that there exist Lagrange multipliers λc such that

∇xL(xc, λc) = ∇f(xc) – (λc)T∇h(xc) = 0  and  h(xc) = 0   (31)
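Before turning to the Newton derivation, here is a minimal sketch of the SLP move described at the beginning of this section, applied to the equality constrained model (30): the functions are linearized at the current point as in (29), box bounds are imposed on the changes ∆xj, and the resulting LP is solved. The example problem and the step bound are assumptions made purely for illustration.

```python
# Sketch of a single SLP step: replace f and h by their first order Taylor
# approximations (29) at the current point, bound the changes Delta-x, and solve the LP.
import numpy as np
from scipy.optimize import linprog

def slp_step(grad_f, h_vals, h_jac, xc, step_bound=0.5):
    """One SLP move for  min f(x) s.t. h_i(x) = 0,  linearized at xc.

    grad_f : gradient of f at xc, shape (n,)
    h_vals : constraint values h(xc), shape (m,)
    h_jac  : constraint Jacobian at xc, shape (m, n)
    """
    n = len(xc)
    # Linearized equalities: h(xc) + grad h(xc) * dx = 0
    A_eq, b_eq = h_jac, -h_vals
    # Box bounds on dx, since the linearization is trusted only locally.
    bounds = [(-step_bound, step_bound)] * n
    res = linprog(grad_f, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return xc + res.x

# Illustrative problem (assumed): min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0, from xc = (1, 1).
xc = np.array([1.0, 1.0])
grad_f = np.array([1.0, 1.0])
h_vals = np.array([xc[0] ** 2 + xc[1] ** 2 - 2.0])
h_jac = np.array([[2.0 * xc[0], 2.0 * xc[1]]])
print(slp_step(grad_f, h_vals, h_jac, xc))
```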

Applying Newton's method to solve the system of equations (31) requires their linearization at the current point, yielding the linear system

[ ∇²xL(xc, λc)   –∇h(xc)T ] [ ∆x ]   [ –∇xL(xc, λc) ]
[ –∇h(xc)           0     ] [ ∆λ ] = [   h(xc)      ]   (32)

It is easy to show that if (∆x, ∆λ) satisfies Eq. (32), then (∆x, λc + ∆λ) will satisfy the necessary conditions for the optimality of the following QP.

Minimize ∇f(xc)(∆x) + (1/2)∆xT∇²xL(xc, λc)∆x   (33a)
subject to h(xc) + ∇h(xc)∆x = 0   (33b)

On the other hand, if ∆x* = 0 is the solution to this problem, we can show that xc satisfies the necessary conditions (31) for a local minimum of the original problem. First, since ∆x* = 0, Eq. (33b) implies h(xc) = 0 and xc is feasible to Problem (30). Now, because ∆x* solves Problem (33), there exists a λ* such that the gradient of the Lagrangian function for (33) evaluated at ∆x* = 0 is also equal to zero; i.e., ∇f(xc) – (λ*)T∇h(xc) = 0. The Lagrange multipliers λ* can serve as the Lagrange multipliers for the original problem, and hence the necessary conditions (31) are satisfied by (xc, λ*).

The extension to inequality constraints is straightforward: they are linearized and included in the Lagrangian when computing the Hessian matrix. Linear constraints and variable bounds contained in the original problem are included directly in the constraint region of Eq. (33b).

Because the QP can be derived from Newton's method applied to the necessary conditions for the optimum of the NLP, if one simply accepts the solution of the QP as defining the next point, the algorithm behaves like Newton's method; i.e., it converges rapidly near an optimum but may not converge from a poor initial point. Of course, the matrix ∇²xL, the Hessian of the Lagrangian, need not be positive definite, even at the optimal solution of the NLP, so the QP may not have a minimum. Fortunately, positive definite approximations of ∇²xL can be used so the QP will have an optimal solution if it is feasible. Such approximations can be obtained by a slight modification of the popular BFGS updating formula used in unconstrained minimization. This formula requires only the gradient of the Lagrangian function, so second derivatives of the problem functions need not be computed.

If ∆x is viewed as a search direction, the convergence properties can be improved. However, since both objective function improvement and reduction of the constraint infeasibilities need to be taken into account, the function to be minimized in the line search process must incorporate both.
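A minimal sketch of the basic iteration derived above, assuming exact second derivatives are available: each pass solves the linear system (32) for (∆x, ∆λ) and accepts the full step, so it behaves like Newton's method as just noted. The test problem, starting point, and iteration count are illustrative assumptions.

```python
# Sketch of the Newton step (32) on the first order conditions (31) for the
# equality constrained problem (30); solving this system is equivalent to solving QP (33).
import numpy as np

def sqp_newton_step(grad_f, hess_L, h_vals, h_jac, lam):
    """Solve the linearized KKT system (32) for (dx, dlam)."""
    n, m = hess_L.shape[0], len(h_vals)
    KKT = np.block([[hess_L, -h_jac.T],
                    [-h_jac, np.zeros((m, m))]])
    grad_L = grad_f - h_jac.T @ lam        # gradient of the Lagrangian with respect to x
    rhs = np.concatenate([-grad_L, h_vals])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]

# Illustrative problem (assumed):  min x1^2 + x2^2  s.t.  x1 + x2 - 2 = 0.
x, lam = np.array([3.0, -1.0]), np.array([0.0])
for _ in range(10):
    grad_f = 2.0 * x
    hess_L = 2.0 * np.eye(2)               # Hessian of the Lagrangian (constraint is linear)
    h_vals = np.array([x[0] + x[1] - 2.0])
    h_jac = np.array([[1.0, 1.0]])
    dx, dlam = sqp_newton_step(grad_f, hess_L, h_vals, h_jac, lam)
    x, lam = x + dx, lam + dlam
print(x, lam)    # converges to x = (1, 1), lambda = 2 for this problem
```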

Two possibilities that have been suggested are the exact penalty function and the Lagrangian function. The Lagrangian is suitable for the following reasons:

1. On the tangent plane to the active constraints, it has a minimum at the optimal solution to the NLP.
2. It initially decreases along the direction ∆x.

If the penalty weight is large enough, the exact penalty function also has property (2) and is minimized at the optimal solution of the NLP.

Relative advantages and disadvantages. Table A2, taken from Lasdon et al. [1996], summarizes the relative merits of the SLP, SQP, and GRG algorithms, focusing on their application to problems with many nonlinear equality constraints. One feature appears as both an advantage and a disadvantage: whether or not the algorithm can violate the nonlinear constraints of the problem by relatively large amounts during the solution process. SLP and SQP usually generate points yielding large violations of the constraints. This can cause difficulties, especially in models with log or fractional power expressions, since negative arguments for these functions are possible. Such problems have been documented in reference to complex chemical process examples in which SLP and some exterior penalty-type algorithms failed, whereas an implementation of the GRG method succeeded and was quite efficient. The fact that SLP and SQP satisfy all linear constraints at each iteration should ease the aforementioned difficulties but does not eliminate them.

There are situations in which the optimization process must be interrupted before the algorithm has reached optimality and the current point must be used or discarded. Such cases are common in on-line process control where temporal constraints force immediate decisions. In these situations, maintaining feasibility during the optimization process may be a requirement for the optimizer inasmuch as constraint violations make a solution unusable. On the other hand, algorithms that do not attempt to satisfy the equalities at each step can be faster than those that do.

Clearly, all three algorithms have advantages that will dictate their use in certain situations. SLP software is used most widely because it is relatively easy to implement given a good LP system. Nevertheless, for large problems, large-scale versions of GRG and SQP have become increasingly popular.
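Returning to the line search safeguard discussed above (Table A2 follows this sketch), the code below uses an exact penalty merit function of the l1 type to decide whether to accept an SQP step. The penalty weight, the backtracking rule, and the sample data are assumed for illustration and are not prescribed by the text.

```python
# Sketch of a line search on an exact (l1) penalty merit function, one of the two
# merit-function possibilities mentioned above for safeguarding the SQP step.
import numpy as np

def merit(f, h, x, mu):
    """Exact penalty merit function: f(x) + mu * sum(|h_i(x)|)."""
    return f(x) + mu * np.sum(np.abs(h(x)))

def backtracking_step(f, h, x, dx, mu, beta=0.5, c=1e-4, max_iter=30):
    """Accept the largest step t in {1, beta, beta^2, ...} that decreases the merit function."""
    phi0 = merit(f, h, x, mu)
    t = 1.0
    for _ in range(max_iter):
        if merit(f, h, x + t * dx, mu) <= phi0 - c * t:   # simple sufficient-decrease test
            return t
        t *= beta
    return t

# Illustrative data (assumed): a small equality constrained problem and a candidate SQP step dx.
f = lambda x: x[0] ** 2 + x[1] ** 2
h = lambda x: np.array([x[0] + x[1] - 2.0])
x = np.array([3.0, -1.0])
dx = np.array([-2.0, 2.0])          # step computed, for example, from the QP subproblem
t = backtracking_step(f, h, x, dx, mu=10.0)
print(t, x + t * dx)                # the merit function accepts the step toward (1, 1)
```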

Table A2. Relative Merits of the SLP, SQP, and GRG Algorithms

SLP
  Relative advantages:
  • Easy to implement
  • Widely used in practice
  • Rapid convergence when optimum is at a vertex
  • Can handle very large problems
  • Does not attempt to satisfy equalities at each iteration
  • Can benefit from improvements in LP solvers
  Relative disadvantages:
  • May converge slowly on problems with nonvertex optima
  • Will usually violate nonlinear constraints until convergence, often by large amounts

SQP
  Relative advantages:
  • Usually requires the fewest function and gradient evaluations of all three algorithms (by far)
  • Does not attempt to satisfy equalities at each iteration
  Relative disadvantages:
  • Will usually violate nonlinear constraints until convergence, often by large amounts
  • Harder than SLP to implement
  • Requires a good QP solver

GRG
  Relative advantages:
  • Probably the most robust of all three methods
  • Versatile; especially good for unconstrained or linearly constrained problems but also works well for nonlinear constraints
  • Can utilize existing process simulators employing Newton's method
  • Once it reaches a feasible solution it remains feasible and then can be stopped at any stage with an improved solution
  Relative disadvantages:
  • Hardest to implement
  • Needs to satisfy equalities at each step of the algorithm

A.4 Exercises

33. Perform 5 iterations of the sequential unconstrained minimization technique using the logarithmic-quadratic loss function on the problem below. Start with x0 = (0, 0); let r0 = 2 and put rk+1 ← rk/2 after each iteration.

Minimize x1² + 2x2²
subject to 4x1 + x2 ≤ 6
x1 + x2 = 3
x1 ≥ 0, x2 ≥ 0

34. Solve the problem given below with an exterior penalty function method, and then repeat the calculations using a barrier function method.

Minimize x1² + 4x2² – 8x1 – 16x2
subject to x1² + x2² ≤ 5
0 ≤ x1 ≤ 3, x2 ≥ 0

35. Repeat the preceding exercise using Zoutendijk's procedure. Let x0 = (0, 3/4), use the normalization –1 ≤ dj ≤ 1, j = 1, 2, and perform at least 4 iterations of Zoutendijk's procedure.

36. Consider the following separable nonlinear program. If necessary, assume 0 log10 0 = 0.

Minimize 5x1² – 10x1 – 10x2 log10 x2
subject to x1 + 2x2 ≤ 4
x1 ≥ 0, x2 ≥ 0

a. Approximate the separable functions with piecewise linear functions to permit solution by linear programming, and solve the resultant model using linear programming.
b. Solve the original problem using a penalty function approach.

37. Solve the following problem using Zoutendijk's procedure.

Minimize 2x1² + 2x2² – 2x1x2 – 4x1 – 6x2
subject to x1 + 5x2 ≤ 5
x1 + x2 ≤ 2
x1 ≥ 0, x2 ≥ 0

38. Solve the relaxation of the redundancy problem when the budget for components is $500 (the value of C). Use the data in the table below.

Maximize ∏(j=1 to n) [1 – (1 – rj)^(1+xj)]
subject to ∑(j=1 to n) cj xj ≤ C
xj ≥ 0, j = 1,…, n

Item, j              1      2      3      4
Reliability, rj      0.9    0.8    0.95   0.75
Cost per item, cj    100    50     40     200

39. Consider the following quadratic programming problem.

Minimize f(x) = 2x1² + 20x2² + 43x3² + 12x1x2 – 16x1x3 – 56x2x3 + 8x1 + 20x2 + 6x3
subject to 3x1 + 2x2 + 5x3 ≤ 35
x1 + 2x2 + 3x3 ≥ 5
–x1 + 2x2 – 5x3 ≤ 3
5x1 – 3x2 + 2x3 ≤ 30
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

a. Write out the KKT conditions, then set up the appropriate linear programming model and solve it with a restricted basis entry rule.
b. Use an NLP code to find the optimum. Is the solution a global optimum? Explain.

40. Use an NLP code to solve the problem in the preceding exercise, but this time maximize rather than minimize. Because the maximization problem may have local solutions, try different starting points. Confirm that you have found the global maximum by solving the problem by hand.

Second Edition.V. 1995. 1984. Second Edition. Sofer. S.." Journal of Optimization Theory and Applications. Fletcher. "Experiments with Successive Quadratic Programming Algorithms. L. 3. Avriel and B. "Nonlinear Programming.G." in M. 5. New York. New York. pp. 1993. M. John Wiley & Sons. D. Sofer. Third Edition.” ORSA Journal on Computing. J. and G.P. Methods of Feasible Directions. 1996. Sarkar and L.. R. Practical Methods of Optimization. MA. pp. 1. 359383. Linear and Nonlinear Programming. S. Springer-Verlag.S. Fan. Vol. A. “A Barrier Method for Large-Scale Constrained Optimization. 56.. Addison Wesley. R. 1968. S. Golany (eds. 1988. Luenberger. pp. Nash. Fiacco. New York. Berlin. G. Lasdon. Amsterdam.G. Sherali and C. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Horst and H.. John Wiley & Sons. No. Second Edition. Elsevier. 1987. Tuy. Shetty. and A. 1993. Chapter 6. Plummer and A. . Marcel Dekker. New York. 1996. Mathematical Programming for Industrial Engineers. McGraw Hill. H. Vol. Zoutendijk. Global Optimization: Deterministic Approaches. John Wiley & Sons. Nash. 1960. M.D. Reading. New York. Nonlinear Programming: Theory and Algorithms. Linear and Nonlinear Programming. Y.G. 385485. Warren. 40-53. No.). McCormick. Lasdon. and A.22 Algorithms for Constrained Optimization Bibliography Bazaraa.
