
CHAPTER 7

Downloaded from https://academic.oup.com/book/53915/chapter/422193740 by OUP site access user on 12 May 2024


Model Decomposition Algorithms

Out of intense complexities intense simplicities emerge.


Winston Churchill

In the previous chapters we studied algorithms that facilitate parallel


computations due to the structure of their operations. Sometimes, however,
it is the structure of the model of the problem at hand (that is suitable for
some decomposition) that leads to efficient parallel computations. Large-
scale problems often display some characteristic sparsity pattern which is
amenable to decomposition. In time-dependent optimization problems, for
example, one has to optimize a given system at different points in time.
Successive time periods are linked through the flow of inventory. It might
be possible to partially optimize the operations of the system for each time
period, while maintaining a level of inventory consistent with the optimal
operating schedule of successive time periods. In large spatial systems
(e.g., transportation or telecommunication problems) one has to optimize
distinct geographical regions. Adjacent regions are linked through trading
and the flow of traffic. Totally decentralized optimization is not possible,
but it might be possible to partially optimize each region separately, while
restricting the trading between adjacent regions to be consistent with each
region’s optimal state.
Optimization algorithms have been devised over the last fifty years
specifically to deal with such problems; the two most noteworthy exam­
ples are the Dantzig-Wolfe decomposition and Benders decomposition. An
important book by Lasdon (1970) discusses decomposition algorithms for
large-scale optimization problems. A common feature of these algorithms
is that they solve a sequence of (smaller) subproblems to optimize the dis­
tinct components (time periods, regions, etc.), while a coordinating master
problem synthesizes these solutions into an estimate of the overall opti­
mum. The current solution estimate of the master program then defines a
new subproblem and the process repeats iteratively. The subproblems and
the master problem are much smaller than the original program. Hence,
even if the decomposition algorithm requires several iterations between the
master and the subproblems to reach a solution, it is usually faster than
algorithms that attack the original undecomposed problem. We call such

Parallel Optimization, Yair Censor, Oxford University Press (1997), © 1997 by Oxford University Press, Inc., DOI: 10.1093/9780195100624.003.0007

algorithms here model decomposition algorithms, as they do not solve the original model of the problem directly, but instead solve a modified decomposed variant of the model.
In the early days of parallel optimization, it was anticipated that parallelism would substantially speed up model decomposition algorithms. This
has not been the case, however. The speedups observed with decomposition
algorithms such as those of Dantzig-Wolfe or Benders have been modest.
Why is this so? The most successful attempts to parallelize these algo­
rithms (see references in Section 7.3) solve the subproblems in parallel,
but solve the master program on a single processor. There have also been
cases where the solution of the master program was parallelized, but the
efficiency was low. Thus the master program becomes a serial bottleneck:
as the decomposition algorithm iterates the master programs increase in
size and the serial bottleneck becomes more restrictive, as Amdahl’s law
(Definition 1.4.6) dictates.
With a view toward parallelism other model decomposition algorithms
have recently been designed and implemented, which either use a very
simple coordination phase that does not create any serial bottleneck, or
have a coordination phase that is itself suitable for parallel computations.
In this chapter we will discuss one such algorithm. Section 7.1 contains
preliminary discussion on model decompositions and discusses parallel de­
compositions based on linearization or diagonal-quadratic approximations.
Section 7.2 discusses the Linear-Quadratic Penalty (LQP) algorithm for
large-scale structured problems. Notes and references are given in Sec­
tion 7.3.

7.1 General Framework of Model Decompositions


Consider the minimization of a convex, continuously differentiable, block-separable function F : IR^{nK} → IR, written as F(x) = Σ_{k=1}^K f_k(x^k), where f_k : IR^n → IR for all k = 1,2,...,K. The vector x ∈ IR^{nK} is the concatenation of K subvectors, x = ((x^1)^T, (x^2)^T, ..., (x^K)^T)^T, where x^k ∈ IR^n for all k = 1,2,...,K. (Boldface letters denote vectors in the product space IR^{nK}.) Consider now the following constrained optimization problem:
Problem [P]:

Minimize F(x) (7.1)

s.t. x^k ∈ X_k, for all k = 1,2,...,K, (7.2)

x ∈ Ω ⊆ IR^{nK}. (7.3)

The sets X_k, k = 1,2,...,K, and Ω are assumed to be closed and convex.
Figure 7.1 illustrates the structure of this problem in two dimensions.



Figure 7.1 Constraint sets and set of feasible solutions of problem [P] in IR² with K = 2.

If the product set X_1 × X_2 × ⋯ × X_K ⊆ Ω, then problem [P] can


be solved by simply ignoring the constraints x ∈ Ω and solving K independent subproblems in each of the x^k vector variables. In this respect, the constraints x ∈ Ω are complicating (or coupling) constraints. When the
complicating constraints cannot be ignored, a model decomposition applies
a modifier to problem [P] to obtain a problem [P′] in which the complicating constraints are not explicitly present. It then employs a suitable algorithm to solve [P′]. If the solution to [P′] is sufficiently close to a solution of the original problem [P] then the process terminates. Otherwise,
the current solution is used to construct a new modified problem and the
process repeats. Figure 1.2 illustrates the model decomposition algorithmic
framework (see Chapter 1).
It is the judicious combination of a modifier and a suitable algorithm
for the solution of the modified problem [P'] that leads to a decomposition
of the original problem [P] suitable for parallel computations. We discuss
in this section modifiers and algorithms suitable for solving the modified
problems.

7.1.1 Problem modifiers


We present now two modifiers for problem [P], drawing on the general
theory developed in Chapter 4.

Modifier I: Penalty or Barrier Functions


The first modifier eliminates the complicating constraints x ∈ Ω by using a penalty or a barrier function. Using a penalty function p : IR^{nK} → IR with respect to the set Ω defining the complicating constraints (see Definition 4.1.1), the modified problem can be written as:
Problem [P']:

Minimize F(x) + cp(x) (7.4)


s.t. x^k ∈ X_k, for all k = 1,2,...,K. (7.5)

We know (see Section 7.3) that it is possible to construct penalty functions


that are exact, i.e., there exists some constant c̄ > 0 such that for c > c̄ any solution of [P′] is also a solution to [P]. Hence, a solution to [P] can be obtained by solving [P′]. Note that [P′] has a simpler constraint set than [P] because the complicating constraints x ∈ Ω have been removed.
However, problem [P'] still cannot be solved by solving K independent
subproblems, since the function p is not necessarily block-separable (see
Definition 1.3.1 and Section 4.3 for definitions of separability). The next
section explores algorithms that induce separability of this function.
Consider now situations when the set Ω has a nonempty interior. Such sets arise when inequality constraints are used to define them, e.g.,

Ω = { x | g_l(x) ≤ 0, for all l = 1,2,...,L }, (7.6)

where g_l : IR^{nK} → IR. In this case we can use a barrier function (see Definition 4.2.1) to establish a barrier on the boundary of Ω so that the iterates of an algorithm that starts with an interior point remain in the interior of the set, therefore satisfying the constraints x ∈ Ω. For example, a barrier function for the set Ω defined by (7.6) can be constructed with the aid of Burg's entropy (6.149) (see Example 4.2.2) as:

q(x) = −Σ_{l=1}^L log(−g_l(x)),

where g_l for l = 1,2,...,L are the functions used in the definition of Ω.
With the use of such a barrier function the modified problem is written as:
Problem [P']:

Minimize F(x) + cq(x) (7.7)

s.t. x^k ∈ X_k, for all k = 1,2,...,K. (7.8)

A solution to the problem [P] can be approximated by solving the modified


problem with the barrier function for a sufficiently small value of the parameter c. It is also possible to solve the barrier-modified problem repeatedly


for a sequence of barrier parameters {c_ν}, such that c_ν > c_{ν+1} > 0. If {x^ν} denotes the sequence of solutions of these barrier problems, then it is known that {x^ν} converges to a solution of [P] as c_ν → 0 (see Theorem 4.2.1).



Like the penalty-modified problem [P'], the barrier-modified problem
has a simpler constraint structure than problem [P]. However, it still can­
not be decomposed into independent components since the barrier function
is not necessarily block-separable. The algorithms of the next section can
be used to induce separability of penalty and barrier functions.
Modifier II: Variable Splitting and Augmented Lagrangian
The second modifier first replicates (or splits) the components x^k of the vector x into two copies, one of which is constrained to belong to the set X_k and the other constrained to satisfy the complicating constraints. Let z ∈ IR^{nK} denote the replication of x, where z = ((z^1)^T, (z^2)^T, ..., (z^K)^T)^T and the vector z^k ∈ IR^n for all k = 1,2,...,K. Consider now the equivalent split-variable formulation of [P]:
Problem [Split-P]:
Minimize Σ_{k=1}^K f_k(x^k) (7.9)

s.t. x^k ∈ X_k, for all k = 1,2,...,K, (7.10)

z ∈ Ω, (7.11)

z^k = x^k, for all k = 1,2,...,K. (7.12)

The constraints z^k = x^k link the variables that appear in the constraint sets X_k with the variables that appear in the set Ω. An augmented Lagrangian formulation (see Section 4.4) is now used to eliminate these complicating constraints. We let π = ((π^1)^T, (π^2)^T, ..., (π^K)^T)^T, where π^k ∈ IR^n denotes the Lagrange multiplier vector for the complicating constraints z^k = x^k, and let c > 0 be a constant. Then a partial augmented Lagrangian for (7.9)-(7.12) can be written as:
L_c(x, z, π) = Σ_{k=1}^K f_k(x^k) + Σ_{k=1}^K ⟨π^k, z^k − x^k⟩ + (c/2) Σ_{k=1}^K ||z^k − x^k||². (7.13)
A solution to problem [Split-P] can be obtained by solving the dual prob­
lem:
Problem [P″]:

Maximize_{π ∈ IR^{nK}} φ_c(π), (7.14)

where φ_c(π) = min_{x^k ∈ X_k, z ∈ Ω} L_c(x, z, π). This is the modified problem whose solution yields a solution of [P].

An algorithm for solving convex optimization problems using augmented Lagrangians is the method of multipliers, which is an instance of the augmented Lagrangian algorithmic scheme (Algorithm 4.4.1). It proceeds by minimizing the augmented Lagrangian for a fixed value of the Lagrange multiplier vector, followed by a simple update of this vector. Using the method of multipliers to solve the dual problem [P″] we obtain the following algorithmic scheme:

Algorithm 7.1.1 Method of Multipliers for Solving the Modified Problem [P″].

Step 0: (Initialization.) Set ν = 0. Let π⁰ be an arbitrary Lagrange multiplier vector.
Step 1: (Minimizing the augmented Lagrangian.)

(x^{ν+1}, z^{ν+1}) = argmin_{x^k ∈ X_k, z ∈ Ω} L_c(x, z, π^ν). (7.15)

Step 2: (Updating the Lagrange multiplier vector.) For k = 1,2,...,K, update:

(π^k)^{ν+1} = (π^k)^ν + c((z^k)^{ν+1} − (x^k)^{ν+1}). (7.16)
Step 3: Replace ν ← ν + 1 and return to Step 1.

The minimization problem in Step 1 has a block-decomposable constraint


set. The problem, however, still cannot be decomposed into K independent
subproblems since the augmented Lagrangian is not block-separable due to
the cross-products {zk, xk} in the quadratic term of (7.13). The next section
explores algorithms that induce separability of this term. Step 2 consists
of simple vector operations that can be executed very efficiently on parallel
architectures.
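As an illustration, the scheme of Algorithm 7.1.1 can be sketched on a small quadratic problem. Everything below is our own construction, not from the text: the data, the coupling set Ω = {z : Σ_k z^k = b}, and the few alternating closed-form sweeps used as an inexact stand-in for the joint minimization of Step 1.

```python
import numpy as np

# Toy split-variable problem (our construction):
#   minimize sum_k ||x^k - a_k||^2
#   s.t.     z = x,  z in Omega = { z : sum_k z^k = b },
# with X_k = R^n so both partial minimizations have closed forms.
K, n = 4, 3
rng = np.random.default_rng(0)
a = rng.standard_normal((K, n))      # targets a_k
b = np.ones(n)                       # coupling right-hand side
c = 5.0                              # augmented Lagrangian constant

x = np.zeros((K, n))
z = np.zeros((K, n))
pi = np.zeros((K, n))                # multipliers for z^k = x^k

for nu in range(200):
    # Step 1 (approximate): alternate closed-form minimizations of L_c.
    for _ in range(5):
        # min over x^k: stationarity 2(x - a_k) - pi^k + c(x - z^k) = 0.
        x = (2 * a + pi + c * z) / (2 + c)
        # min over z in Omega: unconstrained minimizer x - pi/c,
        # then Euclidean projection onto { z : sum_k z^k = b }.
        z = x - pi / c
        z -= (z.sum(axis=0) - b) / K
    # Step 2: multiplier update pi^k <- pi^k + c (z^k - x^k), as in (7.16).
    pi += c * (z - x)

# Analytic optimum of the coupled problem: x^k = a_k + (b - sum_j a_j)/K.
x_star = a + (b - a.sum(axis=0)) / K
print(np.abs(x - x_star).max())
```

The z-update runs independently per component and the π-update is a pure vector operation, which is what makes Step 2 cheap on parallel hardware; only the coupling projection needs a reduction across blocks.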

7.1.2 Solution algorithms


Both modified problems [P′] and [P″] have a block-decomposable constraint set, but the objective function is not block-separable. These problems can be written in the general form:

Minimize Φ(x) = Φ(x^1, x^2, ..., x^K) (7.17)

s.t. x ∈ X = X_1 × X_2 × ⋯ × X_K. (7.18)

We consider in this section two solution algorithms that—when applied


to problems of this form—give rise to block-separable functions and thus
decompose the problem into K subproblems, which can then be solved
in parallel. The first algorithm uses linear approximations to the nonlinear function Φ. These linear approximations are block-separable. The



Figure 7.2 Illustration of the Frank-Wolfe linearization algorithm for a function Φ(x_1, x_2). (The curves denote level sets.)

second algorithm uses a diagonal approximation for cases where Φ is a nonseparable quadratic function, such as the last summand appearing in the augmented Lagrangian (7.13).

Solution Algorithms Based on Linearization


One of the earliest algorithms suggested for the solution of nonlinear optimization problems using linear approximations is the Frank-Wolfe algorithm. It uses Taylor's expansion formula to obtain a first-order approximation Φ̂ of Φ around the current iterate x^ν, i.e.,

Φ̂(x) = Φ(x^ν) + ⟨∇Φ(x^ν), x − x^ν⟩,

ignoring second- and higher order terms. The Frank-Wolfe algorithm now
minimizes this linear function, subject to the original constraints. The
solution of this linear program, y, is a vertex of the constraint set, which
determines a direction of descent for the original nonlinear function, given
by p = y − x^ν. The algorithm then performs a one-dimensional search along
this direction to determine the step length where the nonlinear function
attains its minimum. Figure 7.2 illustrates the algorithm.
Applied to problem (7.17)-(7.18) the Frank-Wolfe algorithm is formally
stated below:

Algorithm 7.1.2 The Frank-Wolfe Algorithm

Step 0: (Initialization.) Set ν = 0. Let x⁰ ∈ X be an arbitrary vector.

Step 1: (Solving the linearized subproblem.) Evaluate the first-order approximation of Φ(x) at x^ν, i.e., Φ̂(x) = Φ(x^ν) + ⟨∇Φ(x^ν), x − x^ν⟩, and compute a direction of descent of Φ by solving the linear program:

Minimize ⟨∇Φ(x^ν), y − x^ν⟩ (7.19)

s.t. y ∈ X. (7.20)

Let y denote the optimal vertex of the linear program. Then p = y − x^ν is a direction of descent for Φ.

Step 2: (Linesearch.) Compute a step length α* along the direction p that minimizes the nonlinear function Φ by solving the one-dimensional nonlinear program:

α* = argmin_{0≤α≤1} Φ(x^ν + αp). (7.21)

Step 3: (Updating the iterate.) Let x^{ν+1} = x^ν + α*p and return to Step 1.
The subproblem in Step 1 is a linear programming problem over a Cartesian product of linear constraint sets. It can be solved by solving K independent linear programs in the variables y^k, k = 1,2,...,K, because although Φ̂(x) need not be a block-separable function, the objective function in (7.19) is. Denoting by ∇_{x^k}Φ(x) the subvector of ∇Φ(x) calculated with respect to x^k, we obtain

⟨∇Φ(x^ν), y − x^ν⟩ = Σ_{k=1}^K ⟨∇_{x^k}Φ(x^ν), y^k − (x^k)^ν⟩. (7.22)

The independent linear subproblems are then:

Minimize ⟨∇_{x^k}Φ(x^ν), y^k − (x^k)^ν⟩ (7.23)

s.t. y^k ∈ X_k. (7.24)

The problem in Step 2 is a nonlinear program in a single bounded


variable. For large-scale problems the calculations of Step 2 are insignificant
in comparison to the calculations of Step 1, and the algorithm parallelizes
efficiently.
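The steps above can be sketched in code on a toy problem of our own choosing (not from the book): minimize ||x − t||² over the box X = [0,1]^{nK}. For a box, the Step 1 LP splits into K per-block LPs with closed-form vertex solutions, and the Step 2 linesearch of a quadratic also has a closed form.

```python
import numpy as np

# Frank-Wolfe sketch for min ||x - t||^2 over X = [0,1]^{nK}
# (a made-up block-separable test problem; the optimum is clip(t, 0, 1)).
K, n = 3, 4
rng = np.random.default_rng(1)
t = rng.uniform(-0.5, 1.5, (K, n))

x = np.full((K, n), 0.5)
for nu in range(2000):
    grad = 2 * (x - t)                   # nabla Phi(x^nu), block by block
    # Step 1: per-block LP  min <grad_k, y^k>  over [0,1]^n: each
    # coordinate goes to whichever bound minimizes the linear objective.
    y = np.where(grad > 0, 0.0, 1.0)
    p = y - x                            # direction of descent
    denom = (p * p).sum()
    if denom < 1e-14:
        break
    # Step 2: exact linesearch for the quadratic, clipped to [0, 1].
    alpha = np.clip(-((x - t) * p).sum() / denom, 0.0, 1.0)
    # Step 3: update the iterate.
    x = x + alpha * p

print(((x - t) ** 2).sum())
```

The gradient blocks and the per-block LPs are independent, so in a parallel setting each processor handles one k; only the linesearch scalar is shared, matching the remark that Step 2 is insignificant next to Step 1.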
It is well-known that the Frank-Wolfe algorithm can zigzag toward the
solution. If the optimal solution lies on a facet of the constraint set, the
algorithm cannot reach it by a linesearch between one vertex (i.e., y) and an interior point (i.e., x^ν). A mild version of this effect is illustrated in


Figure 7.2 where the optimal solution is close to a facet. The simplicial
decomposition algorithm (described next) avoids the zigzagging effect of the
Frank-Wolfe algorithm. It uses information contained in multiple vertices generated during successive iterations of the algorithm and solves a larger
nonlinear master program rather than the simple linesearch. We describe
next the simplicial decomposition algorithm.
Let Y = {y^1, y^2, ..., y^v} denote the set of vertices of the feasible region X generated during the first ν iterations of the Frank-Wolfe algorithm. The convex hull of Y (i.e., the set defined by all convex combinations of the vertices) is

conv(Y) = { y = Σ_{l=1}^v w_l y^l | y^l ∈ Y, w_l ≥ 0, l = 1,2,...,v, Σ_{l=1}^v w_l = 1 }, (7.25)
which is a subset of the feasible set X. Simplicial decomposition generates
vertices as in the Frank-Wolfe algorithm by solving linear programming
subproblems. It then optimizes the original objective function by solving
a master program over the set conv(Y). The dimension of the master pro­
gram is usually much smaller than the dimension of the original program.
The master program also has a simple constraint structure—consisting of a
single equality constraint and bounds on the variables—which can be ex­
ploited to convert the problem into a locally unconstrained program. Now
the problem can be solved using standard unconstrained optimization tech­
niques, and it can also be solved inexactly. Vertices that do not contribute
to the representation of the optimal solution of the master program can be
removed, thereby reducing the size of the master program.
Applied to problem (7.17)—(7.18), the simplicial decomposition algo­
rithm is described next.
Algorithm 7.1.3 The Simplicial Decomposition Algorithm

Step 0: (Initialization.) Set ν = 0. Let x⁰ ∈ X be an arbitrary vector. Let Y = ∅ be the set of vertices and v = 0 its cardinality.
Step 1: (Solving the linearized subproblem.) Evaluate the first-order approximation Φ̂(x) at x^ν, i.e., Φ̂(x) = Φ(x^ν) + ⟨∇Φ(x^ν), x − x^ν⟩, and compute a direction of descent of Φ by solving the linear programming problem:

Minimize ⟨∇Φ(x^ν), y − x^ν⟩ (7.26)

s.t. y ∈ X. (7.27)

Let y^ν denote the optimal solution of this linear program. Update the set of vertices Y ← Y ∪ {y^ν}, and its cardinality v ← v + 1.

Step 2: (Solving the nonlinear master program.) Optimize the nonlinear function Φ over the convex hull of Y (equation (7.25)). That is, compute

w* = argmin_{w ∈ W_v} Φ( Σ_{l=1}^v w_l y^l ), (7.28)

where y^l ∈ Y for all l = 1,2,...,v, and

W_v = { w = (w_l) ∈ IR^v | Σ_{l=1}^v w_l = 1, w_l ≥ 0, for all l = 1,2,...,v }.

Step 3: (Updating the iterate.) Let x^{ν+1} = Σ_{l=1}^v w_l* y^l. Update the set of vertices Y by deleting from it any vertices with zero weight in the representation of x^{ν+1}, i.e., set Y ← Y \ { y^l | w_l* = 0, 1 ≤ l ≤ v }, and let v = card(Y). Set ν ← ν + 1 and return to Step 1.
At Step 1 the algorithm solves a linear program with a Cartesian prod­
uct of linear constraint sets. This subproblem can be decomposed, and its
components solved independently and in parallel:
Step 1 (alternate): (Solving the decomposed linearized subproblems.) Let ∇_{x^k}Φ(x^ν) denote the subvector of the gradient vector ∇Φ(x) corresponding to the kth block of x, evaluated at the current iterate x^ν. For each k = 1,2,...,K, solve

Minimize_{y^k ∈ IR^n} ⟨∇_{x^k}Φ(x^ν), y^k − (x^k)^ν⟩ (7.29)

s.t. y^k ∈ X_k. (7.30)

Let (y^k)^ν denote the optimal solution of this linear program and form y^ν as the concatenation of the subvectors { (y^k)^ν | k = 1,2,...,K }.
The nonlinear master program in Step 2 is much smaller in size than the original problem. Typically, the number of vertices upon termination of the algorithm does not exceed one hundred. Furthermore, it has a simple structure, i.e., a simplex equality constraint, namely Σ_{l=1}^v w_l = 1, and bounds on the variables 0 ≤ w_l ≤ 1. This structure can be exploited by designing an unconstrained optimization procedure for its solution as follows. Using the simplex equality constraint we can substitute w_v with w_v = 1 − Σ_{l=1}^{v−1} w_l. Then we can write the master program (7.28) as:

w* = argmin_{0≤w_l≤1, l=1,2,...,v−1} Φ( y^v + Σ_{l=1}^{v−1} w_l (y^l − y^v) ). (7.31)

Recall that at the current iteration we have v − 1 active vertices (i.e., w_l > 0 for l = 1,...,v−1) and the last vertex y^v lies along a direction of descent. Hence the nonlinear master program is locally unconstrained in the neighborhood of the current iterate x^ν. Any unconstrained optimization algorithm can be used to compute a descent direction, followed by a
simple test to determine the maximum allowable step that will keep the w’s
within the bounds. Note that the evaluation of the objective function in­
volves operations on dense vectors, i.e., the vectors y^l, l = 1,2,...,v. Such
operations parallelize naturally and the solution of the master program is
also amenable to parallel computations.
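A compact sketch of simplicial decomposition follows, again on a toy box-constrained quadratic of our own construction. One simplification is ours: the master program is solved by projected gradient directly on the simplex weights rather than through the unconstrained rewrite (7.31), and the per-block structure of (7.29)-(7.30) is collapsed into a single closed-form box LP.

```python
import numpy as np

def proj_simplex(w):
    # Euclidean projection onto the unit simplex (standard sort-based method).
    u = np.sort(w)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * (np.arange(len(w)) + 1) > css)[0][-1]
    return np.maximum(w - css[rho] / (rho + 1.0), 0.0)

# Toy problem (not from the book): min Phi(x) = ||x - t||^2 over [0,1]^m.
m = 4
rng = np.random.default_rng(2)
t = rng.uniform(-0.5, 1.5, m)

x = np.full(m, 0.5)
V = [x.copy()]                               # vertex set Y, seeded with x^0
for outer in range(20):
    grad = 2 * (x - t)
    V.append(np.where(grad > 0, 0.0, 1.0))   # Step 1: box-LP vertex
    Y = np.array(V)                          # rows are the vertices y^l
    # Step 2: master program min_w Phi(w @ Y) over the simplex W_v,
    # solved by projected gradient with a 1/L step size.
    w = np.full(len(V), 1.0 / len(V))
    step = 1.0 / (2 * np.linalg.norm(Y, 2) ** 2 + 1e-12)
    for _ in range(2000):
        w = proj_simplex(w - step * (Y @ (2 * (w @ Y - t))))
    x = w @ Y
    # Step 3: drop vertices whose weight is (numerically) zero.
    V = [v for v, wl in zip(V, w) if wl > 1e-10]

print(((x - t) ** 2).sum())
```

Note how the master program's dimension is the number of retained vertices, not the dimension of x, and how the objective evaluation w @ Y is exactly the kind of dense vector operation the text says parallelizes naturally.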

Solution Algorithms Based on Diagonalization


Consider now a special structure of the objective function (7.17) arising from the modified problem [P″]. In particular, we assume that Φ(·) can be written as

Φ(x, z) = Σ_{k=1}^K ||x^k − z^k||². (7.32)

This is the structure of the quadratic terms of the augmented Lagrangi­


an (7.13). (We consider only the quadratic terms of the augmented La­
grangian, since these are the nonseparable terms that prevent us from
decomposing the minimization of Step 1 of Algorithm 7.1.1 into K in­
dependent subproblems.) We will approximate this nonseparable function
using a separable quadratic function. The term diagonal quadratic approx­
imation is also used, which indicates that the Hessian matrix of Φ(x, z) is
approximated by a diagonal matrix.
The terms in (7.32) can be expanded as

||x^k − z^k||² = ||x^k||² + ||z^k||² − 2⟨x^k, z^k⟩,

and we only discuss the cross-product terms ⟨x^k, z^k⟩ for k = 1,2,...,K. Using Taylor's expansion formula we obtain a first-order approximation of the cross-product terms around the current iterate as:

⟨x^k, z^k⟩ ≈ ⟨x^k, (z^k)^ν⟩ − ⟨(x^k)^ν, (z^k)^ν⟩ + ⟨(x^k)^ν, z^k⟩,

for k = 1,2,...,K. With this approximation of its cross-product terms, the function (7.32) is approximated by the expression:

Σ_{k=1}^K ||x^k − (z^k)^ν||² − Σ_{k=1}^K ||(x^k)^ν − (z^k)^ν||² + Σ_{k=1}^K ||(x^k)^ν − z^k||².

A solution algorithm based on diagonalization solves problem [P″] using the method of multipliers (Algorithm 7.1.1), but instead of minimizing the augmented Lagrangian in Step 1 it minimizes the diagonal quadratic approximation. This approximation is block-separable into the variable blocks x^k, k = 1,2,...,K, and the minimization is decomposed into K independent problems. These problems can be solved in parallel.
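A quick numeric sanity check (our own, not from the text) of the approximation above: the error of the separable expression relative to ||x − z||² is −2⟨x − x^ν, z − z^ν⟩, which is second order, so the two functions and their first derivatives agree at the expansion point.

```python
import numpy as np

# Check that replacing <x, z> by its linearization around (x_nu, z_nu)
# turns ||x - z||^2 into the block-separable expression
#   ||x - z_nu||^2 + ||x_nu - z||^2 - ||x_nu - z_nu||^2,
# which matches ||x - z||^2 to first order at (x_nu, z_nu).
rng = np.random.default_rng(3)
n = 5
x_nu, z_nu = rng.standard_normal(n), rng.standard_normal(n)

def exact(x, z):
    return ((x - z) ** 2).sum()

def approx(x, z):
    return exact(x, z_nu) + exact(x_nu, z) - exact(x_nu, z_nu)

# Zeroth order: the two functions coincide at the expansion point.
print(abs(exact(x_nu, z_nu) - approx(x_nu, z_nu)))

# First order: directional derivatives coincide at (x_nu, z_nu).
dx, dz = rng.standard_normal(n), rng.standard_normal(n)
h = 1e-6
d_exact = (exact(x_nu + h * dx, z_nu + h * dz) - exact(x_nu, z_nu)) / h
d_approx = (approx(x_nu + h * dx, z_nu + h * dz) - approx(x_nu, z_nu)) / h
print(abs(d_exact - d_approx))
```

The approx expression contains no term coupling x to z, which is precisely why the Step 1 minimization separates into K independent block problems.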



7.2 The Linear-Quadratic Penalty (LQP) Algorithm
Many different model decomposition algorithms can be designed using
problem modifiers based on penalty or barrier methods and then followed
by the use of linearization. We discuss here one such algorithm based on
a linear-quadratic penalty (LQP) function. The algorithm has been shown
to be efficient for large-scale, structured optimization problems. It has also
been implemented on different parallel architectures.
We consider the problem:

Minimize f_0(x) (7.33)

s.t. f_k(x^k) ≤ 0, for all k = 1,2,...,K, (7.34)

g_l(x) ≤ 0, for all l = 1,2,...,L. (7.35)

The functions f_k : IR^n → IR, k = 0,1,...,K, and g_l : IR^{nK} → IR, for l = 1,2,...,L, are convex and continuously differentiable. The constraints (7.34) decompose into blocks, one for each subvector x^k, while constraints (7.35) are complicating. Let X = X_1 × X_2 × ⋯ × X_K be the product of the sets X_k = { x^k ∈ IR^n | f_k(x^k) ≤ 0 } for all k = 1,2,...,K, and assume that X is a compact set. We further make the following assumptions:
Assumption 7.2.1 Problem (7.33)-(7.35) has a nonempty and compact
optimal solutions set.

Assumption 7.2.2 Problem (7.33)-(7.35) has at least one feasible solution


that satisfies all the constraints with strict inequality.
Under these assumptions a Kuhn-Tucker vector (i.e., a Lagrange mul­
tiplier vector as defined in Rockafellar (1970, p. 274)) exists for problem
(7.33)-(7.35).
Consider now the ℓ1-norm penalty function p : IR → IR given by:

p(t) = μ max(0, t), (7.36)

where t is a scalar variable and μ is a positive constant (see Figure 7.3). We want to obtain a solution to (7.33)-(7.35) by solving the following exact penalty problem:

min_{x∈X} ( f_0(x) + Σ_{l=1}^L p(g_l(x)) ). (7.37)

It is known (see, e.g., Bertsekas (1982, Chapter 4)) that under the assumptions stated above there exists a penalty parameter μ for which the optimal solutions to (7.37) and (7.33)-(7.35) coincide. In particular, if the penalty parameter is larger than a threshold value μ* given by the largest component of a Lagrange multiplier vector of (7.33)-(7.35), then a solution to (7.37) is also a solution to (7.33)-(7.35). However, the ℓ1-norm exact penalty function is nondifferentiable, and this precludes the application of gradient-based descent methods for the solution of the exact penalty problem. In order to gain access to gradient-based minimization techniques for solving (7.37) we consider an ε-smoothing of the function p around t = 0.
In particular we introduce the following ε-smoothed function p_ε:

p_ε(t) = 0, if t ≤ 0,
p_ε(t) = μt²/(2ε), if 0 ≤ t ≤ ε, (7.38)
p_ε(t) = μ(t − ε/2), if t ≥ ε,

where ε is a positive scalar; see Figure 7.3. It is easy to see that

lim_{ε→0} p_ε(t) = p(t), (7.39)

and, furthermore, this convergence is uniform; that is, given an arbitrary η > 0 we can find a δ > 0 such that if 0 < ε < δ then |p_ε(t) − p(t)| < η for all t (i.e., δ does not depend on t).
With the introduction of ε-smoothing we obtain a continuously differentiable penalty function that can be optimized using algorithms such as the simplicial decomposition Algorithm 7.1.3. Furthermore, the ε-smoothing device provides a natural mechanism for handling “soft” constraints that need not be satisfied exactly.
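The two penalty functions can be coded directly. The check below (our own) verifies on a grid the uniform bound 0 ≤ p(t) − p_ε(t) ≤ με/2, which is the key fact behind the error estimates that follow.

```python
import numpy as np

# The l1 penalty p(t) = mu*max(0, t) and its eps-smoothing p_eps of (7.38),
# with a numeric check of the uniform bound 0 <= p(t) - p_eps(t) <= mu*eps/2.
mu, eps = 10.0, 0.1

def p(t):
    return mu * np.maximum(0.0, t)

def p_eps(t):
    return np.where(t <= 0.0, 0.0,
           np.where(t <= eps, mu * t ** 2 / (2 * eps),
                    mu * (t - eps / 2)))

t = np.linspace(-1.0, 2.0, 100001)
gap = p(t) - p_eps(t)
print(gap.min(), gap.max())   # the gap is largest, mu*eps/2, for t >= eps
```

The maximal gap με/2 is attained for every t ≥ ε, which is why the bound cannot be improved without shrinking ε or μ.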
The use of e-smoothing might introduce some approximation error be­
cause an optimum point of the original problem is not necessarily an opti­
mum of the smoothed penalty problem, even for penalty parameter values
larger than the threshold value. However, an a priori upper bound to this
error can be computed as a function of the penalty parameters. It is also
possible to compute a solution that is feasible to within any given e > 0 for
given values of the penalty parameter. Such a solution is termed e-feasible.

Definition 7.2.1 (ε-feasibility) Given some ε > 0, a vector x ∈ X is ε-feasible for problem (7.33)-(7.35) if g_l(x) ≤ ε, for all l = 1,2,...,L.
We first describe the linear-quadratic penalty (LQP) algorithm that uses the smoothed penalty function p_ε(t) instead of the exact penalty p(t) in
(7.37). We then proceed with the analysis of the properties of e-smoothing,
and derive bounds on the difference between the solution of the smoothed



Figure 7.3 The ℓ1-norm penalty function and the linear-quadratic smooth penalty function defined by (7.38).

penalty problem and the original problem (7.33)-(7.35).


Define first the objective function for the exact penalty problem

F(x, μ) = f_0(x) + Σ_{l=1}^L p(g_l(x)), (7.40)

and then express the objective function for the ε-smoothed penalty problem

F̃(x, μ, ε) = f_0(x) + Σ_{l=1}^L p_ε(g_l(x)). (7.41)

The LQP algorithm can now be described in detail:

Algorithm 7.2.1 The Linear-Quadratic Penalty (LQP) Algorithm


Step 0: (Initialization.) Set ν = 0, set the initial parameter values μ_0 > 0, ε_0 > 0, and define some ε_min > 0.

Step 1: Solve the problem

min_{x∈X} F̃(x, μ_ν, ε_ν),

and let x^ν denote its optimal solution.

Step 2: If x^ν is ε_ν-feasible and ε_ν ≤ ε_min, then stop. Otherwise, update the penalty parameters μ_ν and ε_ν according to the rules given below, set ν ← ν + 1 and go to Step 1.
The parameter ε_min > 0 is a user-determined final feasibility tolerance. At Step 1 the algorithm solves the modified penalty problem, whereby the complicating constraints have been placed in the objective function, and at Step 2 the modifier is updated. Two situations may arise:
• The point x^ν is ε_ν-feasible but ε_ν > ε_min. In this case the penalty parameter remains unchanged, i.e., μ_{ν+1} = μ_ν, and ε_ν is decreased by replacing it with ε_{ν+1} = η_1 max_{1≤l≤L} u_l, where u_l = g_l(x^ν) for all l = 1,2,...,L, and 0 < η_1 < 1 is a user-specified fixed parameter.
• The point x^ν is not ε_ν-feasible. This indicates that the value of the penalty parameter μ_ν is not large enough and must be increased by replacing it with μ_{ν+1} = η_2 μ_ν, where η_2 > 1 is a user-specified fixed parameter. The feasibility tolerance remains unchanged, i.e., ε_{ν+1} = ε_ν.
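The outer loop of Algorithm 7.2.1 with these update rules can be sketched on a one-dimensional toy instance. All data and parameter values below are our own; the Step 1 minimization is done in closed form, which is possible only because this tiny smoothed objective is piecewise quadratic, and the tolerance update is guarded so ε stays positive even when the iterate is strictly feasible.

```python
# Toy instance (ours): min (x - a)^2  s.t.  x in [0,1],  g(x) = x - b <= 0,
# with a = 0.9, b = 0.5, so the complicating constraint is active at x* = b.
a, b = 0.9, 0.5

def p_eps(t, mu, eps):
    if t <= 0.0:
        return 0.0
    if t <= eps:
        return mu * t * t / (2 * eps)
    return mu * (t - eps / 2)

def solve_inner(mu, eps):
    # Closed-form Step 1: the smoothed objective is piecewise quadratic,
    # so the minimizer is the best of the three per-piece stationary
    # points, each clipped to its piece and to the box [0, 1].
    cands = [
        min(a, b),                                           # piece t <= 0
        min(max((2 * a + (mu / eps) * b) / (2 + mu / eps), b), b + eps),
        max(a - mu / 2, b + eps),                            # piece t >= eps
    ]
    cands = [min(max(c, 0.0), 1.0) for c in cands]
    return min(cands, key=lambda x: (x - a) ** 2 + p_eps(x - b, mu, eps))

mu, eps, eps_min = 0.1, 0.5, 1e-6     # deliberately small initial mu
eta1, eta2 = 0.5, 4.0
for outer in range(100):
    x = solve_inner(mu, eps)          # Step 1
    g = x - b
    if g <= eps:                      # Step 2: eps-feasible?
        if eps <= eps_min:
            break
        eps = max(eta1 * g, eps_min)  # first rule: shrink tolerance (guarded)
    else:
        mu *= eta2                    # second rule: penalty too small

print(x, mu)
```

Starting from μ_0 = 0.1, below the multiplier threshold of this instance, the run exercises both branches: the penalty parameter is increased until the iterates become ε-feasible, after which only the tolerance shrinks and x approaches the constrained optimum b.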

7.2.1 Analysis of the ε-smoothed linear-quadratic penalty function


We now analyze the approximation error introduced with the use of ε-smoothing. The following result gives an a priori upper bound on the difference between the exact and the smoothed penalty functions.

Proposition 7.2.1 Let the functions F and F̃ be defined by (7.40) and (7.41), respectively. Then

0 ≤ F(x, μ) − F̃(x, μ, ε) ≤ Lμε/2, (7.42)

for any x ∈ IR^{nK}, μ > 0 and ε > 0.


Proof From the definitions of p and p_ε we have

0 ≤ p(g_l(x)) − p_ε(g_l(x)) ≤ με/2, for all l = 1,2,...,L, and x ∈ IR^{nK}. (7.43)

Adding these inequalities up, for l = 1,2,...,L, we obtain

0 ≤ Σ_{l=1}^L p(g_l(x)) − Σ_{l=1}^L p_ε(g_l(x)) ≤ Lμε/2, for all x ∈ IR^{nK}, (7.44)

and the result follows from the definitions of F and F̃. ∎

The LQP algorithm solves in Step 1 the smoothed penalty problem

min_{x∈X} F̃(x, μ, ε), (7.45)

instead of solving the exact penalty problem (7.37). The following results give a priori upper bounds on the error incurred by solving the smooth penalty problem (7.45) in lieu of the nondifferentiable penalty problem (7.37).

Proposition 7.2.2 Let x* ∈ X be an optimal solution of (7.33)-(7.35) and x̂ ∈ X be an optimal solution of (7.45) for some μ and ε. Then

0 ≤ F(x*, μ) − F̃(x̂, μ, ε) ≤ Lμε/2. (7.46)

Proof From Proposition 7.2.1 we have

F(x, μ) ≤ F̃(x, μ, ε) + Lμε/2. (7.47)

Taking the infimum over x ∈ X, we obtain

inf_{x∈X} F(x, μ) ≤ inf_{x∈X} F̃(x, μ, ε) + Lμε/2, (7.48)

which proves the right-hand side inequality. The left-hand side inequality can be similarly proved. ∎
The next proposition tells us that the difference between the optimal
values of the exact penalty problem and the smoothed penalty problem
can be controlled through the parameter e provided that the solution of
the penalty problem is e-feasible.
Proposition 7.2.3 Let x* ∈ X be an optimal solution of (7.33)-(7.35) and
x̂ ∈ X be an optimal solution of (7.45) for some μ and ε. Furthermore let
x̂ be ε-feasible. Then

0 ≤ f_0(x*) − f_0(x̂) ≤ Lμε.   (7.49)

Proof Since x̂ is ε-feasible, we have by the definition of p_ε that

Σ_{l=1}^L p_ε(g_l(x̂)) ≤ Lμε/2.   (7.50)

Also, x* is a solution to (7.33)-(7.35), which implies that

Σ_{l=1}^L p(g_l(x*)) = 0.   (7.51)

From Proposition 7.2.2 we have

0 ≤ (f_0(x*) + Σ_{l=1}^L p(g_l(x*))) − (f_0(x̂) + Σ_{l=1}^L p_ε(g_l(x̂))) ≤ Lμε/2.   (7.52)

Substituting (7.50) and (7.51) into (7.52) and rearranging terms, the result
is established. ■
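To make Proposition 7.2.3 concrete, here is a one-variable illustration (my example, not the book's): minimize f_0(x) = x² subject to g(x) = 1 − x ≤ 0, so that L = 1, x* = 1, and the Lagrange multiplier is z* = 2. Minimizing the smoothed penalty with μ above this threshold (here by ternary search, with p_ε taken as the linear-quadratic smoothing: quadratic on [0, ε], linear beyond) produces an ε-feasible point whose objective gap satisfies (7.49).

```python
# One-variable illustration of Proposition 7.2.3 (example is mine):
# minimize f0(x) = x^2 subject to g(x) = 1 - x <= 0; x* = 1, z* = 2.

def p_smooth(t, mu, eps):
    if t <= 0.0:
        return 0.0
    if t <= eps:
        return mu * t * t / (2.0 * eps)   # quadratic piece
    return mu * (t - eps / 2.0)           # linear piece

def F_hat(x, mu, eps):
    """Smoothed penalty objective for this toy problem."""
    return x * x + p_smooth(1.0 - x, mu, eps)

def argmin(f, lo, hi, iters=200):
    """Ternary search; valid because F_hat is convex in x."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

mu, eps = 4.0, 1e-2                       # mu above the threshold z* = 2
x_hat = argmin(lambda x: F_hat(x, mu, eps), 0.0, 2.0)
assert 0.0 < 1.0 - x_hat <= eps           # x_hat violates g by at most eps
assert 0.0 <= 1.0 - x_hat**2 <= mu * eps  # bound (7.49) with L = 1
```

The minimizer lands slightly inside the infeasible region (here at x̂ = μ/(μ + 2ε) ≈ 0.995), which is exactly the ε-feasibility behavior the propositions describe.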

The next question is how to specify conditions on the penalty parameter
μ to ensure that ε-feasibility is achieved. For the exact penalty function
case it is known (see, for example, Bertsekas (1975, Proposition 1)) that
for μ larger than a threshold given by μ* = max_{l=1,2,...,L} z_l*, where z* =
(z_l*) is a Lagrange multiplier vector for the constraints (7.35), an optimal
solution x* to (7.33)-(7.35) is also an optimal solution for (7.37). This
result provides the motivation for exact penalty methods. The question
we address next is how this result is affected by the ε-smoothing of the
exact penalty function. The answer is that although an optimal solution
to (7.33)-(7.35) does not necessarily coincide with an optimal solution to
(7.45), its suboptimality with respect to the latter can be bounded as a
function of the penalty and smoothing parameters.

To simplify notation we introduce, for each constraint function g_l(x),
the function g_l⁺(x) = max(0, g_l(x)); thus p(g_l(x)) = μ g_l⁺(x). Obviously, if
g_l(x) ≥ 0 then g_l⁺(x) = g_l(x).
Proposition 7.2.4 Let x* be an optimal solution for (7.33)-(7.35) and let
(y*, z*) ∈ ℝ^{K+L} be a Lagrange multiplier vector. Then, for some ε > 0,

F̂(x*, μ, ε) ≤ F̂(x, μ, ε) + Lμε/2 for all x ∈ X,   (7.53)

provided that μ ≥ z_l* for all l = 1, 2, ..., L.


Proof Since x* is an optimal solution for (7.33)-(7.35) and (y*, z*) is
a Lagrange multiplier vector whose existence is guaranteed by Assump­tion
7.2.2, we have (see, e.g., Rockafellar (1970, Theorem 28.3)):

∇f_0(x*) = − Σ_{k=1}^K y_k* ∇f_k((x*)_k) − Σ_{l=1}^L z_l* ∇g_l(x*),   (7.54)

y_k* f_k((x*)_k) = 0 for all k = 1, 2, ..., K,  z_l* g_l(x*) = 0 for all l = 1, ..., L,   (7.55)

y_k* ≥ 0 for all k = 1, 2, ..., K,  z_l* ≥ 0 for all l = 1, 2, ..., L,   (7.56)

f_k((x*)_k) ≤ 0 for all k = 1, 2, ..., K,  g_l(x*) ≤ 0 for all l = 1, 2, ..., L.   (7.57)

It is known that convexity and differentiability of any function h guarantee
that h(x) ≥ h(x*) + ⟨∇h(x*), x − x*⟩; see, e.g., Luenberger (1984). There­fore,
using (7.54)-(7.55) in the definition of F, and by convexity and dif­ferentiability
of the functions f_k, k = 0, 1, 2, ..., K, and g_l, l = 1, 2, ..., L,
we obtain

F(x, μ) ≥ f_0(x*) + ⟨∇f_0(x*), x − x*⟩ + μ Σ_{l=1}^L g_l⁺(x)

= f_0(x*) − Σ_{k=1}^K y_k* ⟨∇f_k((x*)_k), x_k − (x*)_k⟩ − Σ_{l=1}^L z_l* ⟨∇g_l(x*), x − x*⟩ + μ Σ_{l=1}^L g_l⁺(x)

≥ f_0(x*) − Σ_{k=1}^K y_k* (f_k(x_k) − f_k((x*)_k)) − Σ_{l=1}^L z_l* (g_l(x) − g_l(x*)) + μ Σ_{l=1}^L g_l⁺(x)

= f_0(x*) − Σ_{k=1}^K y_k* f_k(x_k) − Σ_{l=1}^L z_l* g_l(x) + μ Σ_{l=1}^L g_l⁺(x).

The first equality follows from (7.54); the second inequality follows from the
convexity and differentiability of the functions f_k for all k = 1, 2, ..., K,
and g_l for all l = 1, 2, ..., L; the second equality follows from (7.55). Since
g_l(x) ≤ g_l⁺(x), we have:

F(x, μ) ≥ f_0(x*) − Σ_{k=1}^K y_k* f_k(x_k) + Σ_{l=1}^L (μ − z_l*) g_l⁺(x).

Now, since μ ≥ z_l* for all l = 1, ..., L, y_k* ≥ 0, and f_k(x_k) ≤ 0 for all k =
1, 2, ..., K because x ∈ X, we get

F(x, μ) ≥ f_0(x*).   (7.58)

But from Proposition 7.2.1 we have

F(x, μ) − F̂(x, μ, ε) ≤ Lμε/2.   (7.59)

Rewriting (7.58) as

f_0(x*) − F(x, μ) ≤ 0,   (7.60)

observing that f_0(x*) = F̂(x*, μ, ε) since x* is feasible, and adding up
inequalities (7.59) and (7.60), we obtain

F̂(x*, μ, ε) − F̂(x, μ, ε) ≤ Lμε/2,   (7.61)

which establishes the result. ■


We see that even if the optimal solution x* of (7.33)-(7.35) is not an optimal
solution of (7.45), the objective function value computed at x* is greater
than the true optimal value by at most Lμε/2. As ε vanishes we recover
the classical assertion that an optimal solution to the original problem
coincides with an optimal solution to the penalty problem for some value
of the penalty parameter μ.



In the next proposition we obtain, as an immediate corollary from
Proposition 7.2.4, a bound on the difference between the optimal value of
the original problem (7.33)-(7.35) and the optimal value of the smoothed
penalty problem.

Proposition 7.2.5 Let x̂ be a minimum point of min_{x∈X} F̂(x, μ, ε), let x* be
an optimal point for the original problem (7.33)-(7.35), and let z* ∈ ℝ^L
be a Lagrange multiplier vector associated with the inequalities g_l(x) ≤ 0
for all l = 1, 2, ..., L. Then

0 ≤ f_0(x*) − F̂(x̂, μ, ε) ≤ Lμε/2,   (7.62)

provided that μ ≥ z_l* for l = 1, 2, ..., L.


We can now prove the following result:

Proposition 7.2.6 Let {ε_ν} be a sequence of positive numbers with
lim_{ν→∞} ε_ν = 0, and assume that for some μ > 0, x^ν is a solution to

min_{x∈X} F̂(x, μ, ε_ν).   (7.63)

Also let x̄ be an accumulation point of the sequence {x^ν}; then

F(x̄, μ) = F(x(μ), μ),   (7.64)

where x(μ) is an optimal solution to min_{x∈X} F(x, μ).

Proof The result is obtained from the continuity of F, the uniform con­vergence
of (7.39), and inequalities (7.46), as follows. From (7.46) we have,
for all ν > 0, F(x(μ), μ) ≥ F̂(x^ν, μ, ε_ν). Using (7.39), remembering that
the convergence is uniform, and by the definitions of F and F̂, we obtain

lim_{s→∞} F̂(x^{ν_s}, μ, ε_{ν_s}) = F(x̄, μ),   (7.65)

where {x^{ν_s}}, with lim_{s→∞} x^{ν_s} = x̄, is the subsequence converging to x̄.
Now, (7.46) yields F(x(μ), μ) ≥ F̂(x^{ν_s}, μ, ε_{ν_s}) which, together with (7.65),
gives

F(x(μ), μ) ≥ F(x̄, μ).

The opposite inequality follows from the definition of x(μ). ■
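Proposition 7.2.6 can be observed numerically on a one-variable illustration (mine, not the book's): take f_0(x) = x², g(x) = 1 − x ≤ 0, and fixed μ = 4. The minimizer of the smoothed problem has the closed form x^ν = μ/(μ + 2ε_ν) for this example, and its exact-penalty value approaches the optimal exact-penalty value F(x(μ), μ) = 1 as ε_ν → 0.

```python
# Exact penalty F(x, mu) = x^2 + mu*max(0, 1-x); for mu = 4 its minimum
# over x is F(1, mu) = 1.  The smoothed minimizers x_nu approach it.
mu = 4.0

def F(x):
    return x * x + mu * max(0.0, 1.0 - x)

gaps = []
for eps in (1.0, 0.1, 0.01, 0.001):
    x_nu = mu / (mu + 2.0 * eps)   # closed-form smoothed minimizer here
    gaps.append(F(x_nu) - F(1.0))
assert all(g > 0.0 for g in gaps)
assert gaps == sorted(gaps, reverse=True)   # decreases monotonically
assert gaps[-1] < 1e-2                      # already close for eps = 1e-3
```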


Therefore, as the smoothing parameter ε tends to 0, and if the penalty
parameter μ is specified larger than the threshold value μ*, the solu­tions
to (7.45) and (7.33)-(7.35) would coincide. Hence, we could obtain a
solution to the original problem (7.33)-(7.35) by solving smooth penalty
problems for a decreasing sequence of smoothing parameters and an increas­ing
sequence of penalty parameters, since the threshold value μ* is not known in
practice. However, driving ε to zero is not desirable, since we would recover the
nondifferentiable penalty function. Therefore, we cannot expect to ob­tain
a solution to the original problem by solving a linear-quadratic penalty
problem; we obtain an ε-feasible solution instead. In the next section we
show that ε-feasibility is attained if the penalized problem (7.45) is solved
using a penalty parameter equal to a constant multiple of the threshold for
the nondifferentiable penalty function.

7.2.2 ε-exactness properties of the LQP function


We now study the approximate exactness properties that the smooth LQP
penalty function inherits from its nondifferentiable counterpart. We show,
in particular, that ε-feasibility is achieved for some (finite) value of the
penalty parameter.

First we look at the optimality conditions of problems (7.33)-(7.35) and
(7.45), respectively.

Optimality Conditions for (7.33)-(7.35)

As stated earlier in the proof of Proposition 7.2.4, for x* to be an optimal
solution of (7.33)-(7.35) and for (y*, z*) to be a Lagrange multiplier vector,
it is necessary and sufficient that (x*, y*, z*) satisfy (7.54)-(7.55). The
existence of (y*, z*) is guaranteed under Assumption 7.2.2.

Optimality Conditions for (7.45)

We introduce A = {l | g_l(x̂) ≤ ε, 1 ≤ l ≤ L} to denote the set of indices
corresponding to constraints that are satisfied or violated to within ε at x̂,
and V = {l | g_l(x̂) > ε, 1 ≤ l ≤ L} for the set of indices corresponding to
constraints violated beyond ε at x̂. Again using Rockafellar (1970, Theorem
28.3) we obtain: for x̂ to be an optimal solution of (7.45) and for ŷ to
be a Lagrange multiplier vector, it is necessary and sufficient that (x̂, ŷ)
satisfy the conditions

∇f_0(x̂) = − Σ_{l∈A} (μ g_l⁺(x̂)/ε) ∇g_l(x̂) − Σ_{l∈V} μ ∇g_l(x̂) − Σ_{k=1}^K ŷ_k ∇f_k((x̂)_k),   (7.66)

ŷ_k f_k((x̂)_k) = 0 for all k = 1, 2, ..., K,   (7.67)

ŷ_k ≥ 0 for all k = 1, 2, ..., K,   (7.68)

f_k((x̂)_k) ≤ 0 for all k = 1, 2, ..., K.   (7.69)

The existence of ŷ is guaranteed under Assumption 7.2.2.

These optimality conditions for problems (7.33)-(7.35) and (7.45) have
some features in common. Let us take a pair of vectors (x̂, ŷ) satisfying
conditions (7.66)-(7.68) for some μ and ε. Consider also an estimate of the
Lagrange multiplier z, denoted by ẑ, obtained from

ẑ_l = p_ε′(g_l(x̂)) for all l = 1, 2, ..., L.   (7.70)

Then the triplet (x̂, ŷ, ẑ) would satisfy the optimality conditions for (7.33)-
(7.35), with the exception of the complementary slackness conditions, given
by ẑ_l g_l(x̂) = 0 for all l = 1, 2, ..., L, and the feasibility conditions, given
by g_l(x̂) ≤ 0 for all l = 1, 2, ..., L. If μ and ε are chosen such that x̂
is ε-feasible, then the error in complementary slackness is bounded by the
quantity με. To see this, observe that all constraints are satisfied to within
ε and that estimates for the Lagrange multiplier vector are given by

ẑ_l = μ max(0, u_l)/ε for all l = 1, 2, ..., L,

where u_l = g_l(x̂). Therefore, we have

ẑ_l u_l ≤ max(0, μ u_l²/ε) for all l = 1, 2, ..., L.

However, since u_l ≤ ε for l = 1, 2, ..., L, it follows that

ẑ_l u_l ≤ με for all l = 1, 2, ..., L,   (7.71)

and the assertion is verified.
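A quick numerical confirmation of (7.71); the residual values u_l below are arbitrary illustrative numbers, all within the ε-feasibility tolerance.

```python
# Check (7.71): with z_l = mu*max(0, u_l)/eps and residuals u_l <= eps,
# the complementary-slackness error z_l*u_l never exceeds mu*eps.

mu, eps = 10.0, 0.05
u = [-0.3, 0.0, 0.01, 0.049, eps]          # residuals g_l(x), all <= eps
z = [mu * max(0.0, ul) / eps for ul in u]  # multiplier estimates (7.70)
for zl, ul in zip(z, u):
    assert zl * ul <= mu * eps + 1e-12
print(max(zl * ul for zl, ul in zip(z, u)), "<=", mu * eps)
```

The bound is attained exactly at u_l = ε, where the estimate equals μ.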


These observations provide the justification for computing an ε-feasible
solution to the original problem by solving the smooth penalty problem
(7.45).

In the remainder of this section we study the conditions under which
a solution that is ε-feasible for the original problem can be obtained by
solving the linear-quadratic penalty problem. This result characterizes the
threshold value of the penalty parameter μ and provides a sufficient condi­tion
for ε-feasibility.

Proposition 7.2.7 Let x* be an optimal solution to (7.33)-(7.35), and
let (y*, z*) be a Lagrange multiplier vector. Let x̂ be an optimal
solution to the smooth penalty problem (7.45) for some μ and ε. Then x̂ is
ε-feasible if μ is chosen such that μ ≥ μ*/κ, where μ* is a threshold value
of the penalty parameter with μ* ≥ z_l* for all l = 1, 2, ..., L, and κ is a
constant given by

κ = (1 − √L)/(1 − L).   (7.72)

Proof Recall that A is the set of indices corresponding to constraints that
are satisfied or violated to within ε at x̂, and V is the set of indices cor­responding
to constraints violated beyond ε at x̂. We will assume that
card(V) (the cardinality of the set V) is at least one (otherwise the propo­sition
holds trivially) and argue by negation.

We write the objective function of the smooth penalty problem (7.45)
at the assumed minimum x̂ as

f_0(x̂) + μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) + (μ/2ε) Σ_{l∈A} (g_l⁺(x̂))².   (7.73)

This can be so written because when l ∈ V then g_l(x̂) > ε > 0, thus
g_l⁺(x̂) = g_l(x̂); however, if l ∈ A then g_l(x̂) ≤ ε, except that when g_l(x̂) ≤ 0
then p_ε(g_l(x̂)) = 0 by definition.

We consider first the linear term (i.e., the second summand) in (7.73),
and rewrite it as

μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) = μ* Σ_{l∈V} g_l⁺(x̂) + (μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) − μ* Σ_{l∈V} g_l⁺(x̂)).

Considering the term in parentheses, for μ = μ*/κ we get

μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) − μ* Σ_{l∈V} g_l⁺(x̂) = (μ*/κ) Σ_{l∈V} ((1 − κ) g_l⁺(x̂) − ε/2).   (7.74)

Now defining t = g_l⁺(x̂), the term under the summation on the right-hand
side can be written as (1 − κ)t − ε/2. Since t = g_l(x̂) > ε for all l ∈ V, and
since from the definition of κ in (7.72) we have 0 < κ ≤ 1/2 for all L ≥ 1,
we obtain

(1 − κ)t − ε/2 > (1 − κ)ε − ε/2 = κ(ε/2κ − ε).

Therefore, from (7.74) we have

μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) > μ*(ε/2κ − ε) card(V) + μ* Σ_{l∈V} g_l⁺(x̂).   (7.75)

Now we consider the quadratic term in (7.73), which can be rewritten as

(μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² = μ* Σ_{l∈A} g_l⁺(x̂) + ((μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² − μ* Σ_{l∈A} g_l⁺(x̂)).

Again considering the term in parentheses, for μ = μ*/κ we get

(μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² − μ* Σ_{l∈A} g_l⁺(x̂) = μ* Σ_{l∈A} ((g_l⁺(x̂))²/2εκ − g_l⁺(x̂)).   (7.76)

Using t = g_l⁺(x̂) we see that the term in the summation of the right-hand
side has the form (t²/2εκ) − t, which is a convex function of t. Its minimum
is attained at t* = εκ, at which point the function takes the value −εκ/2.
Since this lower bound is negative by the positivity of ε and κ, the following
inequality follows from (7.76):

(μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² ≥ μ* Σ_{l∈A} g_l⁺(x̂) − μ*(εκ/2) card(A).   (7.77)

Now combining inequalities (7.75) and (7.77) we get the following result
for the objective function value of the smooth penalty problem evaluated
at x̂, whose derivation is explained below:

f_0(x̂) + μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) + (μ/2ε) Σ_{l∈A} (g_l⁺(x̂))²
> f_0(x̂) + μ* Σ_{l=1}^L g_l⁺(x̂) + μ*(ε/2κ − ε − (L − 1)εκ/2)
= f_0(x̂) + μ* Σ_{l=1}^L g_l⁺(x̂)
≥ f_0(x̂) + Σ_{k=1}^K y_k* f_k(x̂_k) + Σ_{l=1}^L z_l* g_l(x̂)
≥ f_0(x*) + Σ_{k=1}^K y_k* f_k((x*)_k) + Σ_{l=1}^L z_l* g_l(x*)
= f_0(x*) = f_0(x*) + Σ_{l=1}^L p_ε(g_l(x*)).

The first inequality follows by adding (7.75) and (7.77), based on the as­sumption
that card(V) ≥ 1, and using the fact that card(A) ≤ L − 1.
The next equality follows from ε/2κ − ε − (L − 1)εκ/2 = 0, by defini­tion
of κ. The subsequent inequality follows from the definition of μ* and
the fact that x̂ ∈ X. The third inequality along with the subsequent equality fol­low
from the definition of a Lagrange multiplier vector. The last equality
follows from the feasibility of x*. But this result is a contradiction, since
x̂ was assumed to be an optimal solution to (7.45) and the rightmost
expression is the value of the objective of (7.45) at x*. Hence the assumption
card(V) ≥ 1 is violated, i.e., the set V must be empty. ■
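The constant κ of (7.72) simplifies to κ = 1/(1 + √L), which makes the two facts used in the proof easy to verify numerically: 0 < κ ≤ 1/2, and ε/2κ − ε − (L − 1)εκ/2 = 0. (For L = 1 the quotient in (7.72) is 0/0 and must be read as the limit κ = 1/2.)

```python
# Verify the two properties of kappa = (1 - sqrt(L))/(1 - L) used in the
# proof of Proposition 7.2.7 (L = 1 handled as the limit kappa = 1/2).
import math

eps = 0.1
for L in range(2, 51):
    kappa = (1.0 - math.sqrt(L)) / (1.0 - L)
    assert abs(kappa - 1.0 / (1.0 + math.sqrt(L))) < 1e-12  # simplified form
    assert 0.0 < kappa <= 0.5
    residual = eps / (2.0 * kappa) - eps - (L - 1) * kappa * eps / 2.0
    assert abs(residual) < 1e-10   # the identity that closes the derivation
```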

Therefore, for a given problem, the threshold value of μ required to attain
ε-feasibility is a constant independent of ε. The existence of a threshold
value of the penalty parameter μ indicates that an ε-feasible solution can be
computed in a finite number of minimizations. An important consequence
of Proposition 7.2.7 is that we can now characterize the conditions under
which the upper bound on the difference between the optimal objective
values of problems (7.33)-(7.35) and (7.45), as stated in Proposition 7.2.3,
is achieved. This result is now summarized in the following proposition.

Proposition 7.2.8 Let x* be an optimal solution for (7.33)-(7.35) and
let (y*, z*) ∈ ℝ^{K+L} be the Lagrange multiplier vector. Let μ* be such
that μ* ≥ z_l* for all l = 1, 2, ..., L. Furthermore, let x̂ be an optimal
solution to (7.45), with μ = μ*/κ where κ = (1 − √L)/(1 − L). Then

0 ≤ f_0(x*) − f_0(x̂) ≤ Lμε.

Proof Since κ < 1 it follows that μ > μ*. Hence, solving (7.37) using
μ = μ*/κ produces an optimal solution to the original problem, while
Proposition 7.2.7 guarantees that the solution x̂ of (7.45) is ε-feasible. The
rest of the proof follows from Proposition 7.2.3. ■

Propositions 7.2.7 and 7.2.8 motivate the procedure for controlling the
accuracy of the solution in the LQP algorithm (Algorithm 7.2.1). The
procedure starts with appropriately chosen μ_ν and ε_ν. The value of the
penalty parameter μ_ν is increased according to some criteria, while the
smoothing parameter ε_ν can be decreased after each penalty minimization
in Step 1 is completed. More precisely, if the solution of the penalty problem
is ε-feasible, this is an indication that the penalty parameter μ_ν is large
enough. Therefore, if the smoothing parameter ε is below the final accuracy
tolerance, ε_min, the algorithm terminates. In this case Proposition 7.2.8
provides an upper bound on the difference between the optimal value of the
original problem and the optimal value of the smoothed penalty problem.
If the solution of the penalty problem is not ε-feasible, then the penalty
parameter μ_ν is not large enough. Another round of penalty minimization
is carried out with a larger value of the penalty parameter. The smoothing
parameter ε_ν can be left unchanged in this case.
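The control procedure just described can be sketched as follows. This is an outline only: the inner solver, the update factors r1 < 1 and r2 > 1, the starting values, and the toy one-variable problem used to exercise the loop (minimize x² subject to 1 − x ≤ 0) are stand-ins for the actual components of Algorithm 7.2.1, and the max(eps_min, ·) guard in the ε-update is an added safeguard of mine.

```python
def lqp_outer_loop(solve_penalty, g, mu=1.0, eps=1.0,
                   eps_min=1e-6, r1=0.5, r2=10.0, max_iter=100):
    """solve_penalty(mu, eps) -> minimizer of the smoothed problem (7.45);
    g(x) -> list of constraint values g_l(x) at x."""
    x = solve_penalty(mu, eps)
    for _ in range(max_iter):
        u = g(x)
        if max(u) <= eps:                      # x is eps-feasible
            if eps <= eps_min:
                return x, mu, eps              # required accuracy reached
            eps = r1 * max(eps_min, max(u))    # shrink smoothing; mu unchanged
        else:
            mu = r2 * mu                       # raise penalty; eps unchanged
        x = solve_penalty(mu, eps)
    return x, mu, eps

# Toy inner solver: closed-form minimizer of x^2 + p_eps(1 - x).
def solve_toy(mu, eps):
    x_q = mu / (mu + 2.0 * eps)     # candidate in the quadratic region
    if 1.0 - x_q <= eps:
        return x_q
    x_l = mu / 2.0                  # candidate in the linear region
    if 1.0 - x_l >= eps:
        return x_l
    return 1.0 - eps                # kink between the two pieces

x, mu, eps = lqp_outer_loop(solve_toy, lambda x: [1.0 - x])
print(x, mu, eps)   # x ends within eps_min of the constrained optimum x* = 1
```

On the toy problem the loop raises μ once past the multiplier threshold and then shrinks ε geometrically until the final point violates the constraint by less than eps_min.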

7.3 Notes and References


The development of algorithms for the decomposition of structured op­
timization models into smaller and simpler subproblems dates back to
the early days of linear programming. See Dantzig (1963) for an early
account. A classification of decomposition algorithms is given by Geoffrion
(1970), and a textbook treatment of algorithms for large-scale systems
developed through the 1960s was provided by Lasdon (1970). The 1980s
witnessed several efforts in parallelizing the most popular decomposition
algorithms, including the Dantzig-Wolfe algorithm in Dantzig
and Wolfe (1960), and Benders decomposition in Benders (1962); see also
Geoffrion (1972). For an early discussion see the introduction in Meyer and
Zenios (1988). For parallelizations of the Dantzig-Wolfe algorithm see Ho,
Lee and Sundarraj (1988), and for parallelizations of Benders decomposi­
tion see Ariyawansa (1991), Ariyawansa and Hudson (1991), Dantzig, Ho,
and Infanger (1991), Nielsen and Zenios (1994), and Qi and Zenios (1994).
Significant recent developments that fit into the general framework of
model decomposition methods are the parallel constraint distribution and
parallel variable distribution methods of Ferris and Mangasarian (1991,
1994). Both methods have been proven effective for the solution of large-
scale linear programs and were also efficient when implemented on parallel
machines. See also Mangasarian (1995) for the related gradient distribution
method for unconstrained optimization and Ferris (1994) for extensions to
convex quadratic programs.

7.1 For an alternative classification of decomposition algorithms to the one


taken in this section see Geoffrion (1970).
7.1.1 Penalty methods for the solution of constrained optimization prob­
lems are attributed to Courant (1962), and the barrier method was first
suggested by Carroll (1961). A textbook treatment of penalty methods
can be found in Bertsekas (1982). Barrier, or interior penalty methods
were developed and popularized in Fiacco and McCormick (1968) and
McCormick (1983). For an introduction to both methods see Luen-
berger (1984). A discussion of exact penalty methods can be found in
Han and Mangasarian (1979). The use of split-variable formulations
is fairly standard in large-scale optimization, e.g., Bertsekas and Tsit-
siklis (1989, p.231). For applications see Rockafellar and Wets (1991),
Mulvey and Vladimirou (1989, 1991), and Nielsen and Zenios (1993a,
1996a). The augmented Lagrangian was introduced in Arrow, Hur-
wicz, and Uzawa (1958) as a means for the convexification of noncon-
vex problems, and was further extended and analyzed by Rockafel­
lar (1974, 1976a). An extensive treatment of this topic can be found
in Bertsekas (1982). The method of multipliers was introduced, inde­
pendently, by Hestenes (1969) and Powell (1969). Consult Section 4.5
for more notes and references on penalty and barrier methods.
7.1.2 The Frank-Wolfe algorithm was developed by Frank and Wolfe (1956),
and the simplicial decomposition algorithm was first suggested as a
generalization to Frank-Wolfe by Holloway (1974). The representa­
tion of the master program as a nonlinear program with a simplex
constraint is due to Von Hohenbalken (1977); see also Von Hohen-
balken (1975) and Grinold (1982). A memory-efficient variant of the
algorithm that maintains only a limited number of vertices is due

to Hearn, Lawphongpanich, and Ventura (1984). Mulvey, Zenios,


and Ahlfeld (1990) developed numerical procedures for solving the
master program and specialized the algorithm for large-scale network
problems. Nonlinear programs with a single equality constraint, and



bounded variables (such as the master program in Algorithm 7.1.3),
appear in many applications. Algorithms for their solution have been
suggested by Helgason, Kennington, and Lall (1980) and Tseng (1990);
and parallel algorithms were developed, implemented, and compared
by Nielsen and Zenios (1992c).
The use of linearization algorithms in conjunction with barrier or
penalty functions for the decomposition of structured programs for
parallel computing was suggested by Schultz and Meyer (1991) and
Pinar and Zenios (1992). Both studies report encouraging computa­
tional results on parallel machines for large-scale problems.
Diagonal quadratic approximations to nonseparable quadratic func­
tions, and in particular for the augmented Lagrangian, were suggested
by Stephanopoulos and Westerberg (1975) and further extended by
Tatjewski (1989). The use of split-variable formulations in conjunc­
tion with augmented Lagrangians and diagonal quadratic approxima­
tions for the solution of large-scale structured programs was suggested
by Mulvey and Ruszczynski (1992), and was subsequently applied
to the solution of stochastic programming problems by Mulvey and
Ruszczynski (1994) and Berger, Mulvey, and Ruszczynski (1994). The
same references also report encouraging computational results with the
use of the algorithm for solving large-scale problems on a distributed
network of workstations.
7.2 Smoothing approximations to exact penalty functions were introduced
by Bertsekas (1973), who also suggested their use for the solution of
minimax optimization problems. Zang (1980) also discusses smooth­
ing techniques for the solution of minimax problems. Madsen and
Nielsen (1993) introduced smoothing for the solution of estimation
problems. We point out that the linear-quadratic penalty function is
used in statistics to develop robust estimation procedures, and it is
known as the Huber estimator; see Huber (1981). Linear-quadratic
programming problems such as those arising from the smoothing ap­
proximations of this section are analyzed in Rockafellar (1987, 1990).
The LQP algorithm for large-scale problems was developed by Zenios,
Pinar, and Dembo (1994). Its properties were analyzed in Pinar and
Zenios (1994b), and the algorithm was used for the parallel decom­
position of multicommodity flow problems in Pinar and Zenios (1992,
1994a).
7.2.2 The analysis of the properties of the LQP function is due to Pinar
and Zenios (1994b). It is based on earlier works by Bertsekas (1975),

Charalambous (1978), and Charalambous and Conn (1978) who an­


alyzed exact penalty functions, and on the work of Truemper (1975)
who analyzed the quadratic penalty function.
