
CHAPTER 7

Downloaded from https://academic.oup.com/book/53915/chapter/422193740 by OUP site access user on 12 May 2024


Model Decomposition Algorithms

Out of intense complexities intense simplicities emerge.


Winston Churchill

In the previous chapters we studied algorithms that facilitate parallel


computations due to the structure of their operations. Sometimes, however,
it is the structure of the model of the problem at hand (that is suitable for
some decomposition) that leads to efficient parallel computations. Large-
scale problems often display some characteristic sparsity pattern which is
amenable to decomposition. In time-dependent optimization problems, for
example, one has to optimize a given system at different points in time.
Successive time periods are linked through the flow of inventory. It might
be possible to partially optimize the operations of the system for each time
period, while maintaining a level of inventory consistent with the optimal
operating schedule of successive time periods. In large spatial systems
(e.g., transportation or telecommunication problems) one has to optimize
distinct geographical regions. Adjacent regions are linked through trading
and the flow of traffic. Totally decentralized optimization is not possible,
but it might be possible to partially optimize each region separately, while
restricting the trading between adjacent regions to be consistent with each
region’s optimal state.
Optimization algorithms have been devised over the last fifty years
specifically to deal with such problems; the two most noteworthy exam­
ples are the Dantzig-Wolfe decomposition and Benders decomposition. An
important book by Lasdon (1970) discusses decomposition algorithms for
large-scale optimization problems. A common feature of these algorithms
is that they solve a sequence of (smaller) subproblems to optimize the dis­
tinct components (time periods, regions, etc.), while a coordinating master
problem synthesizes these solutions into an estimate of the overall opti­
mum. The current solution estimate of the master program then defines a
new subproblem and the process repeats iteratively. The subproblems and
the master problem are much smaller than the original program. Hence,
even if the decomposition algorithm requires several iterations between the
master and the subproblems to reach a solution, it is usually faster than
algorithms that attack the original undecomposed problem. We call such

Parallel Optimization, Yair Censor, Oxford University Press (1997), © 1997 by Oxford University Press, Inc., DOI: 10.1093/9780195100624.003.0007

algorithms here model decomposition algorithms, as they do not solve the original model of the problem directly, but instead solve a modified decomposed variant of the model.
In the early days of parallel optimization, it was anticipated that parallelism would substantially speed up model decomposition algorithms. This
has not been the case, however. The speedups observed with decomposition
algorithms such as those of Dantzig-Wolfe or Benders have been modest.
Why is this so? The most successful attempts to parallelize these algo­
rithms (see references in Section 7.3) solve the subproblems in parallel,
but solve the master program on a single processor. There have also been
cases where the solution of the master program was parallelized, but the
efficiency was low. Thus the master program becomes a serial bottleneck:
as the decomposition algorithm iterates the master programs increase in
size and the serial bottleneck becomes more restrictive, as Amdahl’s law
(Definition 1.4.6) dictates.
With a view toward parallelism other model decomposition algorithms
have recently been designed and implemented, which either use a very
simple coordination phase that does not create any serial bottleneck, or
have a coordination phase that is itself suitable for parallel computations.
In this chapter we will discuss one such algorithm. Section 7.1 contains
preliminary discussion on model decompositions and discusses parallel de­
compositions based on linearization or diagonal-quadratic approximations.
Section 7.2 discusses the Linear-Quadratic Penalty (LQP) algorithm for
large-scale structured problems. Notes and references are given in Sec­
tion 7.3.

7.1 General Framework of Model Decompositions


Consider the minimization of a convex, continuously differentiable, block-separable function F : IR^{nK} → IR, written as F(x) = Σ_{k=1}^K f_k(x^k), where f_k : IR^n → IR for all k = 1,2,...,K. The vector x ∈ IR^{nK} is the concatenation of K subvectors, x = ((x^1)^T, (x^2)^T, ..., (x^K)^T)^T, where x^k ∈ IR^n for all k = 1,2,...,K. (Boldface letters denote vectors in the product space IR^{nK}.) Consider now the following constrained optimization problem:
Problem [P]:

Minimize F(x) (7.1)

s.t. x^k ∈ X_k, for all k = 1,2,...,K, (7.2)

x ∈ Ω ⊆ IR^{nK}. (7.3)

The sets X_k, k = 1,2,...,K, and Ω are assumed to be closed and convex.
Figure 7.1 illustrates the structure of this problem in two dimensions.



Figure 7.1 Constraint sets and set of feasible solutions of problem [P] in IR² with K = 2.

If the product set X_1 × X_2 × ⋯ × X_K ⊆ Ω, then problem [P] can


be solved by simply ignoring the constraints x ∈ Ω and solving K independent subproblems in each of the x^k vector variables. In this respect, the constraints x ∈ Ω are complicating (or coupling) constraints. When the
complicating constraints cannot be ignored, a model decomposition applies
a modifier to problem [P] to obtain a problem [P′] in which the complicating constraints are not explicitly present. It then employs a suitable algorithm to solve [P′]. If the solution to [P′] is sufficiently close to a solution of the original problem [P] then the process terminates. Otherwise,
the current solution is used to construct a new modified problem and the
process repeats. Figure 1.2 illustrates the model decomposition algorithmic
framework (see Chapter 1).
It is the judicious combination of a modifier and a suitable algorithm
for the solution of the modified problem [P'] that leads to a decomposition
of the original problem [P] suitable for parallel computations. We discuss
in this section modifiers and algorithms suitable for solving the modified
problems.

7.1.1 Problem modifiers


We present now two modifiers for problem [P], drawing on the general
theory developed in Chapter 4.

Modifier I: Penalty or Barrier Functions


The first modifier eliminates the complicating constraints x ∈ Ω by using a penalty or a barrier function. Using a penalty function p : IR^{nK} → IR with respect to the set Ω defining the complicating constraints (see Definition 4.1.1), the modified problem can be written as:
Problem [P']:

Minimize F(x) + cp(x) (7.4)


s.t. x^k ∈ X_k, for all k = 1,2,...,K. (7.5)

We know (see Section 7.3) that it is possible to construct penalty functions


that are exact, i.e., there exists some constant c̄ > 0 such that for c > c̄ any solution of [P′] is also a solution to [P]. Hence, a solution to [P] can be obtained by solving [P′]. Note that [P′] has a simpler constraint set than [P] because the complicating constraints x ∈ Ω have been removed.
However, problem [P'] still cannot be solved by solving K independent
subproblems, since the function p is not necessarily block-separable (see
Definition 1.3.1 and Section 4.3 for definitions of separability). The next
section explores algorithms that induce separability of this function.
Consider now situations when the set Ω has a nonempty interior. Such sets arise when inequality constraints are used to define them, e.g.,

Ω = { x | g_l(x) ≤ 0, for all l = 1,2,...,L }, (7.6)

where g_l : IR^{nK} → IR. In this case we can use a barrier function (see Definition 4.2.1) to establish a barrier on the boundary of Ω so that the iterates of an algorithm that starts with an interior point remain in the interior of the set, therefore satisfying the constraints x ∈ Ω. For example, a barrier function for the set Ω defined by (7.6) can be constructed with the aid of Burg's entropy (6.149) (see Example 4.2.2) as:

q(x) = −Σ_{l=1}^L log(−g_l(x)),

where g_l for l = 1,2,...,L are the functions used in the definition of Ω.
With the use of such a barrier function the modified problem is written as:
Problem [P']:

Minimize F(x) + cq(x) (7.7)

s.t. x^k ∈ X_k, for all k = 1,2,...,K. (7.8)

A solution to the problem [P] can be approximated by solving the modified


problem with the barrier function for a sufficiently small value of the parameter c. It is also possible to solve the barrier-modified problem repeatedly


for a sequence of barrier parameters {c_ν}, such that c_ν > c_{ν+1} > 0. If {x^ν} denotes the sequence of solutions of these barrier problems, then it is known that {x^ν} converges to a solution of [P] as c_ν → 0 (see Theorem 4.2.1).



Like the penalty-modified problem [P'], the barrier-modified problem
has a simpler constraint structure than problem [P]. However, it still can­
not be decomposed into independent components since the barrier function
is not necessarily block-separable. The algorithms of the next section can
be used to induce separability of penalty and barrier functions.
Modifier II: Variable Splitting and Augmented Lagrangian
The second modifier first replicates (or splits) the components x^k of the vector x into two copies, one of which is constrained to belong to the set X_k and the other constrained to satisfy the complicating constraints. Let z ∈ IR^{nK} denote the replication of x, where z = ((z^1)^T, (z^2)^T, ..., (z^K)^T)^T and the vector z^k ∈ IR^n for all k = 1,2,...,K. Consider now the equivalent split-variable formulation of [P]:
Problem [Split-P]:
Minimize Σ_{k=1}^K f_k(x^k) (7.9)

s.t. x^k ∈ X_k, for all k = 1,2,...,K, (7.10)

z ∈ Ω, (7.11)

z^k = x^k, for all k = 1,2,...,K. (7.12)

The constraints z^k = x^k link the variables that appear in the constraint sets X_k with the variables that appear in the set Ω. An augmented Lagrangian formulation (see Section 4.4) is now used to eliminate these complicating constraints. We let π = ((π^1)^T, (π^2)^T, ..., (π^K)^T)^T, where π^k ∈ IR^n denotes the Lagrange multiplier vector for the complicating constraints z^k = x^k, and let c > 0 be a constant. Then a partial augmented Lagrangian for (7.9)-(7.12) can be written as:
L_c(x, z, π) = Σ_{k=1}^K f_k(x^k) + Σ_{k=1}^K ⟨π^k, z^k − x^k⟩ + (c/2) Σ_{k=1}^K ||z^k − x^k||². (7.13)
A solution to problem [Split-P] can be obtained by solving the dual prob­
lem:
Problem [P″]:

Maximize_{π ∈ IR^{nK}} φ_c(π), (7.14)

where φ_c(π) = min_{x^k ∈ X_k, z ∈ Ω} L_c(x, z, π). This is the modified problem whose solution yields a solution of [P].

An algorithm for solving convex optimization problems using augmented Lagrangians is the method of multipliers, which is an instance of the augmented Lagrangian algorithmic scheme (Algorithm 4.4.1). It proceeds by minimizing the augmented Lagrangian for a fixed value of the Lagrange multiplier vector, followed by a simple update of this vector. Using the method of multipliers to solve the dual problem [P″] we obtain the following algorithmic scheme:

Algorithm 7.1.1 Method of Multipliers for Solving the Modified Problem [P″].

Step 0: (Initialization.) Set ν = 0. Let π⁰ be an arbitrary Lagrange multiplier vector.
Step 1: (Minimizing the augmented Lagrangian.)

(x^{ν+1}, z^{ν+1}) = argmin_{x^k ∈ X_k, z ∈ Ω} L_c(x, z, π^ν). (7.15)

Step 2: (Updating the Lagrange multiplier vector.) For k = 1,2,...,K, update:

(π^k)^{ν+1} = (π^k)^ν + c((z^k)^{ν+1} − (x^k)^{ν+1}). (7.16)
Step 3: Replace ν ← ν + 1 and return to Step 1.

The minimization problem in Step 1 has a block-decomposable constraint


set. The problem, however, still cannot be decomposed into K independent
subproblems since the augmented Lagrangian is not block-separable due to
the cross-products {zk, xk} in the quadratic term of (7.13). The next section
explores algorithms that induce separability of this term. Step 2 consists
of simple vector operations that can be executed very efficiently on parallel
architectures.
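As an illustration, the scheme of Algorithm 7.1.1 can be sketched on a small quadratic problem. Everything below is our own construction, not from the text: the data, the coupling set Ω = {z : Σ_k z^k = b}, and the few alternating closed-form sweeps used as an inexact stand-in for the joint minimization of Step 1.

```python
import numpy as np

# Toy split-variable problem (our construction):
#   minimize sum_k ||x^k - a_k||^2
#   s.t.     z = x,  z in Omega = { z : sum_k z^k = b },
# with X_k = R^n so both partial minimizations have closed forms.
K, n = 4, 3
rng = np.random.default_rng(0)
a = rng.standard_normal((K, n))      # targets a_k
b = np.ones(n)                       # coupling right-hand side
c = 5.0                              # augmented Lagrangian constant

x = np.zeros((K, n))
z = np.zeros((K, n))
pi = np.zeros((K, n))                # multipliers for z^k = x^k

for nu in range(200):
    # Step 1 (approximate): alternate closed-form minimizations of L_c.
    for _ in range(5):
        # min over x^k: stationarity 2(x - a_k) - pi^k + c(x - z^k) = 0.
        x = (2 * a + pi + c * z) / (2 + c)
        # min over z in Omega: unconstrained minimizer x - pi/c,
        # then Euclidean projection onto { z : sum_k z^k = b }.
        z = x - pi / c
        z -= (z.sum(axis=0) - b) / K
    # Step 2: multiplier update pi^k <- pi^k + c (z^k - x^k), as in (7.16).
    pi += c * (z - x)

# Analytic optimum of the coupled problem: x^k = a_k + (b - sum_j a_j)/K.
x_star = a + (b - a.sum(axis=0)) / K
print(np.abs(x - x_star).max())
```

The z-update runs independently per component and the π-update is a pure vector operation, which is what makes Step 2 cheap on parallel hardware; only the coupling projection needs a reduction across blocks.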

7.1.2 Solution algorithms


Both modified problems [P′] and [P″] have a block-decomposable constraint set, but the objective function is not block-separable. These problems can be written in the general form:

Minimize Φ(x) = Φ(x^1, x^2, ..., x^K) (7.17)

s.t. x ∈ X = X_1 × X_2 × ⋯ × X_K. (7.18)

We consider in this section two solution algorithms that—when applied


to problems of this form—give rise to block-separable functions and thus
decompose the problem into K subproblems, which can then be solved
in parallel. The first algorithm uses linear approximations to the nonlinear function Φ. These linear approximations are block-separable. The



Figure 7.2 Illustration of the Frank-Wolfe linearization algorithm for a function Φ(x_1, x_2). (The curves denote level sets.)

second algorithm uses a diagonal approximation for cases where Φ is a nonseparable quadratic function, such as the last summand appearing in the augmented Lagrangian (7.13).

Solution Algorithms Based on Linearization


One of the earliest algorithms suggested for the solution of nonlinear optimization problems using linear approximations is the Frank-Wolfe algorithm. It uses Taylor's expansion formula to obtain a first-order approximation Φ̂ of Φ around the current iterate x^ν, i.e.,

Φ̂(x) = Φ(x^ν) + ⟨∇Φ(x^ν), x − x^ν⟩,

ignoring second- and higher order terms. The Frank-Wolfe algorithm now
minimizes this linear function, subject to the original constraints. The
solution of this linear program, y, is a vertex of the constraint set, which
determines a direction of descent for the original nonlinear function, given
by p = y − x^ν. The algorithm then performs a one-dimensional search along
this direction to determine the step length where the nonlinear function
attains its minimum. Figure 7.2 illustrates the algorithm.
Applied to problem (7.17)-(7.18) the Frank-Wolfe algorithm is formally
stated below:

Algorithm 7.1.2 The Frank-Wolfe Algorithm

Step 0: (Initialization.) Set ν = 0. Let x⁰ ∈ X be an arbitrary vector.

Step 1: (Solving the linearized subproblem.) Evaluate the first-order approximation of Φ(x) at x^ν, i.e., Φ̂(x) = Φ(x^ν) + ⟨∇Φ(x^ν), x − x^ν⟩, and compute a direction of descent of Φ by solving the linear program:

Minimize ⟨∇Φ(x^ν), y − x^ν⟩ (7.19)

s.t. y ∈ X. (7.20)

Let y denote the optimal vertex of the linear program. Then p = y − x^ν is a direction of descent for Φ.

Step 2: (Linesearch.) Compute a step length α* along the direction p that minimizes the nonlinear function Φ by solving the one-dimensional nonlinear program:

α* = argmin_{0≤α≤1} Φ(x^ν + αp). (7.21)

Step 3: (Updating the iterate.) Let x^{ν+1} = x^ν + α*p and return to Step 1.
The subproblem in Step 1 is a linear programming problem over a Cartesian product of linear constraint sets. It can be solved by solving K independent linear programs in the variables y^k, k = 1,2,...,K, because although Φ̂(x) need not be a block-separable function, the objective function in (7.19) is. Denoting by ∇_{x^k}Φ(x) the subvector of ∇Φ(x) calculated with respect to x^k, we obtain

⟨∇Φ(x^ν), y − x^ν⟩ = Σ_{k=1}^K ⟨∇_{x^k}Φ(x^ν), y^k − (x^k)^ν⟩. (7.22)

The independent linear subproblems are then:

Minimize ⟨∇_{x^k}Φ(x^ν), y^k − (x^k)^ν⟩ (7.23)

s.t. y^k ∈ X_k. (7.24)

The problem in Step 2 is a nonlinear program in a single bounded


variable. For large-scale problems the calculations of Step 2 are insignificant
in comparison to the calculations of Step 1, and the algorithm parallelizes
efficiently.
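The steps above can be sketched in code on a toy problem of our own choosing (not from the book): minimize ||x − t||² over the box X = [0,1]^{nK}. For a box, the Step 1 LP splits into K per-block LPs with closed-form vertex solutions, and the Step 2 linesearch of a quadratic also has a closed form.

```python
import numpy as np

# Frank-Wolfe sketch for min ||x - t||^2 over X = [0,1]^{nK}
# (a made-up block-separable test problem; the optimum is clip(t, 0, 1)).
K, n = 3, 4
rng = np.random.default_rng(1)
t = rng.uniform(-0.5, 1.5, (K, n))

x = np.full((K, n), 0.5)
for nu in range(2000):
    grad = 2 * (x - t)                   # nabla Phi(x^nu), block by block
    # Step 1: per-block LP  min <grad_k, y^k>  over [0,1]^n: each
    # coordinate goes to whichever bound minimizes the linear objective.
    y = np.where(grad > 0, 0.0, 1.0)
    p = y - x                            # direction of descent
    denom = (p * p).sum()
    if denom < 1e-14:
        break
    # Step 2: exact linesearch for the quadratic, clipped to [0, 1].
    alpha = np.clip(-((x - t) * p).sum() / denom, 0.0, 1.0)
    # Step 3: update the iterate.
    x = x + alpha * p

print(((x - t) ** 2).sum())
```

The gradient blocks and the per-block LPs are independent, so in a parallel setting each processor handles one k; only the linesearch scalar is shared, matching the remark that Step 2 is insignificant next to Step 1.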
It is well-known that the Frank-Wolfe algorithm can zigzag toward the
solution. If the optimal solution lies on a facet of the constraint set, the
algorithm cannot reach it by a linesearch between one vertex (i.e., y) and an interior point (i.e., x^ν). A mild version of this effect is illustrated in


Figure 7.2 where the optimal solution is close to a facet. The simplicial
decomposition algorithm (described next) avoids the zigzagging effect of the
Frank-Wolfe algorithm. It uses information contained in multiple vertices generated during successive iterations of the algorithm and solves a larger
nonlinear master program rather than the simple linesearch. We describe
next the simplicial decomposition algorithm.
Let Y = {y^1, y^2, ..., y^v} denote the set of vertices of the feasible region X generated during the first ν iterations of the Frank-Wolfe algorithm. The convex hull of Y (i.e., the set defined by all convex combinations of the vertices) is

conv(Y) = { y = Σ_{l=1}^v w_l y^l | y^l ∈ Y, w_l ≥ 0, l = 1,2,...,v, Σ_{l=1}^v w_l = 1 }, (7.25)
which is a subset of the feasible set X. Simplicial decomposition generates
vertices as in the Frank-Wolfe algorithm by solving linear programming
subproblems. It then optimizes the original objective function by solving
a master program over the set conv(Y). The dimension of the master pro­
gram is usually much smaller than the dimension of the original program.
The master program also has a simple constraint structure—consisting of a
single equality constraint and bounds on the variables—which can be ex­
ploited to convert the problem into a locally unconstrained program. Now
the problem can be solved using standard unconstrained optimization tech­
niques, and it can also be solved inexactly. Vertices that do not contribute
to the representation of the optimal solution of the master program can be
removed, thereby reducing the size of the master program.
Applied to problem (7.17)—(7.18), the simplicial decomposition algo­
rithm is described next.
Algorithm 7.1.3 The Simplicial Decomposition Algorithm

Step 0: (Initialization.) Set ν = 0. Let x⁰ ∈ X be an arbitrary vector. Let Y = ∅ be the set of vertices and v = 0 its cardinality.
Step 1: (Solving the linearized subproblem.) Evaluate the first-order approximation Φ̂(x) at x^ν, i.e., Φ̂(x) = Φ(x^ν) + ⟨∇Φ(x^ν), x − x^ν⟩, and compute a direction of descent of Φ by solving the linear programming problem:

Minimize ⟨∇Φ(x^ν), y − x^ν⟩ (7.26)

s.t. y ∈ X. (7.27)

Let y^ν denote the optimal solution of this linear program. Update the set of vertices Y ← Y ∪ {y^ν}, and its cardinality v ← v + 1.

Step 2: (Solving the nonlinear master program.) Optimize the nonlinear function Φ over the convex hull of Y (equation (7.25)). That is, compute

w* = argmin_{w ∈ W_v} Φ( Σ_{l=1}^v w_l y^l ), (7.28)

where y^l ∈ Y for all l = 1,2,...,v, and

W_v = { w = (w_l) ∈ IR^v | Σ_{l=1}^v w_l = 1, w_l ≥ 0, for all l = 1,2,...,v }.

Step 3: (Updating the iterate.) Let x^{ν+1} = Σ_{l=1}^v w_l* y^l. Update the set of vertices Y by deleting from it any vertices with zero weight in the representation of x^{ν+1}, i.e., set Y ← Y \ { y^l | w_l* = 0, 1 ≤ l ≤ v }, and let v = card(Y). Set ν ← ν + 1 and return to Step 1.
At Step 1 the algorithm solves a linear program with a Cartesian prod­
uct of linear constraint sets. This subproblem can be decomposed, and its
components solved independently and in parallel:
Step 1 (alternate): (Solving the decomposed linearized subproblems.) Let ∇_{x^k}Φ(x^ν) denote the subvector of the gradient vector ∇Φ(x) corresponding to the kth block of x, evaluated at the current iterate x^ν. For each k = 1,2,...,K, solve

Minimize_{y^k ∈ IR^n} ⟨∇_{x^k}Φ(x^ν), y^k − (x^k)^ν⟩ (7.29)

s.t. y^k ∈ X_k. (7.30)

Let (y^k)^ν denote the optimal solution of this linear program and form y^ν as the concatenation of the subvectors { (y^k)^ν | k = 1,2,...,K }.
The nonlinear master program in Step 2 is much smaller in size than the original problem. Typically, the number of vertices upon termination of the algorithm does not exceed one hundred. Furthermore, it has a simple structure, i.e., a simplex equality constraint, namely Σ_{l=1}^v w_l = 1, and bounds on the variables 0 ≤ w_l ≤ 1. This structure can be exploited by designing an unconstrained optimization procedure for its solution as follows. Using the simplex equality constraint we can substitute w_v with w_v = 1 − Σ_{l=1}^{v−1} w_l. Then we can write the master program (7.28) as:

w* = argmin_{0≤w_l≤1, l=1,2,...,v−1} Φ( y^v + Σ_{l=1}^{v−1} w_l (y^l − y^v) ). (7.31)

Recall that at the current iteration we have v − 1 active vertices (i.e., w_l > 0 for l = 1,...,v−1) and the last vertex y^v lies along a direction of descent. Hence the nonlinear master program is locally unconstrained in the neighborhood of the current iterate x^ν. Any unconstrained optimization algorithm can be used to compute a descent direction, followed by a
simple test to determine the maximum allowable step that will keep the w’s
within the bounds. Note that the evaluation of the objective function in­
volves operations on dense vectors, i.e., the vectors y^l, l = 1,2,...,v. Such
operations parallelize naturally and the solution of the master program is
also amenable to parallel computations.
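A compact sketch of simplicial decomposition follows, again on a toy box-constrained quadratic of our own construction. One simplification is ours: the master program is solved by projected gradient directly on the simplex weights rather than through the unconstrained rewrite (7.31), and the per-block structure of (7.29)-(7.30) is collapsed into a single closed-form box LP.

```python
import numpy as np

def proj_simplex(w):
    # Euclidean projection onto the unit simplex (standard sort-based method).
    u = np.sort(w)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * (np.arange(len(w)) + 1) > css)[0][-1]
    return np.maximum(w - css[rho] / (rho + 1.0), 0.0)

# Toy problem (not from the book): min Phi(x) = ||x - t||^2 over [0,1]^m.
m = 4
rng = np.random.default_rng(2)
t = rng.uniform(-0.5, 1.5, m)

x = np.full(m, 0.5)
V = [x.copy()]                               # vertex set Y, seeded with x^0
for outer in range(20):
    grad = 2 * (x - t)
    V.append(np.where(grad > 0, 0.0, 1.0))   # Step 1: box-LP vertex
    Y = np.array(V)                          # rows are the vertices y^l
    # Step 2: master program min_w Phi(w @ Y) over the simplex W_v,
    # solved by projected gradient with a 1/L step size.
    w = np.full(len(V), 1.0 / len(V))
    step = 1.0 / (2 * np.linalg.norm(Y, 2) ** 2 + 1e-12)
    for _ in range(2000):
        w = proj_simplex(w - step * (Y @ (2 * (w @ Y - t))))
    x = w @ Y
    # Step 3: drop vertices whose weight is (numerically) zero.
    V = [v for v, wl in zip(V, w) if wl > 1e-10]

print(((x - t) ** 2).sum())
```

Note how the master program's dimension is the number of retained vertices, not the dimension of x, and how the objective evaluation w @ Y is exactly the kind of dense vector operation the text says parallelizes naturally.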

Solution Algorithms Based on Diagonalization


Consider now a special structure of the objective function (7.17) arising from the modified problem [P″]. In particular, we assume that Φ(·) can be written as

Φ(x, z) = Σ_{k=1}^K ||x^k − z^k||². (7.32)

This is the structure of the quadratic terms of the augmented Lagrangi­


an (7.13). (We consider only the quadratic terms of the augmented La­
grangian, since these are the nonseparable terms that prevent us from
decomposing the minimization of Step 1 of Algorithm 7.1.1 into K in­
dependent subproblems.) We will approximate this nonseparable function
using a separable quadratic function. The term diagonal quadratic approx­
imation is also used, which indicates that the Hessian matrix of Φ(x, z) is
approximated by a diagonal matrix.
The terms in (7.32) can be expanded as

||x^k − z^k||² = ||x^k||² + ||z^k||² − 2⟨x^k, z^k⟩,

and we only discuss the cross-product terms ⟨x^k, z^k⟩ for k = 1,2,...,K. Using Taylor's expansion formula we obtain a first-order approximation of the cross-product terms around the current iterate as:

⟨x^k, z^k⟩ ≈ ⟨x^k, (z^k)^ν⟩ − ⟨(x^k)^ν, (z^k)^ν⟩ + ⟨(x^k)^ν, z^k⟩,

for k = 1,2,...,K. With this approximation of its cross-product terms, the function (7.32) is approximated by the expression:

Σ_{k=1}^K ||x^k − (z^k)^ν||² − Σ_{k=1}^K ||(x^k)^ν − (z^k)^ν||² + Σ_{k=1}^K ||(x^k)^ν − z^k||².

A solution algorithm based on diagonalization solves problem [P″] using the method of multipliers (Algorithm 7.1.1), but instead of minimizing the augmented Lagrangian in Step 1 it minimizes the diagonal quadratic approximation. This approximation is block-separable into the variable blocks x^k, k = 1,2,...,K, and the minimization is decomposed into K independent problems. These problems can be solved in parallel.
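A quick numeric sanity check (our own, not from the text) of the approximation above: the error of the separable expression relative to ||x − z||² is −2⟨x − x^ν, z − z^ν⟩, which is second order, so the two functions and their first derivatives agree at the expansion point.

```python
import numpy as np

# Check that replacing <x, z> by its linearization around (x_nu, z_nu)
# turns ||x - z||^2 into the block-separable expression
#   ||x - z_nu||^2 + ||x_nu - z||^2 - ||x_nu - z_nu||^2,
# which matches ||x - z||^2 to first order at (x_nu, z_nu).
rng = np.random.default_rng(3)
n = 5
x_nu, z_nu = rng.standard_normal(n), rng.standard_normal(n)

def exact(x, z):
    return ((x - z) ** 2).sum()

def approx(x, z):
    return exact(x, z_nu) + exact(x_nu, z) - exact(x_nu, z_nu)

# Zeroth order: the two functions coincide at the expansion point.
print(abs(exact(x_nu, z_nu) - approx(x_nu, z_nu)))

# First order: directional derivatives coincide at (x_nu, z_nu).
dx, dz = rng.standard_normal(n), rng.standard_normal(n)
h = 1e-6
d_exact = (exact(x_nu + h * dx, z_nu + h * dz) - exact(x_nu, z_nu)) / h
d_approx = (approx(x_nu + h * dx, z_nu + h * dz) - approx(x_nu, z_nu)) / h
print(abs(d_exact - d_approx))
```

The approx expression contains no term coupling x to z, which is precisely why the Step 1 minimization separates into K independent block problems.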



7.2 The Linear-Quadratic Penalty (LQP) Algorithm
Many different model decomposition algorithms can be designed using
problem modifiers based on penalty or barrier methods and then followed
by the use of linearization. We discuss here one such algorithm based on
a linear-quadratic penalty (LQP) function. The algorithm has been shown
to be efficient for large-scale, structured optimization problems. It has also
been implemented on different parallel architectures.
We consider the problem:

Minimize f_0(x) (7.33)

s.t. f_k(x^k) ≤ 0, for all k = 1,2,...,K, (7.34)

g_l(x) ≤ 0, for all l = 1,2,...,L. (7.35)

The functions f_k : IR^n → IR, k = 0,1,...,K, and g_l : IR^{nK} → IR, for l = 1,2,...,L, are convex and continuously differentiable. The constraints (7.34) decompose into blocks, one for each subvector x^k, while constraints (7.35) are complicating. Let X = X_1 × X_2 × ⋯ × X_K be the product of the sets X_k = { x^k ∈ IR^n | f_k(x^k) ≤ 0 } for all k = 1,2,...,K, and assume that X is a compact set. We further make the following assumptions:
Assumption 7.2.1 Problem (7.33)-(7.35) has a nonempty and compact
optimal solutions set.

Assumption 7.2.2 Problem (7.33)-(7.35) has at least one feasible solution


that satisfies all the constraints with strict inequality.
Under these assumptions a Kuhn-Tucker vector (i.e., a Lagrange mul­
tiplier vector as defined in Rockafellar (1970, p. 274)) exists for problem
(7.33)-(7.35).
Consider now the ℓ1-norm penalty function p : IR → IR given by:

p(t) = μ max(0, t), (7.36)

where t is a scalar variable and μ is a positive constant (see Figure 7.3). We want to obtain a solution to (7.33)-(7.35) by solving the following exact penalty problem:

min_{x∈X} ( f_0(x) + Σ_{l=1}^L p(g_l(x)) ). (7.37)

It is known (see, e.g., Bertsekas (1982, Chapter 4)) that under the assumptions stated above there exists a penalty parameter μ for which the optimal solutions to (7.37) and (7.33)-(7.35) coincide. In particular, if the penalty parameter is larger than a threshold value μ* given by the largest component of a Lagrange multiplier vector of (7.33)-(7.35), then a solution to (7.37) is also a solution to (7.33)-(7.35). However, the ℓ1-norm exact penalty function is nondifferentiable, and this precludes the application of gradient-based descent methods for the solution of the exact penalty problem. In order to gain access to gradient-based minimization techniques for solving (7.37) we consider an ε-smoothing of the function p around t = 0.
In particular we introduce the following ε-smoothed function p_ε:

p_ε(t) = 0, if t ≤ 0,
p_ε(t) = μt²/(2ε), if 0 ≤ t ≤ ε, (7.38)
p_ε(t) = μ(t − ε/2), if t ≥ ε,

where ε is a positive scalar; see Figure 7.3. It is easy to see that

lim_{ε→0} p_ε(t) = p(t), (7.39)

and, furthermore, this convergence is uniform; that is, given an arbitrary η > 0 we can find a δ > 0 such that if 0 < ε < δ then |p_ε(t) − p(t)| < η for all t (i.e., δ does not depend on t).
With the introduction of ε-smoothing we obtain a continuously differentiable penalty function that can be optimized using algorithms such as the simplicial decomposition Algorithm 7.1.3. Furthermore, the ε-smoothing device provides a natural mechanism for handling “soft” constraints that need not be satisfied exactly.
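The two penalty functions can be coded directly. The check below (our own) verifies on a grid the uniform bound 0 ≤ p(t) − p_ε(t) ≤ με/2, which is the key fact behind the error estimates that follow.

```python
import numpy as np

# The l1 penalty p(t) = mu*max(0, t) and its eps-smoothing p_eps of (7.38),
# with a numeric check of the uniform bound 0 <= p(t) - p_eps(t) <= mu*eps/2.
mu, eps = 10.0, 0.1

def p(t):
    return mu * np.maximum(0.0, t)

def p_eps(t):
    return np.where(t <= 0.0, 0.0,
           np.where(t <= eps, mu * t ** 2 / (2 * eps),
                    mu * (t - eps / 2)))

t = np.linspace(-1.0, 2.0, 100001)
gap = p(t) - p_eps(t)
print(gap.min(), gap.max())   # the gap is largest, mu*eps/2, for t >= eps
```

The maximal gap με/2 is attained for every t ≥ ε, which is why the bound cannot be improved without shrinking ε or μ.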
The use of e-smoothing might introduce some approximation error be­
cause an optimum point of the original problem is not necessarily an opti­
mum of the smoothed penalty problem, even for penalty parameter values
larger than the threshold value. However, an a priori upper bound to this
error can be computed as a function of the penalty parameters. It is also
possible to compute a solution that is feasible to within any given e > 0 for
given values of the penalty parameter. Such a solution is termed e-feasible.

Definition 7.2.1 (ε-feasibility) Given some ε > 0, a vector x ∈ X is ε-feasible for problem (7.33)-(7.35) if g_l(x) ≤ ε, for all l = 1,2,...,L.
We first describe the linear-quadratic penalty (LQP) algorithm that uses the smoothed penalty function p_ε(t) instead of the exact penalty p(t) in
(7.37). We then proceed with the analysis of the properties of e-smoothing,
and derive bounds on the difference between the solution of the smoothed



Figure 7.3 The ℓ1-norm penalty function and the linear-quadratic smooth penalty function defined by (7.38).

penalty problem and the original problem (7.33)-(7.35).


Define first the objective function for the exact penalty problem

F(x, μ) = f_0(x) + Σ_{l=1}^L p(g_l(x)), (7.40)

and then express the objective function for the ε-smoothed penalty problem

F̃(x, μ, ε) = f_0(x) + Σ_{l=1}^L p_ε(g_l(x)). (7.41)

The LQP algorithm can now be described in detail:

Algorithm 7.2.1 The Linear-Quadratic Penalty (LQP) Algorithm


Step 0: (Initialization.) Set ν = 0, set the initial parameter values μ_0 > 0, ε_0 > 0, and define some ε_min > 0.

Step 1: Solve the problem

min_{x∈X} F̃(x, μ_ν, ε_ν),

and let x^ν denote its optimal solution.

Step 2: If x^ν is ε_ν-feasible and ε_ν ≤ ε_min, then stop. Otherwise, update the penalty parameters μ_ν and ε_ν according to the rules given below, set ν ← ν + 1 and go to Step 1.
The parameter ε_min > 0 is a user-determined final feasibility tolerance. At Step 1 the algorithm solves the modified penalty problem, whereby the complicating constraints have been placed in the objective function, and at Step 2 the modifier is updated. Two situations may arise:
• The point x^ν is ε_ν-feasible but ε_ν > ε_min. In this case the penalty parameter remains unchanged, i.e., μ_{ν+1} = μ_ν, and ε_ν is decreased by replacing it with ε_{ν+1} = η_1 max_{1≤l≤L} u_l, where u_l = g_l(x^ν) for all l = 1,2,...,L, and 0 < η_1 < 1 is a user-specified fixed parameter.
• The point x^ν is not ε_ν-feasible. This indicates that the value of the penalty parameter μ_ν is not large enough and must be increased by replacing it with μ_{ν+1} = η_2 μ_ν, where η_2 > 1 is a user-specified fixed parameter. The feasibility tolerance remains unchanged, i.e., ε_{ν+1} = ε_ν.
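The outer loop of Algorithm 7.2.1 with these update rules can be sketched on a one-dimensional toy instance. All data and parameter values below are our own; the Step 1 minimization is done in closed form, which is possible only because this tiny smoothed objective is piecewise quadratic, and the tolerance update is guarded so ε stays positive even when the iterate is strictly feasible.

```python
# Toy instance (ours): min (x - a)^2  s.t.  x in [0,1],  g(x) = x - b <= 0,
# with a = 0.9, b = 0.5, so the complicating constraint is active at x* = b.
a, b = 0.9, 0.5

def p_eps(t, mu, eps):
    if t <= 0.0:
        return 0.0
    if t <= eps:
        return mu * t * t / (2 * eps)
    return mu * (t - eps / 2)

def solve_inner(mu, eps):
    # Closed-form Step 1: the smoothed objective is piecewise quadratic,
    # so the minimizer is the best of the three per-piece stationary
    # points, each clipped to its piece and to the box [0, 1].
    cands = [
        min(a, b),                                           # piece t <= 0
        min(max((2 * a + (mu / eps) * b) / (2 + mu / eps), b), b + eps),
        max(a - mu / 2, b + eps),                            # piece t >= eps
    ]
    cands = [min(max(c, 0.0), 1.0) for c in cands]
    return min(cands, key=lambda x: (x - a) ** 2 + p_eps(x - b, mu, eps))

mu, eps, eps_min = 0.1, 0.5, 1e-6     # deliberately small initial mu
eta1, eta2 = 0.5, 4.0
for outer in range(100):
    x = solve_inner(mu, eps)          # Step 1
    g = x - b
    if g <= eps:                      # Step 2: eps-feasible?
        if eps <= eps_min:
            break
        eps = max(eta1 * g, eps_min)  # first rule: shrink tolerance (guarded)
    else:
        mu *= eta2                    # second rule: penalty too small

print(x, mu)
```

Starting from μ_0 = 0.1, below the multiplier threshold of this instance, the run exercises both branches: the penalty parameter is increased until the iterates become ε-feasible, after which only the tolerance shrinks and x approaches the constrained optimum b.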

7.2.1 Analysis of the ε-smoothed linear-quadratic penalty function


We now analyze the approximation error introduced with the use of ε-smoothing. The following result gives an a priori upper bound on the difference between the exact and the smoothed penalty functions.

Proposition 7.2.1 Let the functions F and F̃ be defined by (7.40) and (7.41), respectively. Then

0 ≤ F(x, μ) − F̃(x, μ, ε) ≤ Lμε/2, (7.42)

for any x ∈ IR^{nK}, μ > 0 and ε > 0.


Proof From the definitions of p and p_ε we have

0 ≤ p(g_l(x)) − p_ε(g_l(x)) ≤ με/2, for all l = 1,2,...,L, and x ∈ IR^{nK}. (7.43)

Adding these inequalities up, for l = 1,2,...,L, we obtain

0 ≤ Σ_{l=1}^L p(g_l(x)) − Σ_{l=1}^L p_ε(g_l(x)) ≤ Lμε/2, for all x ∈ IR^{nK}, (7.44)

and the result follows from the definitions of F and F̃. ∎

The LQP algorithm solves in Step 1 the smoothed penalty problem

min_{x∈X} F̃(x, μ, ε), (7.45)

instead of solving the exact penalty problem (7.37). The following results give a priori upper bounds on the error incurred by solving the smooth penalty problem (7.45) in lieu of the nondifferentiable penalty problem (7.37).

Proposition 7.2.2 Let x* ∈ X be an optimal solution of (7.33)-(7.35) and x̂ ∈ X be an optimal solution of (7.45) for some μ and ε. Then

0 ≤ F(x*, μ) − F̃(x̂, μ, ε) ≤ Lμε/2. (7.46)

Proof From Proposition 7.2.1 we have

F(x, μ) ≤ F̃(x, μ, ε) + Lμε/2. (7.47)

Taking the infimum over x ∈ X, we obtain

inf_{x∈X} F(x, μ) ≤ inf_{x∈X} F̃(x, μ, ε) + Lμε/2, (7.48)

which proves the right-hand side inequality. The left-hand side inequality can be similarly proved. ∎
The next proposition tells us that the difference between the optimal
values of the exact penalty problem and the smoothed penalty problem
can be controlled through the parameter e provided that the solution of
the penalty problem is e-feasible.
Proposition 7.2.3 Let x* ∈ X be an optimal solution of (7.33)-(7.35) and
x̂ ∈ X be an optimal solution of (7.45) for some μ and ε. Furthermore let
x̂ be ε-feasible. Then

0 ≤ f_0(x*) − f_0(x̂) ≤ Lμε.   (7.49)

Proof Since x̂ is ε-feasible, we have by the definition of p_ε that

Σ_{l=1}^L p_ε(g_l(x̂)) ≤ Lμε/2.   (7.50)

Also, x* is a solution to (7.33)-(7.35), which implies that

Σ_{l=1}^L p(g_l(x*)) = 0.   (7.51)

From Proposition 7.2.2 we have

0 ≤ (f_0(x*) + Σ_{l=1}^L p(g_l(x*))) − (f_0(x̂) + Σ_{l=1}^L p_ε(g_l(x̂))) ≤ Lμε/2.   (7.52)

Substituting (7.50) and (7.51) into (7.52) and rearranging terms, the result
is established. ■
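To make Proposition 7.2.3 concrete, here is a one-variable illustration (my example, not the book's): minimize f_0(x) = x² subject to g(x) = 1 − x ≤ 0, so that L = 1, x* = 1, and the Lagrange multiplier is z* = 2. Minimizing the smoothed penalty with μ above this threshold (here by ternary search, with p_ε taken as the linear-quadratic smoothing: quadratic on [0, ε], linear beyond) produces an ε-feasible point whose objective gap satisfies (7.49).

```python
# One-variable illustration of Proposition 7.2.3 (example is mine):
# minimize f0(x) = x^2 subject to g(x) = 1 - x <= 0; x* = 1, z* = 2.

def p_smooth(t, mu, eps):
    if t <= 0.0:
        return 0.0
    if t <= eps:
        return mu * t * t / (2.0 * eps)   # quadratic piece
    return mu * (t - eps / 2.0)           # linear piece

def F_hat(x, mu, eps):
    """Smoothed penalty objective for this toy problem."""
    return x * x + p_smooth(1.0 - x, mu, eps)

def argmin(f, lo, hi, iters=200):
    """Ternary search; valid because F_hat is convex in x."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

mu, eps = 4.0, 1e-2                       # mu above the threshold z* = 2
x_hat = argmin(lambda x: F_hat(x, mu, eps), 0.0, 2.0)
assert 0.0 < 1.0 - x_hat <= eps           # x_hat violates g by at most eps
assert 0.0 <= 1.0 - x_hat**2 <= mu * eps  # bound (7.49) with L = 1
```

The minimizer lands slightly inside the infeasible region (here at x̂ = μ/(μ + 2ε) ≈ 0.995), which is exactly the ε-feasibility behavior the propositions describe.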

The next question is how to specify conditions on the penalty parameter
μ to ensure that ε-feasibility is achieved. For the exact penalty function
case it is known (see, for example, Bertsekas (1975, Proposition 1)) that
for μ larger than a threshold given by μ* = max_{l=1,2,...,L} z_l*, where z* =
(z_l*) is a Lagrange multiplier vector for the constraints (7.35), an optimal
solution x* to (7.33)-(7.35) is also an optimal solution for (7.37). This
result provides the motivation for exact penalty methods. The question
we address next is how this result is affected by the ε-smoothing of the
exact penalty function. The answer is that although an optimal solution
to (7.33)-(7.35) does not necessarily coincide with an optimal solution to
(7.45), its suboptimality with respect to the latter can be bounded as a
function of the penalty and smoothing parameters.

To simplify notation we introduce, for each constraint function g_l(x),
the function g_l⁺(x) = max(0, g_l(x)); thus p(g_l(x)) = μ g_l⁺(x). Obviously, if
g_l(x) ≥ 0 then g_l⁺(x) = g_l(x).
Proposition 7.2.4 Let x* be an optimal solution for (7.33)-(7.35) and let
(y*, z*) ∈ ℝ^{K+L} be a Lagrange multiplier vector. Then, for some ε > 0,

F̂(x*, μ, ε) ≤ F̂(x, μ, ε) + Lμε/2 for all x ∈ X,   (7.53)

provided that μ ≥ z_l* for all l = 1, 2, ..., L.


Proof Since x* is an optimal solution for (7.33)-(7.35) and (y*, z*) is
a Lagrange multiplier vector whose existence is guaranteed by Assump­tion
7.2.2, we have (see, e.g., Rockafellar (1970, Theorem 28.3)):

∇f_0(x*) = − Σ_{k=1}^K y_k* ∇f_k((x*)_k) − Σ_{l=1}^L z_l* ∇g_l(x*),   (7.54)

y_k* f_k((x*)_k) = 0 for all k = 1, 2, ..., K,  z_l* g_l(x*) = 0 for all l = 1, ..., L,   (7.55)

y_k* ≥ 0 for all k = 1, 2, ..., K,  z_l* ≥ 0 for all l = 1, 2, ..., L,   (7.56)

f_k((x*)_k) ≤ 0 for all k = 1, 2, ..., K,  g_l(x*) ≤ 0 for all l = 1, 2, ..., L.   (7.57)

It is known that convexity and differentiability of any function h guarantee
that h(x) ≥ h(x*) + ⟨∇h(x*), x − x*⟩; see, e.g., Luenberger (1984). There­fore,
using (7.54)-(7.55) in the definition of F, and by convexity and dif­ferentiability
of the functions f_k, k = 0, 1, 2, ..., K, and g_l, l = 1, 2, ..., L,
we obtain

F(x, μ) ≥ f_0(x*) + ⟨∇f_0(x*), x − x*⟩ + μ Σ_{l=1}^L g_l⁺(x)

= f_0(x*) − Σ_{k=1}^K y_k* ⟨∇f_k((x*)_k), x_k − (x*)_k⟩ − Σ_{l=1}^L z_l* ⟨∇g_l(x*), x − x*⟩ + μ Σ_{l=1}^L g_l⁺(x)

≥ f_0(x*) − Σ_{k=1}^K y_k* (f_k(x_k) − f_k((x*)_k)) − Σ_{l=1}^L z_l* (g_l(x) − g_l(x*)) + μ Σ_{l=1}^L g_l⁺(x)

= f_0(x*) − Σ_{k=1}^K y_k* f_k(x_k) − Σ_{l=1}^L z_l* g_l(x) + μ Σ_{l=1}^L g_l⁺(x).

The first equality follows from (7.54); the second inequality follows from the
convexity and differentiability of the functions f_k for all k = 1, 2, ..., K,
and g_l for all l = 1, 2, ..., L; the second equality follows from (7.55). Since
g_l(x) ≤ g_l⁺(x), we have:

F(x, μ) ≥ f_0(x*) − Σ_{k=1}^K y_k* f_k(x_k) + Σ_{l=1}^L (μ − z_l*) g_l⁺(x).

Now, since μ ≥ z_l* for all l = 1, ..., L, y_k* ≥ 0, and f_k(x_k) ≤ 0 for all k =
1, 2, ..., K because x ∈ X, we get

F(x, μ) ≥ f_0(x*).   (7.58)

But from Proposition 7.2.1 we have

F(x, μ) − F̂(x, μ, ε) ≤ Lμε/2.   (7.59)

Rewriting (7.58) as

f_0(x*) − F(x, μ) ≤ 0,   (7.60)

observing that f_0(x*) = F̂(x*, μ, ε) since x* is feasible, and adding up
inequalities (7.59) and (7.60), we obtain

F̂(x*, μ, ε) − F̂(x, μ, ε) ≤ Lμε/2,   (7.61)

which establishes the result. ■


We see that even if the optimal solution x* of (7.33)-(7.35) is not an optimal
solution of (7.45), the objective function value computed at x* is greater
than the true optimal value by at most Lμε/2. As ε vanishes we recover
the classical assertion that an optimal solution to the original problem
coincides with an optimal solution to the penalty problem for some value
of the penalty parameter μ.



In the next proposition we obtain, as an immediate corollary from
Proposition 7.2.4, a bound on the difference between the optimal value of
the original problem (7.33)-(7.35) and the optimal value of the smoothed
penalty problem.

Proposition 7.2.5 Let x̂ be a minimum point of min_{x∈X} F̂(x, μ, ε), let x* be
an optimal point for the original problem (7.33)-(7.35), and let z* ∈ ℝ^L
be a Lagrange multiplier vector associated with the inequalities g_l(x) ≤ 0
for all l = 1, 2, ..., L. Then

0 ≤ f_0(x*) − F̂(x̂, μ, ε) ≤ Lμε/2,   (7.62)

provided that μ ≥ z_l* for l = 1, 2, ..., L.


We can now prove the following result:

Proposition 7.2.6 Let {ε_ν} be a sequence of positive numbers with
lim_{ν→∞} ε_ν = 0, and assume that for some μ > 0, x^ν is a solution to

min_{x∈X} F̂(x, μ, ε_ν).   (7.63)

Also let x̄ be an accumulation point of the sequence {x^ν}; then

F(x̄, μ) = F(x(μ), μ),   (7.64)

where x(μ) is an optimal solution to min_{x∈X} F(x, μ).

Proof The result is obtained from the continuity of F, the uniform con­vergence
of (7.39), and inequalities (7.46), as follows. From (7.46) we have,
for all ν > 0, F(x(μ), μ) ≥ F̂(x^ν, μ, ε_ν). Using (7.39), remembering that
the convergence is uniform, and by the definitions of F and F̂, we obtain

lim_{s→∞} F̂(x^{ν_s}, μ, ε_{ν_s}) = F(x̄, μ),   (7.65)

where {x^{ν_s}}, with lim_{s→∞} x^{ν_s} = x̄, is the subsequence converging to x̄.
Now, (7.46) yields F(x(μ), μ) ≥ F̂(x^{ν_s}, μ, ε_{ν_s}) which, together with (7.65),
gives

F(x(μ), μ) ≥ F(x̄, μ).

The opposite inequality follows from the definition of x(μ). ■
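Proposition 7.2.6 can be observed numerically on a one-variable illustration (mine, not the book's): take f_0(x) = x², g(x) = 1 − x ≤ 0, and fixed μ = 4. The minimizer of the smoothed problem has the closed form x^ν = μ/(μ + 2ε_ν) for this example, and its exact-penalty value approaches the optimal exact-penalty value F(x(μ), μ) = 1 as ε_ν → 0.

```python
# Exact penalty F(x, mu) = x^2 + mu*max(0, 1-x); for mu = 4 its minimum
# over x is F(1, mu) = 1.  The smoothed minimizers x_nu approach it.
mu = 4.0

def F(x):
    return x * x + mu * max(0.0, 1.0 - x)

gaps = []
for eps in (1.0, 0.1, 0.01, 0.001):
    x_nu = mu / (mu + 2.0 * eps)   # closed-form smoothed minimizer here
    gaps.append(F(x_nu) - F(1.0))
assert all(g > 0.0 for g in gaps)
assert gaps == sorted(gaps, reverse=True)   # decreases monotonically
assert gaps[-1] < 1e-2                      # already close for eps = 1e-3
```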


Therefore, as the smoothing parameter ε tends to 0, and if the penalty
parameter μ is specified larger than the threshold value μ*, the solu­tions
to (7.45) and (7.33)-(7.35) would coincide. Hence, we could obtain a
solution to the original problem (7.33)-(7.35) by solving smooth penalty
problems for a decreasing sequence of smoothing parameters and an increas­ing
sequence of penalty parameters, since the threshold value μ* is not known in
practice. However, driving ε to zero is not desirable, since we would recover the
nondifferentiable penalty function. Therefore, we cannot expect to ob­tain
a solution to the original problem by solving a linear-quadratic penalty
problem; we obtain an ε-feasible solution instead. In the next section we
show that ε-feasibility is attained if the penalized problem (7.45) is solved
using a penalty parameter equal to a constant multiple of the threshold for
the nondifferentiable penalty function.

7.2.2 ε-exactness properties of the LQP function


We now study the approximate exactness properties that the smooth LQP
penalty function inherits from its nondifferentiable counterpart. We show,
in particular, that ε-feasibility is achieved for some (finite) value of the
penalty parameter.

First we look at the optimality conditions of problems (7.33)-(7.35) and
(7.45), respectively.

Optimality Conditions for (7.33)-(7.35)

As stated earlier in the proof of Proposition 7.2.4, for x* to be an optimal
solution of (7.33)-(7.35) and for (y*, z*) to be a Lagrange multiplier vector,
it is necessary and sufficient that (x*, y*, z*) satisfy (7.54)-(7.55). The
existence of (y*, z*) is guaranteed under Assumption 7.2.2.

Optimality Conditions for (7.45)

We introduce A = {l | g_l(x̂) ≤ ε, 1 ≤ l ≤ L} to denote the set of indices
corresponding to constraints that are satisfied or violated to within ε at x̂,
and V = {l | g_l(x̂) > ε, 1 ≤ l ≤ L} for the set of indices corresponding to
constraints violated beyond ε at x̂. Again using Rockafellar (1970, Theorem
28.3) we obtain: for x̂ to be an optimal solution of (7.45) and for ŷ to
be a Lagrange multiplier vector, it is necessary and sufficient that (x̂, ŷ)
satisfy the conditions

∇f_0(x̂) = − Σ_{l∈A} (μ g_l⁺(x̂)/ε) ∇g_l(x̂) − Σ_{l∈V} μ ∇g_l(x̂) − Σ_{k=1}^K ŷ_k ∇f_k((x̂)_k),   (7.66)

ŷ_k f_k((x̂)_k) = 0 for all k = 1, 2, ..., K,   (7.67)

ŷ_k ≥ 0 for all k = 1, 2, ..., K,   (7.68)

f_k((x̂)_k) ≤ 0 for all k = 1, 2, ..., K.   (7.69)

The existence of ŷ is guaranteed under Assumption 7.2.2.

These optimality conditions for problems (7.33)-(7.35) and (7.45) have
some features in common. Let us take a pair of vectors (x̂, ŷ) satisfying
conditions (7.66)-(7.68) for some μ and ε. Consider also an estimate of the
Lagrange multiplier z, denoted by ẑ, obtained from

ẑ_l = p_ε′(g_l(x̂)) for all l = 1, 2, ..., L.   (7.70)

Then the triplet (x̂, ŷ, ẑ) would satisfy the optimality conditions for (7.33)-
(7.35), with the exception of the complementary slackness conditions, given
by ẑ_l g_l(x̂) = 0 for all l = 1, 2, ..., L, and the feasibility conditions, given
by g_l(x̂) ≤ 0 for all l = 1, 2, ..., L. If μ and ε are chosen such that x̂
is ε-feasible, then the error in complementary slackness is bounded by the
quantity με. To see this, observe that all constraints are satisfied to within
ε and that estimates for the Lagrange multiplier vector are given by

ẑ_l = μ max(0, u_l)/ε for all l = 1, 2, ..., L,

where u_l = g_l(x̂). Therefore, we have

ẑ_l u_l ≤ max(0, μ u_l²/ε) for all l = 1, 2, ..., L.

However, since u_l ≤ ε for l = 1, 2, ..., L, it follows that

ẑ_l u_l ≤ με for all l = 1, 2, ..., L,   (7.71)

and the assertion is verified.
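A quick numerical confirmation of (7.71); the residual values u_l below are arbitrary illustrative numbers, all within the ε-feasibility tolerance.

```python
# Check (7.71): with z_l = mu*max(0, u_l)/eps and residuals u_l <= eps,
# the complementary-slackness error z_l*u_l never exceeds mu*eps.

mu, eps = 10.0, 0.05
u = [-0.3, 0.0, 0.01, 0.049, eps]          # residuals g_l(x), all <= eps
z = [mu * max(0.0, ul) / eps for ul in u]  # multiplier estimates (7.70)
for zl, ul in zip(z, u):
    assert zl * ul <= mu * eps + 1e-12
print(max(zl * ul for zl, ul in zip(z, u)), "<=", mu * eps)
```

The bound is attained exactly at u_l = ε, where the estimate equals μ.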


These observations provide the justification for computing an ε-feasible
solution to the original problem by solving the smooth penalty problem
(7.45).

In the remainder of this section we study the conditions under which
a solution that is ε-feasible for the original problem can be obtained by
solving the linear-quadratic penalty problem. This result characterizes the
threshold value of the penalty parameter μ and provides a sufficient condi­tion
for ε-feasibility.

Proposition 7.2.7 Let x* be an optimal solution to (7.33)-(7.35), and
let (y*, z*) be a Lagrange multiplier vector. Let x̂ be an optimal
solution to the smooth penalty problem (7.45) for some μ and ε. Then x̂ is
ε-feasible if μ is chosen such that μ ≥ μ*/κ, where μ* is a threshold value
of the penalty parameter with μ* ≥ z_l* for all l = 1, 2, ..., L, and κ is a
constant given by

κ = (1 − √L)/(1 − L).   (7.72)

Proof Recall that A is the set of indices corresponding to constraints that
are satisfied or violated to within ε at x̂, and V is the set of indices cor­responding
to constraints violated beyond ε at x̂. We will assume that
card(V) (the cardinality of the set V) is at least one (otherwise the propo­sition
holds trivially) and argue by negation.

We write the objective function of the smooth penalty problem (7.45)
at the assumed minimum x̂ as

f_0(x̂) + μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) + (μ/2ε) Σ_{l∈A} (g_l⁺(x̂))².   (7.73)

This can be so written because when l ∈ V then g_l(x̂) > ε > 0, thus
g_l⁺(x̂) = g_l(x̂); however, if l ∈ A then g_l(x̂) ≤ ε, except that when g_l(x̂) ≤ 0
then p_ε(g_l(x̂)) = 0 by definition.

We consider first the linear term (i.e., the second summand) in (7.73),
and rewrite it as

μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) = μ* Σ_{l∈V} g_l⁺(x̂) + (μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) − μ* Σ_{l∈V} g_l⁺(x̂)).

Considering the term in parentheses, for μ = μ*/κ we get

μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) − μ* Σ_{l∈V} g_l⁺(x̂) = (μ*/κ) Σ_{l∈V} ((1 − κ) g_l⁺(x̂) − ε/2).   (7.74)

Now defining t = g_l⁺(x̂), the term under the summation on the right-hand
side can be written as (1 − κ)t − ε/2. Since t = g_l(x̂) > ε for all l ∈ V, and
since from the definition of κ in (7.72) we have 0 < κ ≤ 1/2 for all L ≥ 1,
we obtain

(1 − κ)t − ε/2 > (1 − κ)ε − ε/2 = κ(ε/2κ − ε).

Therefore, from (7.74) we have

μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) > μ*(ε/2κ − ε) card(V) + μ* Σ_{l∈V} g_l⁺(x̂).   (7.75)

Now we consider the quadratic term in (7.73), which can be rewritten as

(μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² = μ* Σ_{l∈A} g_l⁺(x̂) + ((μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² − μ* Σ_{l∈A} g_l⁺(x̂)).

Again considering the term in parentheses, for μ = μ*/κ we get

(μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² − μ* Σ_{l∈A} g_l⁺(x̂) = μ* Σ_{l∈A} ((g_l⁺(x̂))²/2εκ − g_l⁺(x̂)).   (7.76)

Using t = g_l⁺(x̂) we see that the term in the summation of the right-hand
side has the form (t²/2εκ) − t, which is a convex function of t. Its minimum
is attained at t* = εκ, at which point the function takes the value −εκ/2.
Since this lower bound is negative by the positivity of ε and κ, the following
inequality follows from (7.76):

(μ/2ε) Σ_{l∈A} (g_l⁺(x̂))² ≥ μ* Σ_{l∈A} g_l⁺(x̂) − μ*(εκ/2) card(A).   (7.77)

Now combining inequalities (7.75) and (7.77) we get the following result
for the objective function value of the smooth penalty problem evaluated
at x̂, whose derivation is explained below:

f_0(x̂) + μ Σ_{l∈V} (g_l⁺(x̂) − ε/2) + (μ/2ε) Σ_{l∈A} (g_l⁺(x̂))²
> f_0(x̂) + μ* Σ_{l=1}^L g_l⁺(x̂) + μ*(ε/2κ − ε − (L − 1)εκ/2)
= f_0(x̂) + μ* Σ_{l=1}^L g_l⁺(x̂)
≥ f_0(x̂) + Σ_{k=1}^K y_k* f_k(x̂_k) + Σ_{l=1}^L z_l* g_l(x̂)
≥ f_0(x*) + Σ_{k=1}^K y_k* f_k((x*)_k) + Σ_{l=1}^L z_l* g_l(x*)
= f_0(x*) = f_0(x*) + Σ_{l=1}^L p_ε(g_l(x*)).

The first inequality follows by adding (7.75) and (7.77), based on the as­sumption
that card(V) ≥ 1, and using the fact that card(A) ≤ L − 1.
The next equality follows from ε/2κ − ε − (L − 1)εκ/2 = 0, by defini­tion
of κ. The subsequent inequality follows from the definition of μ* and
the fact that x̂ ∈ X. The third inequality along with the subsequent equality fol­low
from the definition of a Lagrange multiplier vector. The last equality
follows from the feasibility of x*. But this result is a contradiction, since
x̂ was assumed to be an optimal solution to (7.45) and the rightmost
expression is the value of the objective of (7.45) at x*. Hence the assumption
card(V) ≥ 1 is violated, i.e., the set V must be empty. ■
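The constant κ of (7.72) simplifies to κ = 1/(1 + √L), which makes the two facts used in the proof easy to verify numerically: 0 < κ ≤ 1/2, and ε/2κ − ε − (L − 1)εκ/2 = 0. (For L = 1 the quotient in (7.72) is 0/0 and must be read as the limit κ = 1/2.)

```python
# Verify the two properties of kappa = (1 - sqrt(L))/(1 - L) used in the
# proof of Proposition 7.2.7 (L = 1 handled as the limit kappa = 1/2).
import math

eps = 0.1
for L in range(2, 51):
    kappa = (1.0 - math.sqrt(L)) / (1.0 - L)
    assert abs(kappa - 1.0 / (1.0 + math.sqrt(L))) < 1e-12  # simplified form
    assert 0.0 < kappa <= 0.5
    residual = eps / (2.0 * kappa) - eps - (L - 1) * kappa * eps / 2.0
    assert abs(residual) < 1e-10   # the identity that closes the derivation
```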

Therefore, for a given problem, the threshold value of μ required to attain
ε-feasibility is a constant independent of ε. The existence of a threshold
value of the penalty parameter μ indicates that an ε-feasible solution can be
computed in a finite number of minimizations. An important consequence
of Proposition 7.2.7 is that we can now characterize the conditions under
which the upper bound on the difference between the optimal objective
values of problems (7.33)-(7.35) and (7.45), as stated in Proposition 7.2.3,
is achieved. This result is now summarized in the following proposition.

Proposition 7.2.8 Let x* be an optimal solution for (7.33)-(7.35) and
let (y*, z*) ∈ ℝ^{K+L} be the Lagrange multiplier vector. Let μ* be such
that μ* ≥ z_l* for all l = 1, 2, ..., L. Furthermore, let x̂ be an optimal
solution to (7.45), with μ = μ*/κ where κ = (1 − √L)/(1 − L). Then

0 ≤ f_0(x*) − f_0(x̂) ≤ Lμε.

Proof Since κ < 1 it follows that μ > μ*. Hence, solving (7.37) using
μ = μ*/κ produces an optimal solution to the original problem, while
Proposition 7.2.7 guarantees that the solution x̂ of (7.45) is ε-feasible. The
rest of the proof follows from Proposition 7.2.3. ■

Propositions 7.2.7 and 7.2.8 motivate the procedure for controlling the
accuracy of the solution in the LQP algorithm (Algorithm 7.2.1). The
procedure starts with appropriately chosen μ_ν and ε_ν. The value of the
penalty parameter μ_ν is increased according to some criteria, while the
smoothing parameter ε_ν can be decreased after each penalty minimization
in Step 1 is completed. More precisely, if the solution of the penalty problem
is ε-feasible, this is an indication that the penalty parameter μ_ν is large
enough. Therefore, if the smoothing parameter ε is below the final accuracy
tolerance, ε_min, the algorithm terminates. In this case Proposition 7.2.8
provides an upper bound on the difference between the optimal value of the
original problem and the optimal value of the smoothed penalty problem.
If the solution of the penalty problem is not ε-feasible, then the penalty
parameter μ_ν is not large enough. Another round of penalty minimization
is carried out with a larger value of the penalty parameter. The smoothing
parameter ε_ν can be left unchanged in this case.
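The control procedure just described can be sketched as follows. This is an outline only: the inner solver, the update factors r1 < 1 and r2 > 1, the starting values, and the toy one-variable problem used to exercise the loop (minimize x² subject to 1 − x ≤ 0) are stand-ins for the actual components of Algorithm 7.2.1, and the max(eps_min, ·) guard in the ε-update is an added safeguard of mine.

```python
def lqp_outer_loop(solve_penalty, g, mu=1.0, eps=1.0,
                   eps_min=1e-6, r1=0.5, r2=10.0, max_iter=100):
    """solve_penalty(mu, eps) -> minimizer of the smoothed problem (7.45);
    g(x) -> list of constraint values g_l(x) at x."""
    x = solve_penalty(mu, eps)
    for _ in range(max_iter):
        u = g(x)
        if max(u) <= eps:                      # x is eps-feasible
            if eps <= eps_min:
                return x, mu, eps              # required accuracy reached
            eps = r1 * max(eps_min, max(u))    # shrink smoothing; mu unchanged
        else:
            mu = r2 * mu                       # raise penalty; eps unchanged
        x = solve_penalty(mu, eps)
    return x, mu, eps

# Toy inner solver: closed-form minimizer of x^2 + p_eps(1 - x).
def solve_toy(mu, eps):
    x_q = mu / (mu + 2.0 * eps)     # candidate in the quadratic region
    if 1.0 - x_q <= eps:
        return x_q
    x_l = mu / 2.0                  # candidate in the linear region
    if 1.0 - x_l >= eps:
        return x_l
    return 1.0 - eps                # kink between the two pieces

x, mu, eps = lqp_outer_loop(solve_toy, lambda x: [1.0 - x])
print(x, mu, eps)   # x ends within eps_min of the constrained optimum x* = 1
```

On the toy problem the loop raises μ once past the multiplier threshold and then shrinks ε geometrically until the final point violates the constraint by less than eps_min.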

7.3 Notes and References


The development of algorithms for the decomposition of structured op­
timization models into smaller and simpler subproblems dates back to
the early days of linear programming. See Dantzig (1963) for an early
account. A classification of decomposition algorithms is given by Geoffrion
(1970), and a textbook treatment of algorithms for large-scale systems
developed through the 1960s was provided by Lasdon (1970). The 1980s
witnessed several efforts in parallelizing the most popular decomposition
algorithms, including the Dantzig-Wolfe algorithm in Dantzig
and Wolfe (1960), and Benders decomposition in Benders (1962); see also
Geoffrion (1972). For an early discussion see the introduction in Meyer and
Zenios (1988). For parallelizations of the Dantzig-Wolfe algorithm see Ho,
Lee and Sundarraj (1988), and for parallelizations of Benders decomposi­
tion see Ariyawansa (1991), Ariyawansa and Hudson (1991), Dantzig, Ho,
and Infanger (1991), Nielsen and Zenios (1994), and Qi and Zenios (1994).
Significant recent developments that fit into the general framework of
model decomposition methods are the parallel constraint distribution and
parallel variable distribution methods of Ferris and Mangasarian (1991,
1994). Both methods have been proven effective for the solution of large-
scale linear programs and were also efficient when implemented on parallel
machines. See also Mangasarian (1995) for the related gradient distribution
method for unconstrained optimization and Ferris (1994) for extensions to
convex quadratic programs.

7.1 For an alternative classification of decomposition algorithms to the one


taken in this section see Geoffrion (1970).
7.1.1 Penalty methods for the solution of constrained optimization prob­
lems are attributed to Courant (1962), and the barrier method was first
suggested by Carroll (1961). A textbook treatment of penalty methods
can be found in Bertsekas (1982). Barrier, or interior penalty methods
were developed and popularized in Fiacco and McCormick (1968) and
McCormick (1983). For an introduction to both methods see Luen-
berger (1984). A discussion of exact penalty methods can be found in
Han and Mangasarian (1979). The use of split-variable formulations
is fairly standard in large-scale optimization, e.g., Bertsekas and Tsit-
siklis (1989, p.231). For applications see Rockafellar and Wets (1991),
Mulvey and Vladimirou (1989, 1991), and Nielsen and Zenios (1993a,
1996a). The augmented Lagrangian was introduced in Arrow, Hur-
wicz, and Uzawa (1958) as a means for the convexification of noncon-
vex problems, and was further extended and analyzed by Rockafel­
lar (1974, 1976a). An extensive treatment of this topic can be found
in Bertsekas (1982). The method of multipliers was introduced, inde­
pendently, by Hestenes (1969) and Powell (1969). Consult Section 4.5
for more notes and references on penalty and barrier methods.
7.1.2 The Frank-Wolfe algorithm was developed by Frank and Wolfe (1956),
and the simplicial decomposition algorithm was first suggested as a
generalization to Frank-Wolfe by Holloway (1974). The representa­
tion of the master program as a nonlinear program with a simplex
constraint is due to Von Hohenbalken (1977); see also Von Hohen-
balken (1975) and Grinold (1982). A memory-efficient variant of the
algorithm that maintains only a limited number of vertices is due

to Hearn, Lawphongpanich, and Ventura (1984). Mulvey, Zenios,


and Ahlfeld (1990) developed numerical procedures for solving the
master program and specialized the algorithm for large-scale network
problems. Nonlinear programs with a single equality constraint, and



bounded variables (such as the master program in Algorithm 7.1.3),
appear in many applications. Algorithms for their solution have been
suggested by Helgason, Kennington, and Lall (1980) and Tseng (1990);
and parallel algorithms were developed, implemented, and compared
by Nielsen and Zenios (1992c).
The use of linearization algorithms in conjunction with barrier or
penalty functions for the decomposition of structured programs for
parallel computing was suggested by Schultz and Meyer (1991) and
Pinar and Zenios (1992). Both studies report encouraging computa­
tional results on parallel machines for large-scale problems.
Diagonal quadratic approximations to nonseparable quadratic func­
tions, and in particular for the augmented Lagrangian, were suggested
by Stephanopoulos and Westerberg (1975) and further extended by
Tatjewski (1989). The use of split-variable formulations in conjunc­
tion with augmented Lagrangians and diagonal quadratic approxima­
tions for the solution of large-scale structured programs was suggested
by Mulvey and Ruszczynski (1992), and was subsequently applied
to the solution of stochastic programming problems by Mulvey and
Ruszczynski (1994) and Berger, Mulvey, and Ruszczynski (1994). The
same references also report encouraging computational results with the
use of the algorithm for solving large-scale problems on a distributed
network of workstations.
7.2 Smoothing approximations to exact penalty functions were introduced
by Bertsekas (1973), who also suggested their use for the solution of
minimax optimization problems. Zang (1980) also discusses smooth­
ing techniques for the solution of minimax problems. Madsen and
Nielsen (1993) introduced smoothing for the solution of estimation
problems. We point out that the linear-quadratic penalty function is
used in statistics to develop robust estimation procedures, and it is
known as the Huber estimator; see Huber (1981). Linear-quadratic
programming problems such as those arising from the smoothing ap­
proximations of this section are analyzed in Rockafellar (1987, 1990).
The LQP algorithm for large-scale problems was developed by Zenios,
Pinar, and Dembo (1994). Its properties were analyzed in Pinar and
Zenios (1994b), and the algorithm was used for the parallel decom­
position of multicommodity flow problems in Pinar and Zenios (1992,
1994a).
7.2.2 The analysis of the properties of the LQP function is due to Pinar
and Zenios (1994b). It is based on earlier works by Bertsekas (1975),

Charalambous (1978), and Charalambous and Conn (1978) who an­


alyzed exact penalty functions, and on the work of Truemper (1975)
who analyzed the quadratic penalty function.
