Model Decomposition Algorithms

Out of Intense Complexities Intense Simplicities Emerge. (Winston Churchill)
The sets $X_k$, $k = 1, 2, \ldots, K$, and $\Omega$ are assumed to be closed and convex.
Figure 7.1 illustrates the structure of this problem in two dimensions.
where $g_l : \mathbb{R}^{nK} \to \mathbb{R}$. In this case we can use a barrier function (see Definition 4.2.1) to establish a barrier on the boundary of $\Omega$, so that the iterates of an algorithm that starts with an interior point remain in the interior of the set, thereby satisfying the constraints $x \in \Omega$. For example, a barrier function for the set $\Omega$ defined by (7.6) can be constructed with the aid of Burg's entropy (6.149) (see Example 4.2.2) as:
$$B(x) = -\sum_{l=1}^{L} \log\big(-g_l(x)\big),$$
where $g_l$, for $l = 1, 2, \ldots, L$, are the functions used in the definition of $\Omega$. With the use of such a barrier function the modified problem is written as:
Problem [P']:
The constraints $z^k = x^k$ link the variables that appear in the constraint sets $X_k$ with the variables that appear in the set $\Omega$. An augmented Lagrangian formulation (see Section 4.4) is now used to eliminate these complicating constraints. We let $\pi = \big((\pi^1)^T, (\pi^2)^T, \ldots, (\pi^K)^T\big)^T$, where $\pi^k \in \mathbb{R}^n$ denotes the Lagrange multiplier vector for the complicating constraints $z^k = x^k$, and let $c > 0$ be a constant. Then a partial augmented Lagrangian for (7.9)-(7.12) can be written as:
$$\mathcal{L}_c(x, z, \pi) = \sum_{k=1}^{K} f_k(x^k) + \sum_{k=1}^{K} \langle \pi^k, z^k - x^k \rangle + \frac{c}{2} \sum_{k=1}^{K} \| z^k - x^k \|^2 . \quad (7.13)$$
A solution to problem [Split-P] can be obtained by solving the dual problem:
Problem [P'']:
$$\max_{\pi \in \mathbb{R}^{nK}} \; \Phi_c(\pi), \quad (7.14)$$
where $\Phi_c(\pi) = \min_{x^k \in X_k,\, z \in \Omega} \mathcal{L}_c(x, z, \pi)$. This is the modified problem whose solution yields a solution of [P].
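As an illustration of the structure that this formulation exposes, the following is a minimal Python sketch of one way the dual problem [P''] could be approached numerically: alternating minimization of the partial augmented Lagrangian over the $x$-blocks and over $z$, followed by a multiplier (dual ascent) update. This is only a sketch under those assumptions, not necessarily the scheme developed in this chapter, and the callables `argmin_x_block` and `argmin_z` are hypothetical placeholders for the inner minimizations.

```python
def dual_step(pi, z, argmin_x_block, argmin_z, c, K):
    """One alternating-minimization / multiplier-ascent pass (illustrative only)."""
    # With z held fixed, L_c separates over the blocks x^1, ..., x^K, so the
    # K subproblems can be solved independently of one another (in parallel).
    x = [argmin_x_block(k, pi[k], z[k]) for k in range(K)]
    # With x held fixed, minimize L_c over z in the coupling set Omega.
    z = argmin_z(pi, x)
    # The residuals z^k - x^k serve as an ascent direction for the dual
    # function Phi_c; the augmented-Lagrangian constant c is used as step size.
    pi = [pi[k] + c * (z[k] - x[k]) for k in range(K)]
    return x, z, pi
```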
ignoring second- and higher-order terms. The Frank-Wolfe algorithm now minimizes this linear function, subject to the original constraints. The solution of this linear program, $y$, is a vertex of the constraint set, which determines a direction of descent for the original nonlinear function, given by $p = y - x^\nu$. The algorithm then performs a one-dimensional search along this direction to determine the step length at which the nonlinear function attains its minimum. Figure 7.2 illustrates the algorithm.
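Before the formal statement, here is a minimal Python sketch of a single Frank-Wolfe iteration for a problem of the generic form $\min F(x)$ subject to $Ax \le b$ with a bounded (compact) feasible set; the names `F`, `grad_F`, `A_ub`, and `b_ub` are placeholder assumptions, and the sketch is illustrative rather than the algorithm as stated in the text.

```python
import numpy as np
from scipy.optimize import linprog, minimize_scalar

def frank_wolfe_step(x, F, grad_F, A_ub, b_ub):
    """One Frank-Wolfe iteration for min F(x) s.t. A_ub @ x <= b_ub (illustrative)."""
    # Step 1: minimize the linearized objective <grad F(x), y> over the polyhedron;
    # the optimum y is attained at a vertex (the feasible set is assumed bounded).
    lp = linprog(c=grad_F(x), A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    y = lp.x
    p = y - x                       # direction of descent for F
    # Step 2: one-dimensional search for the step length along p.
    ls = minimize_scalar(lambda a: F(x + a * p), bounds=(0.0, 1.0), method="bounded")
    return x + ls.x * p
```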
Applied to problem (7.17)-(7.18) the Frank-Wolfe algorithm is formally
stated below:
$$\text{s.t. } y \in X. \quad (7.20)$$
Let $y$ denote the optimal vertex of the linear program. Then $p = y - x^\nu$ is a direction of descent for $F$.
Step 2: (Linesearch.) Compute a step length $\alpha^*$ along the direction $p$ that minimizes the nonlinear function $F$ by solving the one-dimensional nonlinear program:
$$\alpha^* = \operatorname*{argmin}_{0 \le \alpha \le 1} F(x^\nu + \alpha p).$$
$$\langle \nabla F(x^\nu), y - x^\nu \rangle = \sum_{k=1}^{K} \big\langle \nabla_k F(x^\nu), y^k - (x^k)^\nu \big\rangle \quad (7.22)$$
$$\text{s.t. } y \in X. \quad (7.27)$$
Let $y^\nu$ denote the optimal solution of this linear program. Update the set of vertices $\mathcal{Y} \leftarrow \mathcal{Y} \cup \{y^\nu\}$, and its cardinality $v \leftarrow v + 1$.
Step 3: (Updating the iterate.) Let $x^{\nu+1} = \sum_{l=1}^{v} w_l^* y^l$. Update the set of vertices $\mathcal{Y}$ by deleting from it any vertices with zero weight in the representation of $x^{\nu+1}$, i.e., set $\mathcal{Y} \leftarrow \mathcal{Y} \setminus \{ y^l \mid w_l^* = 0,\ 1 \le l \le v \}$, and let $v = \mathrm{card}(\mathcal{Y})$. Set $\nu \leftarrow \nu + 1$ and return to Step 1.
At Step 1 the algorithm solves a linear program with a Cartesian product of linear constraint sets. This subproblem can be decomposed, and its components solved independently and in parallel:
Step 1 (alternate): (Solving the decomposed linearized subproblems.) Let $\nabla_k F(x^\nu)$ denote the subvector of the gradient vector $\nabla F(x)$ corresponding to the $k$th block of $x$, evaluated at the current iterate $x^\nu$. For each $k = 1, 2, \ldots, K$, solve the corresponding linearized subproblem over $X_k$. Let $(y^k)^\nu$ denote the optimal solution of this linear program, and form $y^\nu$ as the concatenation of the subvectors $\{ (y^k)^\nu \mid k = 1, 2, \ldots, K \}$.
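A hedged sketch of this alternate Step 1, assuming (purely for illustration) that each block set $X_k$ is described by linear inequalities $A_k y^k \le b_k$, could dispatch the $K$ independent block linear programs to a process pool and concatenate the results:

```python
import numpy as np
from scipy.optimize import linprog
from concurrent.futures import ProcessPoolExecutor

def solve_block_lp(args):
    """Solve one linearized block subproblem: min <g_k, y^k> over X_k (illustrative)."""
    g_k, A_k, b_k = args
    return linprog(c=g_k, A_ub=A_k, b_ub=b_k, bounds=(None, None)).x

def decomposed_step1(grad_blocks, block_constraints):
    """Solve the K independent block LPs in parallel and assemble y^nu."""
    jobs = [(g_k, A_k, b_k) for g_k, (A_k, b_k) in zip(grad_blocks, block_constraints)]
    with ProcessPoolExecutor() as pool:
        pieces = list(pool.map(solve_block_lp, jobs))
    return np.concatenate(pieces)
```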
The nonlinear master program in Step 2 is much smaller in size than the original problem. Typically, the number of vertices upon termination of the algorithm does not exceed one hundred. Furthermore, it has a simple structure, i.e., a simplex equality constraint, namely $\sum_{l=1}^{v} w_l = 1$, and bounds on the variables $0 \le w_l \le 1$. This structure can be exploited by designing an unconstrained optimization procedure for its solution as follows. Using the simplex equality constraint we can substitute $w_v$ with $w_v = 1 - \sum_{l=1}^{v-1} w_l$.
Then we can write the master program (7.28) as:
$$w^* = \operatorname*{argmin}_{\{0 \le w_l \le 1 \mid l = 1, 2, \ldots, v-1\}} F\Big( y^v + \sum_{l=1}^{v-1} w_l \big( y^l - y^v \big) \Big). \quad (7.31)$$
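As an illustration of this reduction, the sketch below treats the master program in the weights as a bound-constrained problem in the spirit of (7.31); the nonlinear function `F` and the list of retained vertices are placeholders, and L-BFGS-B is used only as a convenient bound-constrained solver, not as a procedure prescribed by the text.

```python
import numpy as np
from scipy.optimize import minimize

def solve_reduced_master(F, Y):
    """Approximately solve the reduced master program over the weights (illustrative)."""
    Y = np.asarray(Y, dtype=float)       # rows: retained vertices y^1, ..., y^v
    v = len(Y)
    y_last = Y[-1]                       # w_v is eliminated via the simplex constraint

    def objective(w):
        # x(w) = y^v + sum_{l<v} w_l (y^l - y^v), as in (7.31)
        return F(y_last + (Y[:-1] - y_last).T @ w)

    w0 = np.full(v - 1, 1.0 / v)         # feasible starting weights
    res = minimize(objective, w0, method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (v - 1))
    w = np.append(res.x, 1.0 - res.x.sum())          # recover the eliminated weight
    x_new = y_last + (Y[:-1] - y_last).T @ res.x     # updated iterate
    return w, x_new
```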
and we only discuss the cross-product terms $\langle x^k, z^k \rangle$ for $k = 1, 2, \ldots, K$. Using Taylor's expansion formula we obtain a first-order approximation of the cross-product terms around the current iterate as:
$$\sum_{k=1}^{K} \| x^k - (z^k)^\nu \|^2 \; - \; \sum_{k=1}^{K} \| (x^k)^\nu - (z^k)^\nu \|^2 \; + \; \sum_{k=1}^{K} \| (x^k)^\nu - z^k \|^2 .$$
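The identity behind this separable expression is a routine first-order argument and can be spelled out as a reading aid: approximating the bilinear term $\langle x^k, z^k \rangle$ by $\langle x^k, (z^k)^\nu \rangle + \langle (x^k)^\nu, z^k \rangle - \langle (x^k)^\nu, (z^k)^\nu \rangle$ and substituting it into $\| x^k - z^k \|^2 = \| x^k \|^2 - 2\langle x^k, z^k \rangle + \| z^k \|^2$ gives
$$\| x^k - z^k \|^2 \approx \| x^k - (z^k)^\nu \|^2 - \| (x^k)^\nu - (z^k)^\nu \|^2 + \| (x^k)^\nu - z^k \|^2,$$
which no longer couples $x^k$ and $z^k$, so the resulting subproblems in $x$ and in $z$ can be solved separately.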
The functions $f_k : \mathbb{R}^n \to \mathbb{R}$, $k = 0, 1, \ldots, K$, and $g_l : \mathbb{R}^{nK} \to \mathbb{R}$, for $l = 1, 2, \ldots, L$, are convex and continuously differentiable. The constraints (7.34) decompose into blocks, one for each subvector $x^k$, while constraints (7.35) are complicating. Let $X = X_1 \times X_2 \times \cdots \times X_K$ be a product of the sets $X_k = \{ x^k \in \mathbb{R}^n \mid f_k(x^k) \le 0 \}$ for all $k = 1, 2, \ldots, K$, and assume that $X$ is a compact set. We further make the following assumptions:
Assumption 7.2.1 Problem (7.33)-(7.35) has a nonempty and compact set of optimal solutions.
$$\min_{x \in X} \Big\{ f_0(x) + \mu \sum_{l=1}^{L} g_l^+(x) \Big\}, \quad (7.37)$$
where $g_l^+(x) = \max(0, g_l(x))$.
It is known (see, e.g., Bertsekas (1982, Chapter 4)) that under the assumptions stated above there exists a penalty parameter $\mu$ for which the optimal solutions to (7.37) and (7.33)-(7.35) coincide. In particular, this is the case if the penalty parameter is larger than a threshold value $\mu^*$ given by the largest Lagrange multiplier associated with the constraints (7.35).
$$p_\varepsilon(t) = \begin{cases} 0, & \text{if } t \le 0, \\[2pt] \dfrac{t^2}{2\varepsilon}, & \text{if } 0 \le t \le \varepsilon, \\[2pt] t - \dfrac{\varepsilon}{2}, & \text{if } t \ge \varepsilon, \end{cases} \quad (7.38)$$
and then express the objective function for the $\varepsilon$-smoothed penalty problem as:
$$F(x, \mu, \varepsilon) = f_0(x) + \mu \sum_{l=1}^{L} p_\varepsilon\big(g_l(x)\big). \quad (7.41)$$
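A direct transcription of (7.38) and (7.41) into Python (a minimal sketch; the function names `p_eps`, `F_smooth`, `f0`, and `g` are placeholders) makes the piecewise linear-quadratic structure explicit:

```python
import numpy as np

def p_eps(t, eps):
    """Linear-quadratic smoothing of max(0, t), as in (7.38)."""
    t = np.asarray(t, dtype=float)
    quad = t**2 / (2.0 * eps)            # quadratic piece on [0, eps]
    lin = t - eps / 2.0                  # linear piece for t >= eps
    return np.where(t <= 0.0, 0.0, np.where(t <= eps, quad, lin))

def F_smooth(f0, g, x, mu, eps):
    """Smoothed penalty objective (7.41): f0(x) + mu * sum_l p_eps(g_l(x))."""
    return f0(x) + mu * np.sum(p_eps(g(x), eps))
```

The function $p_\varepsilon$ is continuously differentiable, which is the point of the smoothing: the resulting objective can be minimized with standard differentiable optimization methods.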
Instead of solving the exact penalty problem (7.37), we then solve the smooth penalty problem
$$\min_{x \in X} F(x, \mu, \varepsilon),$$
whose penalty term approximates the exact penalty term to within
$$0 \le \sum_{l=1}^{L} g_l^+(x) - \sum_{l=1}^{L} p_\varepsilon\big(g_l(x)\big) \le L\,\frac{\varepsilon}{2}, \quad \text{for all } x \in \mathbb{R}^{nK}. \quad (7.44)$$
The following results give a priori upper bounds on the error incurred by solving the smooth penalty problem (7.45) in lieu of the nondifferentiable penalty problem (7.37).
which proves the right-hand side inequality. The left-hand side inequality
can be similarly proved. ■
The next proposition tells us that the difference between the optimal values of the exact penalty problem and the smoothed penalty problem can be controlled through the parameter $\varepsilon$, provided that the solution of the penalty problem is $\varepsilon$-feasible.
Proposition 7.2.3 Let $x^* \in X$ be an optimal solution of (7.33)-(7.35) and $\hat{x} \in X$ be an optimal solution of (7.45) for some $\mu$ and $\varepsilon$. Furthermore, let $\hat{x}$ be $\varepsilon$-feasible. Then
$$0 \le f_0(x^*) - f_0(\hat{x}) \le L\mu\varepsilon. \quad (7.49)$$
$$\sum_{l=1}^{L} p_\varepsilon\big(g_l(\hat{x})\big) \le L\varepsilon. \quad (7.50)$$
Substituting (7.50) and (7.51) into (7.52) and rearranging terms, the result
is established. ■
$$F(x^*, \mu, \varepsilon) \le F(x, \mu, \varepsilon) + L\mu\,\frac{\varepsilon}{2}, \quad \text{for all } x \in X, \quad (7.53)$$
$$y_k^*\, f_k\big((x^*)^k\big) = 0 \ \text{ for all } k = 1, 2, \ldots, K, \qquad z_l^*\, g_l(x^*) = 0 \ \text{ for all } l = 1, \ldots, L, \quad (7.55)$$
$$y_k^* \ge 0 \ \text{ for all } k = 1, 2, \ldots, K, \qquad z_l^* \ge 0 \ \text{ for all } l = 1, 2, \ldots, L, \quad (7.56)$$
$$f_k\big((x^*)^k\big) \le 0 \ \text{ for all } k = 1, 2, \ldots, K, \qquad g_l(x^*) \le 0 \ \text{ for all } l = 1, 2, \ldots, L. \quad (7.57)$$
It is known that convexity and differentiability of any function $h$ guarantee that $h(x) \ge h(x^*) + \langle \nabla h(x^*), x - x^* \rangle$; see, e.g., Luenberger (1984). Therefore, using (7.54)-(7.55) in the definition of $F$, and by convexity and differentiability of the functions $f_k$, $k = 0, 1, 2, \ldots, K$, and $g_l$, $l = 1, 2, \ldots, L$, we obtain
$$\begin{aligned}
F(x, \mu) &\ge f_0(x^*) + \langle \nabla f_0(x^*), x - x^* \rangle + \mu \sum_{l=1}^{L} g_l^+(x) \\
&= f_0(x^*) - \sum_{k=1}^{K} y_k^* \big\langle \nabla f_k\big((x^*)^k\big), x^k - (x^*)^k \big\rangle - \sum_{l=1}^{L} z_l^* \big\langle \nabla g_l(x^*), x - x^* \big\rangle + \mu \sum_{l=1}^{L} g_l^+(x) \\
&\ge f_0(x^*) - \sum_{k=1}^{K} y_k^* \Big( f_k(x^k) - f_k\big((x^*)^k\big) \Big) - \sum_{l=1}^{L} z_l^* \big( g_l(x) - g_l(x^*) \big) + \mu \sum_{l=1}^{L} g_l^+(x) \\
&= f_0(x^*) - \sum_{k=1}^{K} y_k^* f_k(x^k) - \sum_{l=1}^{L} z_l^* g_l(x) + \mu \sum_{l=1}^{L} g_l^+(x).
\end{aligned}$$
The first equality follows from (7.54); the second inequality follows from the convexity and differentiability of the functions $f_k$ for all $k = 1, 2, \ldots, K$, and $g_l$ for all $l = 1, 2, \ldots, L$; the second equality follows from (7.55). Since $g_l(x) \le g_l^+(x)$, we have:
$$F(x, \mu) \ge f_0(x^*) - \sum_{k=1}^{K} y_k^* f_k(x^k) + \sum_{l=1}^{L} \big( \mu - z_l^* \big)\, g_l^+(x).$$
Now, since $\mu \ge z_l^*$ for all $l = 1, \ldots, L$, $g_l^+(x) \ge 0$, and $f_k(x^k) \le 0$ for all $k = 1, 2, \ldots, K$ because $x \in X$, we get
$$F(x, \mu) \ge f_0(x^*). \quad (7.58)$$
Rewriting (7.58) as
$$f_0(x^*) - F(x, \mu) \le 0, \quad (7.60)$$
observing that $f_0(x^*) = F(x^*, \mu, \varepsilon)$ since $x^*$ is feasible, and adding up inequalities (7.59) and (7.60), we obtain
$$F(x^*, \mu, \varepsilon) - F(x, \mu, \varepsilon) \le L\mu\,\frac{\varepsilon}{2}, \quad (7.61)$$
$$0 \le f_0(x^*) - F(\hat{x}, \mu, \varepsilon) \le L\mu\,\frac{\varepsilon}{2}, \quad (7.62)$$
Then the triplet $(x, y, z)$ would satisfy the optimality conditions for (7.33)-(7.35), with the exception of the complementary slackness conditions, given by $z_l\, g_l(x) = 0$ for all $l = 1, 2, \ldots, L$, and the feasibility conditions, given by $g_l(x) \le 0$ for all $l = 1, 2, \ldots, L$. If $\mu$ and $\varepsilon$ are chosen such that $x$ is $\varepsilon$-feasible, then the error in complementary slackness is bounded by the quantity $\mu\varepsilon$. To see this, observe that all constraints are satisfied to within $\varepsilon$ and that estimates for the Lagrange multiplier vector are given by
$$z_l = \frac{\mu \max(0, u_l)}{\varepsilon} \quad \text{for all } l = 1, 2, \ldots, L,$$
where $u_l = g_l(x)$. Therefore, we have
$$z_l\, u_l \le \max\Big( 0,\ \mu\,\frac{u_l^2}{\varepsilon} \Big) \quad \text{for all } l = 1, 2, \ldots, L.$$
This can be so written because when $l \in V$ then $g_l(x) > \varepsilon > 0$, and thus $g_l^+(x) = g_l(x)$; however, if $l \in A$ then $g_l(x) < \varepsilon$, except that when $g_l(x) < 0$ then $p_\varepsilon(g_l(x)) = 0$ by definition.
We consider first the linear term (i.e., the second summand) in (7.73),
and rewrite it as
$$\mu \sum_{l \in V} \Big( g_l^+(x) - \frac{\varepsilon}{2} \Big) = \mu^* \sum_{l \in V} g_l^+(x) + \sum_{l \in V} \Big( (\mu - \mu^*)\, g_l^+(x) - \mu\,\frac{\varepsilon}{2} \Big). \quad (7.74)$$
Now defining $t = g_l^+(x)$, the term under the summation on the right-hand side can be written as $(\mu - \mu^*)\, t - \mu\,\frac{\varepsilon}{2}$. Since $t = g_l(x) > \varepsilon$ for all $l \in V$, and since from the definition of $\kappa$ in (7.72) we have $0 < \kappa \le \frac{1}{2}$ for all $L \ge 1$, we obtain
"E^-^E^W'^E^-^
Similarly, for the quadratic term we obtain
$$\mu \sum_{l \in A} \frac{\big( g_l^+(x) \big)^2}{2\varepsilon} \ \ge\ \mu^* \sum_{l \in A} g_l^+(x) - \mu^*\,\frac{\varepsilon\kappa}{2}\, \mathrm{card}(A). \quad (7.77)$$
Now combining inequalities (7.75) and (7.77) we get the following result
for the objective function value of the smooth penalty problem evaluated
at x, whose derivation is explained below:
$$\begin{aligned}
F(x, \mu, \varepsilon) &= f_0(x) + \mu \sum_{l \in V} \Big( g_l^+(x) - \frac{\varepsilon}{2} \Big) + \mu \sum_{l \in A} \frac{\big( g_l^+(x) \big)^2}{2\varepsilon} \\
&\ge f_0(x) + \mu^* \sum_{l=1}^{L} g_l^+(x) + \mu^* \Big( \frac{\varepsilon}{2\kappa} - \varepsilon \Big) - (L-1)\, \mu^*\, \frac{\varepsilon\kappa}{2} \\
&= f_0(x) + \mu^* \sum_{l=1}^{L} g_l^+(x) \\
&\ge f_0(x) + \sum_{k=1}^{K} y_k^* f_k(x^k) + \sum_{l=1}^{L} z_l^* g_l(x) \\
&\ge f_0(x^*) + \sum_{k=1}^{K} y_k^* f_k\big((x^*)^k\big) + \sum_{l=1}^{L} z_l^* g_l(x^*) \\
&= f_0(x^*) = f_0(x^*) + \mu \sum_{l=1}^{L} p_\varepsilon\big( g_l(x^*) \big) = F(x^*, \mu, \varepsilon).
\end{aligned}$$
The first inequality follows by adding (7.75) and (7.77), based on the assumption that $\mathrm{card}(V) \ge 1$, and using the fact that $\mathrm{card}(A) \le L - 1$. The next equality follows from $\frac{\varepsilon}{2\kappa} - \varepsilon - (L-1)\frac{\varepsilon\kappa}{2} = 0$, which holds by the definition of $\kappa$. The subsequent inequality follows from the definition of $\mu^*$ and from (7.67). The third inequality, along with the subsequent equality, follows from the definition of a Lagrange multiplier vector. The last equality follows from the feasibility of $x^*$.
But this result is a contradiction, since $x$ was assumed to be an optimal solution to (7.45). Hence the assumption that $\mathrm{card}(V) \ge 1$ cannot hold, and the solution of (7.45) is $\varepsilon$-feasible.
Propositions 7.2.7 and 7.2.8 motivate the procedure for controlling the accuracy of the solution in the LQP algorithm (Algorithm 7.2.1). The procedure starts with appropriately chosen $\mu^\nu$ and $\varepsilon^\nu$. The value of the penalty parameter $\mu^\nu$ is increased according to some criteria, while the smoothing parameter $\varepsilon^\nu$ can be decreased after each penalty minimization in Step 1 is completed. More precisely, if the solution of the penalty problem is $\varepsilon$-feasible, this is an indication that the penalty parameter $\mu^\nu$ is large enough. Therefore, if the smoothing parameter $\varepsilon$ is below the final accuracy tolerance $\varepsilon_{\min}$, the algorithm terminates. In this case Proposition 7.2.8 provides an upper bound on the difference between the optimal value of the original problem and the optimal value of the smoothed penalty problem. If the solution of the penalty problem is not $\varepsilon$-feasible, then the penalty parameter $\mu^\nu$ is not large enough. Another round of penalty minimization is carried out with a larger value of the penalty parameter. The smoothing parameter $\varepsilon^\nu$ can be left unchanged in this case.
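The control logic described in this paragraph can be summarized in a short schematic loop; this is an illustrative sketch only, with the inner penalty minimization, the $\varepsilon$-feasibility test, and the update factors left as placeholders rather than the specific choices of Algorithm 7.2.1.

```python
def lqp_parameter_control(minimize_penalty, g, mu, eps, eps_min,
                          mu_factor=10.0, eps_factor=0.1, max_rounds=50):
    """Schematic outer loop controlling mu and eps (illustrative only).

    minimize_penalty(mu, eps) stands for the penalty minimization of Step 1 and
    returns an approximate minimizer x of F(., mu, eps); g(x) returns the vector
    of constraint values g_l(x).  The update factors are arbitrary choices.
    """
    x = minimize_penalty(mu, eps)
    for _ in range(max_rounds):
        if max(g(x)) <= eps:          # eps-feasible: the penalty parameter is large enough
            if eps <= eps_min:        # required accuracy reached; stop
                break
            eps *= eps_factor         # tighten the smoothing parameter
        else:
            mu *= mu_factor           # penalty parameter not large enough; increase it
        x = minimize_penalty(mu, eps)
    return x, mu, eps
```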
The 1980s witnessed several efforts in parallelizing the most popular decomposition algorithms, including the Dantzig-Wolfe algorithm in Dantzig and Wolfe (1960), and Benders decomposition in Benders (1962); see also Geoffrion (1972). For an early discussion see the introduction in Meyer and