Claire S. Adjiman
January 2017
Chapter 1
Equipment design Find the optimum size and operating conditions of a unit operation
to achieve desired production (including quality constraints). This may include detailed
design aspects such as the shape of mixer blades.
Process synthesis Process design is the set of activities which take us from the decision
to develop a process — in order to increase company profits — to its construction and
operation. Synthesis is the conversion of an abstract description into a more concrete
one through the consideration of alternatives. Thus, process synthesis is one of the
many aspects of process design. It may be defined as “the systematic development of
process flowsheet(s) that transform the available raw materials into the desired products
and which meet the specified performance criteria of (a) maximum profit or minimum
cost, (b) energy efficiency, (c) good operability [...]” ([2]).
Optimal control Identify the set of control actions that best achieves the target
process behaviour.
1.1. DECISION-MAKING IN CHEMICAL ENGINEERING
Molecular design Identify the structure of the molecule(s) that make it possible to
maximise performance.
Three alternatives are considered in Figs 1.1 to 1.3. The first represents the base case which
makes use of the steam and cooling water utilities. It has a total annualised cost of $90220/yr,
with an investment cost of only $5520/yr and utilities of $84700/yr. In the second, the need
for cooling water has been eliminated and the need for steam reduced by transferring heat
from the effluent to the feed streams. The capital increases slightly to $5880/yr and the
utilities decrease to $36720/yr. This corresponds to a 50% reduction in total cost. The third
alternative results from splitting the effluent stream. In this case, only the hot water utility is
required. The investment cost more than doubles to $12200/yr, but the utilities fall further
to $5900/yr. This corresponds to an 80% reduction from the base case.
This example shows that integration can bring about great benefits. However, the number
of alternatives is very large. How can we identify the optimal process configuration reliably?
[Figures 1.1–1.3: flowsheets of the three alternatives. Feed streams C1 and C2 are heated from 300 °C to 480 °C, the reactor operates at 500 °C, and effluent H1 leaves at 400 °C. Alternative 2 preheats the feeds against the effluent, with residual steam heating; alternative 3 splits the effluent so that only the hot water utility is needed.]
Provided that sufficiently reliable models of the process to be optimised are available, the
process synthesis problem may be formulated as an optimisation problem. The following
three steps are taken:
The solution of the optimisation problem gives the optimum configuration as well as the
optimum operating parameters. In other words, both discrete and continuous decisions are
made simultaneously.
The systematic framework of the algorithmic approach accounts for nonlinear interactions
(capital cost, raw material costs, . . .). However, solution of such problems is limited by
currently available optimisation technology — which is constantly improving.
The market demand for product P1 is U1 (kg/yr) and that for P2 is U2 (kg/yr). The profit for each kg of product sold depends
on the product value and the plant running costs, which are a function of plant location and
production rate. The profit is given by Sij = S(Mij , Fi , Pj ), where Mij is the production rate
for product Pj at plant Fi in kg/day, i ∈ {A, B} and j ∈ {1, 2}.
What should the capacity of each plant be in kg/day to maximise profits and not exceed
market demand?
1.1.3.2 Methodology/Issues
tAj ≥ 0, tBj ≥ 0, MAj ≥ 0, MBj ≥ 0,  j = 1, 2
Management

max  tA1 MA1 SA1 + tA2 MA2 SA2 + tB1 MB1 SB1 + tB2 MB2 SB2
with respect to MA1 , MA2 , MB1 , MB2 , tA1 , tA2 , tB1 , tB2 ,

s.t.  tA1 + tA2 = 365
      tB1 + tB2 = 365
      tA1 MA1 + tB1 MB1 ≤ U1
      tA2 MA2 + tB2 MB2 ≤ U2
      tAj ≥ 0, tBj ≥ 0, MAj ≥ 0, MBj ≥ 0,  j = 1, 2
The outcome of the above optimisation problem is the pair of solution vectors M and t.
1. Management
3. Plant operations
As shown in Fig. 1.4, these levels are not independent. However, at each level, optimisation
is used to make different types of decisions:
These differences are reflected in different time scales and different model granularities.
Furthermore, all of these activities require models and parameters, and therefore model
building activities. This often requires the solution of optimisation problems (e.g. parameter
estimation, experiment design).
Better solutions can usually be found by increasing:
• the number of decisions we consider simultaneously (i.e. the boundaries of the problems)
We will use algebraic models in this course, i.e., systems of linear or nonlinear equations.
The mathematical representation of superstructures is also an issue we will con-
sider. We will see how alternatives can be expressed through integer and binary variables.
We will study methods to formulate and solve optimisation problems with binary and con-
tinuous variables in a later chapter. To start with, we focus on problems with continuous
variables only.
Bibliography
[1] J. M. Douglas. Conceptual Design of Chemical Processes. New York: McGraw Hill,
1988.
[2] C. A. Floudas. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applica-
tions. Oxford: Oxford University Press, 1995.
Chapter 2
Find values of the model parameters which give the best fit between the model and the
experimental results.
It is important to carry out a statistical analysis of the results to ensure that the model
is meaningful. This is however outside the scope of this course and will not be described
here.
• controlled variables,
Figure 2.1: Experimental data and fitted line for diffusion coefficient of para-hydroxybenzoic
acid in water vs. temperature.
D(T ) = θ1 T + θ2 (2.1)
∑_{k=1}^{5} [D(Tk ) − Dk ]²    (2.2)
where Dk is the measured value of the diffusion coefficients in the kth experiment, Tk is the
measured value of the temperature in the kth experiment and D(Tk ) is obtained by evaluating
Eq. (2.1) at Tk .
Then, the best fit according to this measure is obtained by solving the following optimi-
sation problem
min  ∑_{k=1}^{5} [D(Tk ) − Dk ]²
θ1 ,θ2
                                                  (2.3)
s.t.  D(Tk ) = θ1 Tk + θ2 ,  k = 1, . . . , 5
      θ1 ∈ R, θ2 ∈ R
The smaller the quality measure, the better the fit. This type of problem is called a least-
squares problem. The solution of this problem gives θ1 = 0.1889 and θ2 =2.805 with a “quality
measure” of 0.0249. This corresponds to the line plotted in Fig. 2.1.
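For a linear model, this least-squares problem can be solved directly. A sketch is below; the five (T, D) pairs are illustrative values in the spirit of Fig. 2.1, not the actual measurements, so the fitted parameters only approximate the ones quoted above.

```python
import numpy as np

# Least-squares fit of D(T) = theta1*T + theta2. The data below are
# illustrative placeholders, not the actual experimental values.
T = np.array([10.0, 17.5, 25.0, 32.5, 40.0])   # temperature, deg C
D = np.array([4.7, 6.1, 7.5, 8.9, 10.4])       # diffusion coefficient (scaled units)

theta1, theta2 = np.polyfit(T, D, 1)           # slope and intercept of the best fit
quality = np.sum((theta1 * T + theta2 - D) ** 2)
print(theta1, theta2, quality)                 # approx 0.189 and 2.79 for these data
```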
CHAPTER 2. NONLINEAR OPTIMISATION: THE BASICS
• It involves constraints.
[Figure 2.2: a two-variable feasible region bounded by linear and nonlinear inequality constraints and a linear equality constraint, with the optimal solutions of min x1² + x2² and max x1² + x2² marked.]
2.2.2 Definitions
1. Degrees of freedom Let DIM{h} = m and DIM{x} = n. If n > m then there
are n − m degrees of freedom. The number of degrees of freedom is the number of
decision variables. We want to select values of the decision variables to optimise a
scalar objective function.
2. Feasible Solution Values of the variable vector that satisfy the equality and inequality
constraints: any x ∈ Rn such that h(x) = 0 and g(x) ≤ 0.

3. Feasible Region The set F of all feasible solutions, F = {x ∈ Rn : h(x) = 0, g(x) ≤ 0}.

4. Optimal Solution A feasible solution that provides the optimal value for the objective
function. For a minimisation problem, an optimal solution is any x∗ ∈ F : f (x∗ ) ≤
f (x), ∀x ∈ F .
Figure 2.2 illustrates these concepts for a problem with two variables.
min f (x)
x
s.t. h(x) = 0 (2.5)
g(x) ≤ 0
The mathematical structure of the objective function and the constraints affects the way in
which the problem can be solved. The main classes of optimisation problems are:
Linear Programming (LP) f (x) is a linear function; h(x) and g(x) are linear func-
tions.
• Are there any constraints? If not, the problem is an unconstrained NLP. Oth-
erwise, it is a constrained NLP.
• Are the functions continuous? differentiable? twice-differentiable?
• Are the non-linearities arbitrary or do they follow certain patterns? See for in-
stance linearly constrained quadratic programming where f (x) = xT Qx and h(x)
and g(x) are linear functions.
2.3.2 Examples
A stream with flowrate 75,000 kg/yr must be cooled from T0 = 350 °C to Tf = 120 °C. The
available utility is cooling water, which enters the cooler at t0 = 30 °C and must leave at a
maximum of 60 °C (tmax ). Its cost is cW (in £/kg). The cooler has a heat transfer coefficient
of U (in W/m² K) and an annualised cost of cA (in £/m² yr). Design a cooler with minimum
annualised cost.
2. Write the formulation in general form (using symbols for the data). It is good practice
to retain general equations as much as possible.
[Figure: cooler schematic. The hot stream (75000 kg/yr) is cooled from 350 °C to 120 °C; cooling water W enters at 30 °C and leaves at tW . A: area, Q: heat load.]
• Formulation:

min  C = cA A + cW W
with respect to Q, A, W, tW

s.t.  Q − F CP (T0 − Tf ) = 0
      Q − W CPW (tW − t0 ) = 0                                          (equality constraints)
      Q − U A [(T0 − tW ) − (Tf − t0 )] / ln[(T0 − tW )/(Tf − t0 )] = 0

      tW − tmax ≤ 0
      t0 − tW ≤ 0                                                       (inequality constraints)
      Q, W, A ≥ 0
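With values assumed for the unspecified data, this design problem can be solved numerically. In the sketch below, Q is fixed by the process stream, so W and A follow from the water outlet temperature tW, leaving a one-dimensional search; only T0, Tf, t0, tmax and the 75,000 kg/yr flow come from the statement, while CP, CPW, U, the costs and the operating hours are assumed placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Cooler design sketch; values marked "assumed" are placeholders.
T0, Tf, t0, tmax = 350.0, 120.0, 30.0, 60.0     # deg C (from the statement)
FCP = 75000.0 * 2.0e3                           # F*CP in J/(yr K), assumed CP
CPW = 4184.0                                    # water heat capacity, J/(kg K)
U = 500.0                                       # W/(m2 K), assumed
cW, cA = 2.0e-4, 50.0                           # GBP/kg and GBP/(m2 yr), assumed
SECS = 3600.0 * 8000.0                          # operating seconds per year, assumed

Q = FCP * (T0 - Tf)                             # heat load, J/yr (fixed)

def cost(tW):
    W = Q / (CPW * (tW - t0))                   # cooling water, kg/yr
    lmtd = ((T0 - tW) - (Tf - t0)) / np.log((T0 - tW) / (Tf - t0))
    A = (Q / SECS) / (U * lmtd)                 # area from Q = U*A*LMTD
    return cA * A + cW * W                      # annualised cost

res = minimize_scalar(cost, bounds=(t0 + 1.0, tmax), method="bounded")
print(res.x, res.fun)                           # optimal tW and minimum cost
```

The trade-off is visible in `cost`: raising tW cuts the water bill but shrinks the temperature driving force, increasing the area; the optimum balances the two (with these placeholder costs, the water term dominates and tW is pushed towards tmax).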
A function f (x) is convex over a region R (Fig. 2.5) if and only if, for any two different
values x1 , x2 lying in the region R,

f (αx1 + (1 − α)x2 ) ≤ αf (x1 ) + (1 − α)f (x2 ),  ∀α ∈ [0, 1].

[Figure 2.5: convex functions — the chord between any two points lies above the curve.]
A function f (x) is concave over R (Fig 2.6) if and only if, for any two points x1 , x2 lying
in the region R,

f (αx1 + (1 − α)x2 ) ≥ αf (x1 ) + (1 − α)f (x2 ),  ∀α ∈ [0, 1].

[Figure 2.6: a concave function — the chord between any two points lies below the curve.]

Convexity is a useful concept in optimisation because a convex function has at most one
(global) minimum. Thus, once we know how to characterise and identify this point, we do
not need to worry about its uniqueness. It should be noted that convexity is only a sufficient
condition for the uniqueness of the minimum. Consider the following classes of functions:
Quasiconvex functions A function f (x) is quasiconvex over R if and only if

f (αx1 + (1 − α)x2 ) ≤ max{f (x1 ), f (x2 )}, ∀α ∈ [0, 1], ∀x1 , x2 ∈ R (2.8)
Invex functions A differentiable function f (x) is invex over R if there exists a vector
function η(x1 , x2 ) such that

f (x1 ) − f (x2 ) ≥ η(x1 , x2 )T ∇f (x2 ),  ∀x1 , x2 ∈ R.

All differentiable convex functions are invex. What function η(x1 , x2 ) satisfies the above
condition for all differentiable convex functions?
A differentiable function is invex if and only if every one of its stationary points is a global
minimum. Invexity is thus a necessary and sufficient condition for stationarity to imply
global optimality.
Floudas [2] provides a discussion of quasi and pseudo convexity. Many other generalisa-
tions of convexity have been proposed in the literature.
Examples
• The function f (x) = 2x² is strictly convex. Note that f ′′ (x) = 4 > 0.

• The function f (x) = 2x² − x³ is not convex: f ′′ (x) = 4 − 6x may be positive or negative
depending on the value of x.
Definition The matrix H(x) of second-order derivatives of f (x) is called its Hessian matrix,
H(x) = ∇2 f (x). For twice continuously differentiable f , the Hessian matrix is symmetric.
• H(x) is positive definite if and only if all its eigenvalues are positive.
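These checks are easy to automate: in one dimension inspect f ′′, and in several dimensions inspect the eigenvalues of the Hessian. A short sketch using the f(x) = 2x² − x³ example above together with a sample quadratic (its Hessian Q is an arbitrary illustrative choice):

```python
import numpy as np

# Convexity checks via second derivatives.
def fpp(x):
    return 4.0 - 6.0 * x        # f''(x) for f(x) = 2x^2 - x^3

for x in (-1.0, 0.5, 1.0):
    print(x, fpp(x) > 0)        # f is locally convex only where f'' > 0

# A sample quadratic f(x) = 4x1^2 + x2^2 - 2x1x2 has the constant Hessian Q.
Q = np.array([[8.0, -2.0], [-2.0, 2.0]])
print(np.linalg.eigvalsh(Q))    # both eigenvalues positive: strictly convex
```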
Definition A region R is convex if and only if for any x1 , x2 ∈ R, the point X = αx1 +
(1 − α)x2 is such that X ∈ R, ∀α ∈ [0, 1].
Convex Feasible Region Recall that the feasible region is defined by F = {x ∈ Rn |h(x) =
0, g(x) ≤ 0}.
Sufficient condition for convexity: If the equality constraints h(x) are linear and the
inequality constraints g(x) are convex functions then F is convex.
The nature of the search region has an important bearing on the potential for obtaining
suitable results in optimisation, as shown in Fig. 2.8. In a problem with a convex objective
function and a convex feasible region, every local minimum is a global minimum; if, in
addition, the objective is strictly convex, the global minimum is unique.
2.5. UNCONSTRAINED OPTIMISATION
[Figure 2.8: two search regions, (a) and (b).]
If the Hessian matrix H(x) of f (x) is such that xT H(x∗ )x > 0, ∀x ̸= 0, the stationary point
x∗ is a strong local minimum.
Examples
1. min f (x) = x⁴
   Stationary point: ∇f (x) = 4x³ = 0 ⇔ x∗ = 0.
   Hessian matrix: H(x∗ ) = 12x² |x=0 = 0.
Although the Hessian matrix is not strictly positive definite, x∗ = 0 is a minimum.
Why can we not guarantee that it is a minimum from the analysis?
[Figures: (a) a constrained minimum on the boundary of the feasible region, distinct from the unconstrained minimum; (b) a case where the constrained and unconstrained minima coincide inside the feasible region; (c) objective contours (10–50) over a feasible region containing both a local and a global minimum.]
Based on the eigenvalues of the Hessian matrix, the first two points are minima and
the third point is a saddle point. To determine which minimum is global, we need
to compare the values of the objective function at each of the minima. We have no
guarantee that we have identified all minima.
• Robust, i.e. able to reach a solution even though a general nonlinear function is
unpredictable in its behaviour and may have local minima and saddle points.
2. Minimising, at least approximately, along that direction to find a new point xk+1 = xk + ∆xk .
Local optimisation algorithms require an initial starting point and a convergence criterion
for termination as input, in addition to the problem statement.
Available techniques differ mainly in the way they generate search directions.
1. Direct Methods do not require derivatives and rely solely on function evaluations.
They are simple to understand and execute but are inefficient and lack robustness. They
include random search, grid search, univariate search, the simplex method, conjugate
search directions and Powell’s method. They are useful for simple low-dimensional
problems.
2. Indirect Methods use derivatives in determining the search direction for optimisation.
First-order methods such as the steepest descent use first-order derivatives only while
second-order methods such as Newton’s method also use second-order derivatives.
Basic idea The gradient ∇f (x) of a function f (x) at a point x̃ is a vector at that point
that gives the (local) direction of the greatest increase in f (x) and is orthogonal (normal)
to the contour of f at x̃ (see Fig. 2.9). In order to minimise f (x), we move continuously in
the opposite direction: the search direction s is the opposite of the gradient: s = −∇f (x̃).
The search direction is followed continuously until we arrive at a stationary point. Note that
in this method, the negative of the gradient gives the direction for minimisation but not the
Figure 2.9: The steepest descent method moves in the opposite direction to the gradient
magnitude of the step to be taken. At the k th iteration of the steepest descent, the transition
from point xk to another point xk+1 is given by:
xk+1 = xk + λk sk = xk − λk ∇f (xk ) (2.13)
where λk is the scalar that determines the step length in the direction of steepest descent
−∇f (xk ). How big should λk be for optimal performance of the algorithm?
• If λ is small, the path followed is continuous but may need too many iterations;
2. Employ a one-dimensional search along the negative of the gradient to select step size.
Note that
f (xk+1 ) = f (xk − λk ∇f (xk )).
Let F (λk ) ≡ f (xk+1 ). Solving the one-dimensional minimisation problem min_{λk} F (λk )
enables the generation of an “optimum” step size.
Algorithmic procedure
Step 1 Choose an initial point x0 . Set the iteration counter k = 0. Set the convergence
tolerance to ϵ;
Step 2 Calculate (analytically or numerically) the gradient components ∂f (x)/∂xj ,
j = 1, . . . , n, at xk ;

Step 3 Set the search direction sk = −∇f (xk );
Step 4 Set the step size λk by minimising F (λk ) numerically or using the pre-assigned
step size;
Step 5 Compute the next point xk+1 using the relation xk+1 = xk + λk sk ;

Step 6 Check for convergence: if ∥∇f (xk+1 )∥ ≤ ϵ, stop; otherwise set k = k + 1 and
return to Step 2.
Remark Termination can occur at any type of stationary point: a local minimum or a
saddle point. Examine the Hessian matrix of the objective function f (x) to characterise the
stationary point. If it is positive-definite, a minimum has been found. Otherwise, a saddle
point has been found and the search must be continued. To move away from the saddle point,
we employ a non-gradient method. The minimisation may then continue as before.
Performance of the steepest descent method While it is very simple, this approach
is sensitive to the scaling of f (x). Convergence can then be very slow, leading to poor
performance.
Solve the following problem:

min  x1² + x2²
x1 ,x2
Figure 2.11: Pictorial representation of the example problem. Observe that s is a vector
pointing towards the optimum (0,0)
5. Compute the next point: x(1) = x(0) − 1/2∇f (x(0) ) = (0, 0)T .
The solution is found in one iteration only (although one more iteration is needed to verify
convergence). However, if we change the scaling of the problem by defining new variables
y1 = x1 , y2 = x2 /2, we solve

min  f (y1 , y2 ) = y1² + 4y2²
y1 ,y2
3. Calculate λ(0) : min F (λ(0) ) = (1 − 2λ(0) )² + 4(1 − 8λ(0) )². The minimum is at
λ(0) = 68/520 ≈ 0.1308.
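The zig-zagging on this badly scaled problem can be reproduced with a short script. For a quadratic f(y) = yᵀAy, minimising F(λ) analytically gives the exact line-search step λ = (gᵀg)/(2gᵀAg) along s = −g; the iteration cap of 100 below is an arbitrary safeguard.

```python
import numpy as np

# Steepest descent with exact line search on the badly scaled quadratic
# f(y) = y1^2 + 4*y2^2, i.e. f(y) = y^T A y with A = diag(1, 4).
A = np.diag([1.0, 4.0])

def grad(y):
    return 2.0 * A @ y

y = np.array([1.0, 1.0])                 # starting point (y1, y2) = (1, 1)
for k in range(100):
    g = grad(y)
    if np.linalg.norm(g) < 1e-8:
        break
    lam = (g @ g) / (2.0 * (g @ A @ g))  # exact line search for a quadratic
    y = y - lam * g

print(k, y)  # takes dozens of zig-zag iterations, versus one when well scaled
```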
Basic Idea Recall that for x∗ to be a stationary point of the function f (x), a necessary
condition is ∇f (x∗ ) = 0. This is a set of n algebraic equations in n unknowns which can be
solved for x∗ using numerical techniques such as Newton’s method.
Relationship between Newton’s method and the optimisation problem The Ja-
cobian of the system F(x) = 0 is the Hessian matrix of the original objective function.
According to Eq. (2.17), the search direction is given by −[J(xk )]−1 F(xk ) or, equivalently,
sk = −[H(xk )]−1 ∇f (xk ).
How does this compare to the first-order methods? The search direction of the steepest
descent can be interpreted as being orthogonal to a linear approximation (or tangent) of the
objective function at point xk . Now suppose we make a quadratic approximation of f (x) at
xk :

f (x) ≈ f (xk ) + [∇f (xk )]T ∆xk + (1/2)(∆xk )T H(xk )∆xk    (2.18)
where H(xk ) is the Hessian matrix of f (x) evaluated at xk and ∆xk = x − xk . This
approximation takes into account the curvature of f (x) at xk and is used in second-order
methods to determine a search direction.
Step size selection In Eq. (2.17), a step size of 1 is effectively used. A more general
equation is

xk+1 = xk − λk [H(xk )]−1 ∇f (xk )    (2.19)
• If f (x) is quadratic, Newton’s method requires only one step to reach a minimum and
λk = 1 can be used.
The value of λ(0) must be such that f (x(1) ) − f (x(0) ) < 0. Thus, we must have

λ(0) [∇f (x(0) )]T ∆x(0) < 0    (2.21)
Examples
• min f (x) = 4x1² + x2² − 2x1 x2 starting from x(0) = (1, 1)T .
1. ∇f (x) = (8x1 − 2x2 , 2x2 − 2x1 )T , hence ∇f (x(0) ) = (6, 0)T .

2. H(x) = [[8, −2], [−2, 2]], hence [H(x)]−1 = [[1/6, 1/6], [1/6, 2/3]].

3. The step is ∆x(0) = −[H(x(0) )]−1 ∇f (x(0) ) = (−1, −1)T .

4. The new guess is then x(1) = x(0) + ∆x(0) = (1, 1)T + (−1, −1)T = (0, 0)T and
f (x(1) ) = 0. It can be checked that at this point ∇f (x(1) ) = 0 and the Hessian
matrix is positive definite.
• min f (x) = ½ x1² x2² + x1² + x2² + 2x1 + x2 starting from x(0) = (1, 1)T .
1. ∇f (x) = (x1 x2² + 2x1 + 2, x1² x2 + 2x2 + 1)T , hence ∇f (x(0) ) = (5, 4)T .

2. H(x) = [[x2² + 2, 2x1 x2 ], [2x1 x2 , x1² + 2]], hence H(x(0) ) = [[3, 2], [2, 3]].
3. The step direction is given by −[H(x(0) )]−1 ∇f (x(0) ) = (−7/5, −2/5)T .
6. The new guess is then x(1) = x(0) + ∆x(0) = (−1.24, 0.36)T and f (x(1) ) = −0.353.
Note that the value of f (x(0) ) was 5.5.
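The iteration above can be continued programmatically. A minimal sketch of Newton's method on this second example, using the analytical gradient and Hessian with full steps (λk = 1 throughout, rather than an adjusted step length), follows; the iteration cap of 20 is an arbitrary safeguard.

```python
import numpy as np

# Newton's method for min f(x) = 0.5*x1^2*x2^2 + x1^2 + x2^2 + 2*x1 + x2.
def grad(x):
    x1, x2 = x
    return np.array([x1 * x2**2 + 2.0 * x1 + 2.0,
                     x1**2 * x2 + 2.0 * x2 + 1.0])

def hess(x):
    x1, x2 = x
    return np.array([[x2**2 + 2.0, 2.0 * x1 * x2],
                     [2.0 * x1 * x2, x1**2 + 2.0]])

x = np.array([1.0, 1.0])
for _ in range(20):
    x = x - np.linalg.solve(hess(x), grad(x))   # s_k = -H^{-1} grad f, full step
    if np.linalg.norm(grad(x)) < 1e-10:
        break

print(x, np.linalg.eigvalsh(hess(x)))  # stationary point; eigenvalues all positive
```

From this starting point the full-step iteration converges quadratically to a point where the gradient vanishes and the Hessian is positive definite, i.e. a local minimum.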
2.6. NONLINEARLY CONSTRAINED OPTIMISATION
Step 1 Choose an initial point x0 . Set the iteration counter k = 0. Set the convergence
tolerance to ϵ. Choose λ.
Step 4 Calculate xk+1 = xk + ∆xk . If f (xk+1 ) > f (xk ), reduce the step length λ and
go back to Step 3.
Unified treatment All methods can be viewed as evolving from Newton’s method. For
Newton’s method, H is exact and convergence is quadratic. For Quasi-Newton methods,
H = H̃ (an approximation) and convergence is superlinear. For the steepest descent method,
H = H −1 = I and convergence is linear. There is a trade-off between the cost of Hessian
matrix evaluations and the rate of convergence.
min f (x)
x
s.t h(x) = 0
(2.23)
g(x) ≤ 0
x ∈ Rn
[Figure: the unit circle centred at (3, 2) in the (x1 , x2 ) plane.]
where f (x) is a scalar function, h(x) is a vector of size m, g(x) is a vector of size p and
n > m.
Since we know how to solve unconstrained optimisation problems, it would be useful to
transform the constrained optimisation problem (2.23) into an unconstrained problem.
min f (x)
x
s.t h(x) = 0 (2.24)
x ∈ Rn
Example Consider the problem of determining the point on a unit circle centered at (3,2)
which is closest to the origin (0,0).
• elimination of the variables may not be possible analytically. Using Newton’s method to
solve the equalities numerically is expensive and may be prone to numerical problems.
• elimination of the variables does not necessarily yield a unique solution. In the example,
for instance, the equality is equivalent to x2 = 2 ± √(1 − (x1 − 3)²). It may not be easy
to choose the correct solution.
ϕ(x) = f (x) + K ∑_{i=1}^{m} [hi (x)]² ,  K > 0.    (2.25)
Note that provided all the constraints are satisfied (i.e. h(x) = 0), this is the same as the
original objective function f (x). On the other hand, if hi (x) = 0 is violated for some i, ϕ(x)
is larger than f (x). Thus, by choosing a very large K, even small constraint violations are
severely penalised. We then expect that the solution of the unconstrained problem
min ϕ(x)
x
will satisfy the constraint h(x) = 0. Chapter 6 of [3] provides an in-depth discussion of
penalty methods and their application to problems with inequality constraints.
Disadvantages A major issue with penalty function approaches is that there is no good
way to choose a value for K.
• If K is too small, the penalty term may not be sufficient to ensure that all equalities
are satisfied. This phenomenon can be attributed to the presence of local minima.
• The larger K becomes the more ill-conditioned the unconstrained optimisation problem
is.
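A common practical compromise is to increase K gradually, warm-starting each unconstrained solve from the previous solution, so that no single solve is badly ill-conditioned. A sketch on the earlier circle example (the closest point on the unit circle centred at (3, 2) to the origin); the K schedule and SciPy's default BFGS solver are arbitrary choices:

```python
import numpy as np
from scipy.optimize import minimize

# Quadratic-penalty sketch for min x1^2 + x2^2
# s.t. (x1 - 3)^2 + (x2 - 2)^2 = 1.
def h(x):
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2 - 1.0

x = np.array([3.0, 2.0])                        # arbitrary initial guess
for K in (1.0, 10.0, 100.0, 1.0e4):
    phi = lambda x, K=K: x @ x + K * h(x) ** 2  # phi(x) of Eq. (2.25)
    x = minimize(phi, x).x                      # warm-started unconstrained solve

print(x)  # approaches the constrained minimum, approx (2.17, 1.44)
```

Even with the largest K here, the penalty solution satisfies h(x) = 0 only approximately, illustrating why a very large K is needed for tight feasibility and why the method becomes ill-conditioned.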
As seen in the previous section, transforming constrained problems into unconstrained prob-
lems may result in numerical difficulties. In order to explore alternative approaches, we need
to understand what characterises a minimum in a constrained problem.
Example Let us return to the example considered earlier and recall that the solution was
(2.17, 1.44)T . We now add the inequality constraint x1 ≥ 2.1, so that the problem formulation
becomes:
min f (x) = x1² + x2²
s.t. (x1 − 3)² + (x2 − 2)² = 1    (2.38)
     2.1 − x1 ≤ 0
Obviously, this does not alter the solution since x̃1 = 2.17 is already greater than 2.1. In this
case, the inequality constraint is inactive and could be ignored. On the other hand, if we
demand that x1 ≥ 2.5, this will affect the solution, since x̃1 violates this requirement. From
Fig. 2.13, it is obvious that the minimum is now at point A at which x1 = 2.5. In this case
the constraint x1 ≥ 2.5 is, in fact, satisfied as an equality x1 = 2.5 and is said to be active.
For optimisation purposes, active inequality constraints could in effect be treated as equal-
ities and assigned a Lagrange multiplier. Let us consider this issue for another example
problem.
[Figure 2.13: the unit circle centred at (3, 2) with the constraint x1 ≥ 2.5; the constrained minimum is at point A, where x1 = 2.5.]

[Figure: the curve x1 x2 = 1 with a point x̃ and a small step δx.]
It is obvious from the figure that the constraint g(x) will in fact be active at the solution,
i.e. x1 x2 = 1.
Now, consider a point x̃ on the hyperbola x1 x2 = 1. As before, we consider small steps
δx from the minimum x̃. However, in this case the step does not have to take us to another
point on the curve x1 x2 = 1. Then,
f (x̃ + δx) = f (x̃) + 2x̃1 δx1 + 2x̃2 δx2 + H.O.T.
−(x̃1 + δx1 )(x̃2 + δx2 ) ≤ −1
                                                  (2.40)
The second equation can be re-arranged by noting that x̃1 x̃2 = 1 and neglecting second-order
terms, giving δg ≡ −x̃2 δx1 − x̃1 δx2 ≤ 0.
As before, for x̃ to be a minimum, the rows of the matrix on the left-hand side must be
linearly dependent. Therefore, there must exist µ1 and µ2 , with at least one nonzero, such that:
2µ1 x̃1 − µ2 x̃2 = 0
2µ1 x̃2 − µ2 x̃1 = 0
                                                  (2.43)
Thus, δf = 2x̃1 δx1 + 2x̃2 δx2 = µx̃2 δx1 + µx̃1 δx2 . Therefore, δf = −µδg. From the definition
of δg and Eq. (2.41), we must have δg ≤ 0 or −δg ≥ 0. Thus, by choosing µ < 0, we
could cause a decrease in f , which is incompatible with the premise that x̃ is a minimum.
Therefore, µ must be non-negative.
Active inequality constraints are treated just like equality constraints with the
additional restriction that their Lagrange multipliers must be non-negative.
∂f /∂xj + ∑_{i=1}^{m} λi ∂hi /∂xj + ∑_{i∈AS} µi ∂gi /∂xj = 0,   j = 1, . . . , n    (2.45)

Extending the second summation to all p inequality constraints, this can be written as

∂f /∂xj + ∑_{i=1}^{m} λi ∂hi /∂xj + ∑_{i=1}^{p} µi ∂gi /∂xj = 0,   j = 1, . . . , n    (2.46)
Every term in the summation must vanish to satisfy this condition. For each active constraint
gi , gi (x) = 0 by definition. Each inactive constraint gi is such that gi (x) < 0. Thus the
summation can only be equal to 0 if the corresponding µi ’s are equal to 0.
This condition is called the complementarity condition as µi and gi (x) are comple-
mentary in the sense that at least one of them is zero for each constraint.
Complementarity conditions:
µ∗T g(x∗ ) = 0
(2.49)
µ∗ ≥ 0
• Kuhn-Tucker sufficient conditions: Consider x∗ , a feasible point for problem (2.23), for
which the Kuhn-Tucker necessary conditions hold. Define the Hessian matrix of the
restricted Lagrangian as
∇2 L(x∗ ) = ∇2 f (x∗ ) + ∑_i λ∗i ∇2 hi (x∗ ) + ∑_{i∈I} µ∗i ∇2 gi (x∗ ),
where I is the set of active inequality constraints (i.e. I = {i : gi (x∗ ) = 0}) . Define
the set I + of strongly active inequality constraints as I + = {i ∈ I : µ∗i > 0}, and the
set I 0 of weakly active inequality constraints as I 0 = {i ∈ I : µ∗i = 0}. The cone of
feasible directions is then the set C such that
C = {x ̸= 0 : ∇gi (x∗ )T x = 0 for i ∈ I + ,
             ∇gi (x∗ )T x ≤ 0 for i ∈ I 0 ,
             ∇hi (x∗ )T x = 0 for i = 1, . . . , m}.

If xT ∇2 L(x∗ )x > 0, ∀x ∈ C, then x∗ is a local minimum.
Note that if f (x) is convex and the feasible region is convex, any point which satisfies the
Kuhn-Tucker necessary conditions is the global minimum of the problem.
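These conditions are straightforward to verify numerically. A sketch for the earlier circle example with the added constraint x1 ≥ 2.5 (point A, where both the equality and the inequality are active), solving the stationarity condition for the multipliers:

```python
import numpy as np

# Kuhn-Tucker check at point A of min x1^2 + x2^2
# s.t. (x1-3)^2 + (x2-2)^2 = 1 and 2.5 - x1 <= 0.
x = np.array([2.5, 2.0 - np.sqrt(0.75)])   # both h and g active here

grad_f = 2.0 * x
grad_h = np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] - 2.0)])
grad_g = np.array([-1.0, 0.0])             # gradient of g(x) = 2.5 - x1

# Solve grad_f + lam*grad_h + mu*grad_g = 0 for the multipliers (lam, mu).
lam, mu = np.linalg.solve(np.column_stack([grad_h, grad_g]), -grad_f)
print(lam, mu)  # mu > 0: the active inequality carries a valid multiplier
```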
This is a system of 3 linear equations in 3 unknowns, with solution x∗1 = 1.071, x∗2 = 1.286
and λ∗ = −4.286. Since the objective function and feasible region are convex, this Kuhn-
Tucker point is the unique solution to the problem, and f (x∗ ) = 12.857.
Basic idea The central difficulty in finding Kuhn-Tucker points is determining which in-
equality constraints should be active. In order to address this issue, an iterative strategy
which postulates a different set of active constraints at every iteration can be used.
Algorithmic procedure
Step 1 Let the set of active inequalities JA = ∅. Set µi = 0, i = 1, . . . , p.
• If gi (x∗ ) ≤ 0 and µ∗i ≥ 0 for all i = 1, . . . , p, a solution has been found: terminate.
• Otherwise, add any violated inequality constraint (gi (x∗ ) > 0) to the set JA , set
µi ≥ 0, and remove from JA the inequality with the most negative multiplier (if
any) and set the multiplier of this inequality to 0. Return to step 2.
Example

min  x1² + x2²
s.t. g1 (x) = 1 − x1 ≤ 0
     g2 (x) = x1 − 3 ≤ 0
     g3 (x) = 2 − x2 ≤ 0
     g4 (x) = x2 − 4 ≤ 0
Step 1 Assume there are no active inequality constraints JA = ∅ and µi = 0, i = 1, 2, 3, 4.
Step 2 Find x∗ such that ∇f (x∗ ) + ∑_{i∈JA} µi ∇gi (x∗ ) = 0. This is equivalent to

∂f /∂x1 + 0 = 0,  ∂f /∂x2 + 0 = 0  ⇔  2x∗1 = 0,  2x∗2 = 0  ⇔  x∗ = (0, 0)T    (2.51)
Note that this is the solution of the unconstrained problem.
Step 3 Test for violated inequalities: g(x∗ ) = (1, −3, 2, −4)T . Hence g1 and g3 are vio-
lated at x∗ . Set JA = {1, 3}, µ2 = µ4 = 0 and return to Step 2.
Step 2 With JA = {1, 3}, find x∗ , µ∗1 and µ∗3 such that

∂f /∂x1 + µ∗1 ∂g1 /∂x1 + µ∗3 ∂g3 /∂x1 = 0
∂f /∂x2 + µ∗1 ∂g1 /∂x2 + µ∗3 ∂g3 /∂x2 = 0
g1 (x∗ ) = 0
g3 (x∗ ) = 0

⇔

2x∗1 + µ∗1 (−1) + µ∗3 (0) = 0
2x∗2 + µ∗1 (0) + µ∗3 (−1) = 0
1 − x∗1 = 0
2 − x∗2 = 0
The solution to this linear system is x∗ = (1, 2)T , µ∗1 = 2 and µ∗3 = 4.
Step 3 Test for violated inequalities: g(x∗ ) = (0, −2, 0, −2)T . All inequalities are satis-
fied. Test for non-negativity of the Lagrange multipliers for inequalities: all µi ’s are
non-negative. The global optimum solution has been found (note that the problem is
convex): x∗ = (1, 2)T and f (x∗ ) = 5.
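The two iterations above can be reproduced with a small script. Because all four constraints are linear, Step 2 reduces to a linear KKT system; the sketch below hard-codes this example and is not a general active-set code (in particular, dropping a constraint with a negative multiplier is never triggered here).

```python
import numpy as np

# Active-set iteration for min x1^2 + x2^2 with g(x) = G x + c <= 0.
G = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
c = np.array([1.0, -3.0, 2.0, -4.0])

JA = []                                     # current guess of the active set
for _ in range(10):
    m = len(JA)
    # KKT system: 2x + sum_{i in JA} mu_i G[i] = 0 and G[i] x + c[i] = 0.
    K = np.zeros((2 + m, 2 + m))
    K[:2, :2] = 2.0 * np.eye(2)
    for j, i in enumerate(JA):
        K[:2, 2 + j] = G[i]
        K[2 + j, :2] = G[i]
    rhs = np.concatenate([np.zeros(2), -c[JA]])
    sol = np.linalg.solve(K, rhs)
    x, mu = sol[:2], sol[2:]
    violated = [i for i in range(4) if i not in JA and G[i] @ x + c[i] > 1e-9]
    if not violated and np.all(mu >= -1e-9):
        break                               # feasible with valid multipliers
    JA = sorted(set(JA) | set(violated))    # add the violated constraints

print(x, mu)  # x = (1, 2) with multipliers (2, 4), as in the worked example
```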
If Step 2 requires the solution of a nonlinear system of equations, Newton’s method can
be used after choosing some initial values for x, λ and µ.
min  f (x) = cT x + (1/2) xT Qx
 x
s.t. Ax ≥ b    (2.52)
     x ≥ 0
Some efficient techniques for solving such problems have been developed (active set strategies,
Simplex-based methods, . . .).
Basic Idea Build QP approximation of problem (2.23) at the current point and solve the
resulting (easier) subproblem. The approximation is constructed by expanding the Kuhn-
Tucker necessary conditions in a Taylor series around the current point and taking a Newton
step.
Further details and an algorithmic procedure are given in Section 2.8.1.
Figure 2.15: Linearisation of nonconvex constraints may lead to the elimination of the global
solution
2. SQP algorithms are “infeasible path” algorithms: the equality and inequality con-
straints are converged simultaneously while minimising f and are only satisfied at the
solution.
3. The rate of convergence is quadratic if the exact Hessian of the Lagrangian is used, and
superlinear if the BFGS approximation is used.
4. Some problems may arise during the solution of the QP as its feasible region may be
empty. This can happen when nonconvex constraints are linearised, as shown in Fig.
2.15.
This approach was developed by several groups, including Abadie and Carpentier, Lasdon and
Waren, and Murtagh and Saunders.
Basic Idea Eliminate variables to reduce the dimensionality of the problem and apply
Newton’s method in the reduced space.
Further details on this method and an algorithmic procedure are given in Section 2.8.2.
3. When linear inequalities are involved, they can be transformed to linear equalities by
introducing slack variables.
SQP
MINOS
• Works best for mostly linear and large scale problems (say 6000 variables and 5000
constraints).
• Requires a moderate number of iterations as sparsity is very well exploited through
LP technology.
• Very complex to program as analytical functions are required to exploit all features
(write f (x) as Ax + f N (x)).
GRG2/GINO

Feasible path In this case, we solve the problem

min f (u)
                                                  (2.53)
s.t. g(u) ≤ 0

by converging the process simulation at each iteration. This is the approach followed in
ASPEN.
Infeasible path In this case, we solve the problem
min f (u, x)
s.t. r(u, x) = 0 (2.54)
g(u, x) ≤ 0
Recall that the Newton step for such a system of nonlinear equations is given by
sk = −[∇2 L(xk , λk )]−1 ∇L(xk , λk ). The gradient of the Lagrangian is given by

∇L(x, λ) = ( ∇x L(x, λ) )  =  ( ∇f (x) + λT ∇h(x) )
           (    h(x)    )     (       h(x)        )
The quadratic approximation for general problems of form (2.23) is obtained by applying
the same transformations to the objective and equality constraints as in problem (2.59) and
by linearising the inequalities around the current point. Thus the general QP subproblem is
given by
min  ∇f (x)T (x − xk ) + (1/2)(x − xk )T ∇xx L(xk , λk )(x − xk )
 x
s.t. h(xk ) + ∇h(xk )T (x − xk ) = 0    (2.60)
     g(xk ) + ∇g(xk )T (x − xk ) ≤ 0
where the Lagrangian is now L(x, λ, µ) = f (x) + λT h(x) + µT g(x) and the second-order
derivatives with respect to x are given by ∇xx L(x, λ, µ) = ∇xx f (x) + λT ∇xx h(x) +
µT ∇xx g(x). Since ∇xx L requires information on the second-order derivatives of f , h and
g, it is very costly to evaluate. In practice, a Quasi-Newton update formula is usually used.
Step 1 Set iteration counter k = 0, initialise estimate of the Hessian matrix B 0 = I and
give initial guess x0 . Set the step size α. Set convergence tolerances, ϵ1 and ϵ2 .
Step 2 Evaluate functions f (xk ), h(xk ), g(xk ) and their gradients ∇f (xk ), ∇h(xk ) and
∇g(xk ).
The problem of infeasible QP which may arise through linearisation for nonconvex problems
can be overcome by adding a new variable u to the QP as follows:
min  ∇f (xk )T d + (1/2) dT B k d + Ku
d,u
s.t. h(xk ) + ∇h(xk )T d = 0
     g(xk ) + ∇g(xk )T d ≤ u
     u ≥ 0
where K is a large positive scalar. The effect of this variable is to allow the linearised
inequality constraints to be violated, although this comes at a large cost in the objective
function. Note additionally that this approach offers no guarantee that the optimum solution
will not be cut off.
where A is an m × n matrix (m < n) and dim{x} = n. The variable set can be partitioned
into two subsets, x = (y, u), where y is a vector of m basic (dependent) variables and u is a
vector of n − m superbasic (independent) variables. Similarly, the matrix A can be partitioned
into A = [B | C], where B is a square m × m non-singular matrix and C is an m × (n − m)
matrix. Then the equality constraints become
Ax = [B | C] (y, u) = By + Cu = b. (2.62)
Consider the step ∆x^k = (∆y^k, ∆u^k) = (y − y^k, u − u^k) that satisfies the linear
equalities of Eq. (2.62), so that
B∆y^k + C∆u^k = 0 ⇒ ∆y^k = −B^{−1}C∆u^k. (2.63)
The full step can therefore be expressed in terms of the superbasic step only:
∆x^k = Z∆u^k, (2.64)
where
Z ≡ ( −B^{−1}C )
    (     I    )   (2.65)
is called the transformation matrix. Consider the second-order expansion of the objective
function f (x) at xk
f(x) = f(x^k) + ∇f(x^k)^T ∆x^k + (1/2) ∆x^{kT} ∇²f(x^k) ∆x^k (2.66)
Then, if we substitute Eq. (2.64) into Eq. (2.66)
f(x) ≈ f(x^k) + [∇f(x^k)^T Z] ∆u^k + (1/2) ∆u^{kT} [Z^T ∇²f(x^k) Z] ∆u^k ≡ F(∆u^k) (2.67)
Eq. (2.67) is an expansion of f in the reduced space of u. Let us define:
1. the reduced gradient, g_R^T = ∇f(x^k)^T Z;
2. the reduced Hessian, H_R = Z^T ∇²f(x^k) Z.
The problem then becomes
min_{∆u^k} F(∆u^k) = f(x^k) + g_R^T ∆u^k + (1/2) ∆u^{kT} H_R ∆u^k (2.68)
and setting the gradient of F to zero gives
∂F/∂∆u^k = 0 ⇒ ∆u^k = −H_R^{−1} g_R (2.69)
The above expression for ∆uk is of the same form as the usual Newton step, but it is calculated
in the reduced space.
Computing the Newton step ∆u^k To compute the reduced gradient g_R^T = ∇f(x^k)^T Z,
we substitute the definition of Z. Therefore
g_R^T = −∇_y f(x^k)^T B^{−1} C + ∇_u f(x^k)^T (2.71)
This is a common expression for reduced gradients. It would be useful to avoid computing
B^{−1}. To do so, we define λ^T = −∇_y f(x^k)^T B^{−1}. Then, we can form the linear system
B^T λ = −∇_y f(x^k)
and solve it for λ (and therefore for −∇_y f(x^k)^T B^{−1}). The reduced gradient then becomes
g_R^T = λ^T C + ∇_u f(x^k)^T.
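The equivalence of the two expressions for the reduced gradient can be checked numerically on a small example (the matrix A, the point x and the quadratic objective below are arbitrary illustrations):

```python
import numpy as np

# Toy data: minimise f(x) = x^T x subject to A x = b, with A = [B | C]
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 1.0]])
m, n = A.shape
B, C = A[:, :m], A[:, m:]          # basic / superbasic partition
x = np.array([1.0, 0.5, 1.0, 2.0])
grad = 2.0 * x                     # gradient of f(x) = x^T x
gy, gu = grad[:m], grad[m:]        # gradient w.r.t. y and u

# Direct form (2.71): g_R^T = -grad_y f^T B^{-1} C + grad_u f^T
gR_direct = -gy @ np.linalg.inv(B) @ C + gu

# Multiplier form: solve B^T lam = -grad_y f, then g_R^T = lam^T C + grad_u f^T
lam = np.linalg.solve(B.T, -gy)
gR_mult = lam @ C + gu
```

The second form avoids inverting B, which matters when B is large and sparse: a single triangular or LU solve replaces the explicit inverse.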
It can be shown that λ is the vector of Lagrange multipliers for the original problem (2.61).
Recall that the Kuhn-Tucker conditions for this problem require that
∇f (x) + AT λ = 0.
Therefore,
( ∇_y f(x^k) )   ( B^T )       ( 0 )
( ∇_u f(x^k) ) + ( C^T ) λ  =  ( 0 ).
Hence, ∇y f (xk ) + B T λ = 0 and B T λ = −∇y f (xk ): the vector λ satisfies the KT necessary
conditions.
The reduced Hessian matrix HR = Z T ∇2 f (xk )Z can be estimated with a quasi-Newton
method (e.g. BFGS) using information on changes in gR T.
Step 1 Select a feasible starting point for Ax = b. This can be obtained through Phase
I of an LP algorithm. Set up the sets of basic (y) and superbasic (u) variables. Set
H_R^0 = I. Partition the matrix A = [B | C]. Set the iteration counter k = 0 and the
convergence tolerances ϵ1 and ϵ2.
Step 4 Perform a line search for xk+1 = xk + α∆xk , with the aim of minimising f (xk+1 )
in the direction given by ∆xk . Typically, α = 0.3.
• If g_R^T g_R < ϵ1 and ∆u^{kT} ∆u^k < ϵ2, terminate.
Given a general problem of form (2.23), redefine it as a problem with equality constraints
only by introducing slack variables:
min_{x,σ} f(x)
s.t. r(x, σ) = 0 (2.74)
     x^L ≤ x ≤ x^U
     σ ∈ R^p
where
r(x, σ) = ( h(x)      )
          ( g(x) + σ² )
and the square σ² is taken element-wise.
One procedure which has been proposed by Lasdon and Warren (GRG2, GINO) to handle
this type of problem is based on the following approximation:
min_x f(x)
s.t. J(x^k) x = J(x^k) x^k (2.75)
where J(xk ) is the Jacobian of r(x) at xk . Note that the solution of the above problem is
not necessarily a feasible point for the original problem. After each approximation is solved,
a Newton correction step is used to ensure feasibility, as shown in Fig. 2.16. This method is
therefore a feasible path method.
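The Newton correction used to regain feasibility can be sketched as follows (a minimal illustration; the least-norm Newton step is one standard choice for an underdetermined constraint system, and the circle constraint is a hypothetical example, not one from these notes):

```python
import numpy as np

def newton_correction(r, J, x, tol=1e-10, max_iter=50):
    """Newton correction to regain feasibility of r(x) = 0 from an infeasible point.

    For an underdetermined system (fewer constraints than variables), the
    least-norm Newton step is dx = -J^T (J J^T)^{-1} r(x)."""
    for _ in range(max_iter):
        rx = r(x)
        if np.linalg.norm(rx) < tol:
            break
        Jx = J(x)
        x = x - Jx.T @ np.linalg.solve(Jx @ Jx.T, rx)
    return x

# Hypothetical constraint: project a point back onto the circle x1^2 + x2^2 - 1 = 0
r = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
x_feas = newton_correction(r, J, np.array([1.5, 1.0]))
```

Each approximate subproblem solution may drift off the constraint manifold; a few of these corrections restore r(x) = 0 before the next iteration, which is what makes the overall method a feasible path method.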
Bibliography
[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and
Algorithms. 2nd ed. New York: John Wiley and Sons, 1993.
[2] C. A. Floudas. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applica-
tions. Oxford: Oxford University Press, 1995.
[3] G. V. Reklaitis, A. Ravindran, and K. M. Ragsdell. Engineering Optimization: Methods
and Applications. New York: John Wiley, 1983.
Chapter 3
Modelling Discrete Alternatives
Many of the decisions made in process synthesis are discrete in nature (number of trays in
a distillation, use of a given unit type). If we are to use an algorithmic approach to process
synthesis, we therefore need to represent these discrete choices mathematically and learn
how to solve the resulting optimisation problems. In this chapter, we consider the issue of
modelling discrete decisions.
There are three main steps in this process:
For homogeneous systems such as heat exchanger networks, distillation sequences and
utility plants, systematic representations can be developed. Fig. 3.1 shows a network of
distillation columns for a four component system with sharp splits (Andrecovich and Wester-
berg, 1985). All possible column sequences are considered in this diagram. Figure 3.2 shows
a superstructure for a CSTR/PFR system. The reactors may be used alone, in parallel, in
series, or in any other configuration, and recycles may be included.
For heterogeneous systems such as overall plant flowsheets, alternatives must generally
be specified by the user. For instance, different types of reactors may be proposed, several
recycle structures may be postulated, a few feedstock choices may be specified, . . ..
Figure 3.1: Network of distillation columns with sharp splits representing all alternative
routes (Andrecovich and Westerberg, 1985)
Figure 3.2: Superstructure for the CSTR/PFR reactor system.
Caveats
• Great care must be taken when formulating the superstructure as the optimum found
is dictated by the assumptions made in this stage: only alternatives embedded within
the superstructure can be selected as optimal solution!
• The performance of the optimisation algorithms used to optimise the structure is highly
dependent upon the mathematical representation of the superstructure. While it is pos-
sible to automate the derivation of a mathematical representation, obtaining a “good”
formulation still demands experience and an understanding of the optimisation algo-
rithms.
Consider a reactor which either does not exist or must have at least a minimum volume Vmin:
V = 0 OR V ≥ Vmin (3.1)
Symbol Meaning
∨ OR
⊕ EXCLUSIVE OR
∧ AND
¬ NOT
Binary variables are variables which can only take on the values 0 and 1. By convention,
they are usually denoted by y. Often, a binary variable y is associated with a process
unit as follows:
y = { 0 if the unit does not exist
    { 1 if the unit exists        (3.2)
If such a variable is defined in the reactor example, Eq. (3.1) can be rewritten as the following
algebraic equation:
Vmin y ≤ V ≤ M y (3.3)
where M is a very large number (or an upper bound on reactor volume, if one is known).
When the reactor does not exist, y = 0 and Eq. (3.3) becomes 0 ≤ V ≤ 0, i.e. V = 0. When
the reactor exists, y = 1 and Eq. (3.3) becomes Vmin ≤ V ≤ M .
It is worth noting that some recent optimisation strategies can handle propositional logic
expressions directly. For further information, refer to the disjunctive programming literature
[1, 2].
Any propositional logic statement can be transformed into an equivalent algebraic equation
with binary variables using a systematic procedure.
Step 1 Associate a logical variable Pi with each elementary proposition, and a binary
variable yi with each Pi (yi = 1 if Pi is true, yi = 0 otherwise).
Step 2 Transform the logical expressions into an equivalent conjunctive normal form
Q1 ∧ Q2 ∧ . . . ∧ Qn, where the Qj 's (j = 1, . . . , n) are clauses that depend on the Pi 's.
For the conjunction to hold, each of the n clauses Qj must be true independently.
Step 3 Transform each Q into an equivalent algebraic equation or set of algebraic equa-
tions.
¬ ((P1 ∧ P2 ) ∨ P3 ) ∨ (P4 ∨ P5 ).
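For this proposition, De Morgan's laws give the conjunctive normal form (¬P1 ∨ ¬P2 ∨ P4 ∨ P5) ∧ (¬P3 ∨ P4 ∨ P5); each clause then becomes a "sum of literals ≥ 1" constraint on the binaries. The equivalence can be checked exhaustively:

```python
from itertools import product

def original(p1, p2, p3, p4, p5):
    # The proposition above: not((P1 and P2) or P3) or (P4 or P5)
    return (not ((p1 and p2) or p3)) or (p4 or p5)

def cnf_constraints(y1, y2, y3, y4, y5):
    # Each clause Q becomes "sum of literals >= 1", with a negated
    # proposition not-P_i represented by (1 - y_i)
    q1 = (1 - y1) + (1 - y2) + y4 + y5 >= 1   # clause: !P1 v !P2 v P4 v P5
    q2 = (1 - y3) + y4 + y5 >= 1              # clause: !P3 v P4 v P5
    return q1 and q2

equivalent = all(original(*map(bool, y)) == cnf_constraints(*y)
                 for y in product((0, 1), repeat=5))
```

The constraints are satisfied for exactly the same 0-1 combinations that make the original proposition true, which is what Step 3 of the procedure guarantees.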
The systematic procedure can be applied to some frequently arising situations in process
synthesis.
1. Mutually exclusive alternatives: select only one out of m: ⊕_{i=1}^{m} Pi, or
∑_{i=1}^{m} yi = 1.
2. Non-exclusive alternatives: select at least one out of m: ∨_{i=1}^{m} Pi, or
∑_{i=1}^{m} yi ≥ 1.
Figure 3.3: Plot of a linear cost function defined over a discontinuous domain
8. Discontinuous functions: fixed-charge cost model with a discontinuous domain (see
Fig. 3.3):
C = { α + βx if L ≤ x ≤ U
    { 0      if x = 0
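Using a binary variable y as in Eq. (3.3), the fixed-charge model above can be written as a set of mixed-integer linear constraints:

```latex
C = \alpha y + \beta x, \qquad L\,y \le x \le U\,y, \qquad y \in \{0, 1\}.
```

When y = 0, the bounds force x = 0 and hence C = 0; when y = 1, they give L ≤ x ≤ U and C = α + βx, reproducing the two branches of the discontinuous function without any discontinuity in the model.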
Figure 3.5: The superstructure for a pump network with three pump types
Exercise Construct the mathematical representation of a unit with a piecewise linear cost
function of the continuous capacity variable x, as shown in Fig. 3.4.
• we could introduce integer variables (nsi ∈ {0, 1, 2, 3}, the number of pumps of type i
in series)
Once again, there exists a systematic way to express an integer variable through a set of
binary variables. Consider the integer variable n such that n ∈ {0, Un } ∩ N. Define K such
that
K − 1 ≤ log2 Un < K (3.4)
and K binary variables, yk , k = 1, . . . , K. Then, the following equations can be used to define
n in terms of binary variables:
n = ∑_{k=1}^{K} 2^{k−1} y_k
n ≤ Un (3.5)
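Eqs. (3.4)–(3.5) can be checked with a short script; K = ⌊log2 Un⌋ + 1 is the value satisfying Eq. (3.4):

```python
import math
from itertools import product

def num_binaries(Un):
    # Smallest K with K - 1 <= log2(Un) < K, i.e. K = floor(log2(Un)) + 1
    return math.floor(math.log2(Un)) + 1

def representable_values(Un):
    # n = sum_k 2^{k-1} y_k over all binary choices, kept only if n <= Un
    K = num_binaries(Un)
    vals = {sum(2 ** k * y for k, y in enumerate(bits))
            for bits in product((0, 1), repeat=K)}
    return sorted(v for v in vals if v <= Un)
```

For Un = 10 this gives K = 4 binaries, and together with the constraint n ≤ Un the expansion covers every integer from 0 to 10 exactly once.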
Bibliography
[1] E. Balas. “Disjunctive Programming and a Hierarchy of Relaxations for Discrete Opti-
mization Problems”. In: SIAM Journal on Algebraic and Discrete Methods 6.3 (1985),
pp. 466–486.
[2] A. Vecchietti and I.E. Grossmann. “LOGMIP: A Disjunctive 0-1 Non-Linear Optimizer
for Process System Models”. In: Computers and Chemical Engineering 23.4-5 (1999),
pp. 555–565.
[3] T. Westerlund, F. Pettersson, and I. E. Grossmann. “Optimization of Pump Configu-
ration Problems as a MINLP Problem”. In: Computers and Chemical Engineering 18.9
(1994), pp. 845–858.
Chapter 4
Mixed Integer Programming
min f (x, y)
x,y
s.t. h(x, y) = 0
g(x, y) ≤ 0 (4.1)
x ∈ Rn
y ∈ {0, 1}q
The x variable vector represents the continuous decisions (flowrates, equipment sizes, pres-
sure, temperature, heat duties) and the y variables represent the existence or non-existence
of process units. Problems of form (4.1) are referred to as MINLPs (mixed-integer nonlinear
programs) when at least one function in the problem is nonlinear, and MILPs (mixed-integer
linear programs) when all functions are linear.
q     2    5    10     20      50       100
2^q   4    32   1024   ~10^6   ~10^15   ~10^30
Table 4.1: Evolution of the number of combinations with increasing number of binary
variables
found by comparing the solutions of the LPs. There are 2q combinations to be tested. The
combinatorial explosion in the number of solutions to be tested is highlighted in Table 4.1.
Considering 50 binary variables, and assuming that it takes only 10 ms to solve each LP,
it would take more than 300,000 years to try all combinations. . .
Let us “relax” the MILP by removing the integrality condition on the y variables. They
are now allowed to vary continuously between 0 and 1 and the resulting problem is an LP.
Note that because we have less stringent conditions on the problem, the solution we will
obtain cannot be greater than the solution of the original MILP. In some special cases, the
solution of the LP is equal to that of the MILP. A sufficient condition for this to occur is
for the matrix B in problem (4.2) to be totally unimodular (i.e., every square submatrix of
B has a determinant equal to 0, +1 or −1). However, for general unstructured MILPs, some of the y
variables will be non-integer at the solution of the relaxed LP. This is usually the case in
process synthesis.
Exercise Consider an assignment problem where m jobs must be distributed over m ma-
chines. The cost of the assignment is determined by cost coefficients Cij corresponding to
the cost of carrying out job i on machine j. We define the binary variable yij as equal to 1
if job i is assigned to machine j and 0 otherwise. The problem can be formulated as
min ∑_{i=1}^{m} ∑_{j=1}^{m} C_ij y_ij
s.t. ∑_{i=1}^{m} y_ij = 1, j = 1, . . . , m (4.3)
     ∑_{j=1}^{m} y_ij = 1, i = 1, . . . , m
     y_ij ∈ {0, 1}, i = 1, . . . , m, j = 1, . . . , m
The first set of equality constraints ensures that one and only one job is assigned to each
machine. The second set ensures that one and only one machine is assigned to each job. Show
that the solution of the relaxed problem is the same as that of the MILP for m = 3.
In the general case where the solution of the relaxed problem is non-integer, one may apply
a rounding scheme such as rounding the solution to the nearest integer. However, this may
result in a sub-optimal solution, or even in an infeasible combination!
min −1.2y1 − y2
s.t. y1 + y2 = 1 (4.4)
     1.2y1 + 0.5y2 ≤ 1
     (y1, y2) ∈ {0, 1}²
The solution of the relaxed problem is (y1, y2) = (0.714, 0.286)^T with an objective function
value of −1.143. Rounding to the nearest integer yields (y1, y2) = (1, 0)^T, a combination
which violates the inequality constraint. The optimal solution is in fact (y1, y2) = (0, 1)^T
(the opposite of what was found) and the corresponding objective function value is −1. Note
that f∗_MILP > f∗_LP.
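The rounding failure in this example is easy to confirm by enumerating all four binary combinations:

```python
from itertools import product

obj = lambda y: -1.2 * y[0] - 1.0 * y[1]
feasible = lambda y: y[0] + y[1] == 1 and 1.2 * y[0] + 0.5 * y[1] <= 1

# Exhaustive check of the four 0-1 combinations
candidates = {y: obj(y) for y in product((0, 1), repeat=2) if feasible(y)}
best = min(candidates, key=candidates.get)
```

Only (0, 1) survives the constraints; the rounded point (1, 0) violates the inequality, illustrating why rounding cannot replace a proper search.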
Consider an LP relaxation of problem (4.2) where the binary variables in some set Ji have
been fixed (dim{Ji } < q) and the remaining binary variables are allowed to vary between 0
and 1. This subproblem Pi is given by
Consider another LP relaxation Pj derived from Pi where the set of variables Jj that have
been fixed includes Ji (Ji ⊂ Jj and dim{Jj } ≤ q). Pj is given by
We define f ∗ as the objective function value at the solution of problem (4.2). fi∗ and fj∗
denote the objective function values at the solution of problems (Pi ) and (Pj ) respectively.
Then, the following properties hold:
At the first iteration of the B&B algorithm for the MILP, problem P0 , the full LP
relaxation of the original problem is solved.
• If all binary variables are integer at its solution f0∗ , the optimal solution of (4.2)
has been found.
• Otherwise, f0∗ constitutes a lower bound on the problem.
At the second iteration, two subproblems of P0, or nodes, are created by fixing one of
the binary variables, y1, to 0 and to 1 respectively.
The solutions of these two subproblems give two new tighter lower bounds, f∗_{1,0} and
f∗_{1,1} respectively.
• The smallest known lower bound on the problem is given by f_min = min{f∗_{1,0}, f∗_{1,1}}.
We now select one node from the list and use it to create two nodes by fixing an-
other binary variable. This enables us to generate two new tighter lower bounds, to
update the upper bound, test for convergence and to eliminate any node with lower
bound greater than the upper bound. All remaining nodes are added to the list to be
analysed further.
[Figure: the branch-and-bound tree. The relaxed LP is solved at level 0; branching on y1,
y2 and y3 generates the nodes at levels 1, 2 and 3.]
Branching In the branching step, the binary variable(s) to be fixed at the new children
nodes is (are) selected. While most algorithms only fix one additional binary variable at a
time, more than one can be fixed. Ideally, we would like to fix the variable that will lead to
the greatest increase in the value of the lower bound as this speeds up convergence. However,
there is no way to detect this variable a priori. The following strategies are used in practice:
• Choose a variable randomly, or choose the first variable that has not yet been fixed
from the list of binary variables.
• Choose the most-fractional variable, i.e. the binary variable whose value at the solution
of the relaxation of the parent node is closest to 0.5. Thus, if the solution at the root
node is given by y = (1, 0.8, 0.5)T , select y3 for branching.
Node selection A number of alternative criteria can be used to decide which node should
be selected for further analysis from the list of open nodes.
• Newest bound rule (depth first, LIFO): expand the most recently created node, as
shown in Fig. 4.2. Features of this rule are:
• Best bound rule (“breadth first”, priority rule): Expand node with the smallest lower
bound, as shown in Fig. 4.3. Features of this rule include:
Figure 4.2: Order of exploration of nodes using the newest bound rule
Figure 4.3: Order of node exploration using the best bound rule
In commercial codes, combinations of these rules are used. For instance, one may use a
depth-first approach, but follow the child node with the lowest bound rather than the left or
right branch specifically.
We solve this problem using the best bound rule and branching on the most fractional variable.
The procedure is illustrated in Fig. 4.4.
Iteration 3 Select node 4 and create children nodes by fixing y3 . A feasible integer
solution is found for y3 = 1, giving an upper bound of 4.5. This upper bound can be
used to rule out nodes 1, 3, and 5. The only remaining node is node 6, which has an
integer solution. The optimal solution has been found!
A total of seven subproblems have been solved. Exhaustive enumeration would have required
the solution of sixteen problems.
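A compact depth-first (LIFO) branch-and-bound for pure 0–1 LPs, using `scipy.optimize.linprog` for the relaxations and most-fractional branching, might look as follows — a teaching sketch, far simpler than the combined rules used in commercial codes:

```python
import numpy as np
from scipy.optimize import linprog

def binary_branch_and_bound(c, A_ub, b_ub, A_eq, b_eq, n):
    best_val, best_y = np.inf, None
    nodes = [dict()]                       # each node fixes a subset of the binaries
    while nodes:
        fixed = nodes.pop()                # newest bound rule (depth first, LIFO)
        bounds = [(fixed[i], fixed[i]) if i in fixed else (0, 1) for i in range(n)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        if not res.success or res.fun >= best_val:
            continue                       # fathom: infeasible, or bound no better than incumbent
        frac = [i for i in range(n) if abs(res.x[i] - round(res.x[i])) > 1e-6]
        if not frac:                       # integer solution: update incumbent
            best_val, best_y = res.fun, tuple(int(round(v)) for v in res.x)
            continue
        j = min(frac, key=lambda i: abs(res.x[i] - 0.5))  # most fractional variable
        for v in (0, 1):
            nodes.append({**fixed, j: v})
    return best_val, best_y

# Example (4.4): min -1.2 y1 - y2  s.t.  y1 + y2 = 1,  1.2 y1 + 0.5 y2 <= 1
val, y = binary_branch_and_bound(
    c=[-1.2, -1.0], A_ub=[[1.2, 0.5]], b_ub=[1.0],
    A_eq=[[1.0, 1.0]], b_eq=[1.0], n=2)
```

On example (4.4) the root relaxation is fractional, the branch y1 = 1 is fathomed as infeasible, and the branch y1 = 0 yields the integer optimum (0, 1) with value −1.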
• Unlike B&B algorithms, no feasible solution can be identified before the optimal solution
has been found.
• In addition, finite convergence of the algorithm is difficult to prove: thus, if the solution
procedure is very slow, we may end up with no solution at all. For B&B algorithms, at
least one feasible solution can usually be identified in reasonable time.
• There is no single strategy to construct cutting planes. Some methods are described in
Discrete Optimization by Parker and Rardin ( Academic Press, 1988).
• Cutting plane algorithms have not been as popular as Branch and Bound algorithms
but their popularity is growing. It is possible to combine both approaches.
min_{x,y} f(x, y)
s.t. h(x, y) = 0
     g(x, y) ≤ 0 (4.6)
     x ∈ R^n
     y ∈ {0, 1}^q
In solving problems of this type, two difficulties must be overcome: the combinatorial nature
of the problem which arises from the presence of binary variables, and, when the non-linear
functions are nonconvex, the presence of local minima.
to a local solution. In terms of efficiency, it is not as easy to update the NLPs at each node
as it is to update LPs and more effort is therefore needed at each node. Finally, note that
B&B algorithms can be used to solve problems with integer variables without reformulation.
min_{x,y} f^x(x) + x^T Ay + c^T y
s.t. h^x(x) + x^T By + d^T y = 0
     g^x(x) + x^T Cy + e^T y ≤ 0 (4.7)
     x ∈ R^n
     y ∈ {0, 1}^q
The participation of the binary variables is limited to mixed-bilinear and linear terms. This
is, however, not restrictive, as any problem of form (4.6) can be transformed into a problem
of form (4.7).
The GBD algorithm is based on three main principles:
Partitioning of the variable set The y variables are referred to as complicating vari-
ables and handled differently from the x variables. In Geoffrion’s original work, the
complicating y variables were not restricted to binary variables, but could also be con-
tinuous. Thus, this algorithm can also be used to handle bilinear nonconvexities in a
rigorous manner.
Iterative refinement By using the information provided by any given primal and master
problems, new primal and master problems can be constructed in such a way that the
bounds become tighter and convergence can be achieved within a finite number of
iterations.
The kth primal problem is obtained by fixing the binary variables in (4.7) to some combination
y k . Let us define f (x, y) = f x (x) + xT Ay + cT y, h(x, y) = hx (x) + xT By + dT y and
g(x, y) = g x (x) + xT Cy + eT y. The resulting primal problem is given by
min f (x, y k )
x
s.t. h(x, y k ) = 0
(P k )
g(x, y k ) ≤ 0
x ∈ Rn
This NLP provides an upper bound on (4.7). When solving the primal problem P k , two
situations can be encountered: the NLP is feasible or infeasible.
Feasible primal problem In this case, the solution of the primal problem yields an upper
bound f̄^k on the MINLP, values of the continuous variables at that solution, x^k, and values
of the optimal Lagrange multipliers at that solution, λ^k and µ^k. A Lagrange function can
then be formulated as
L(x, y; λ^k, µ^k) = f(x, y) + λ^{kT} h(x, y) + µ^{kT} g(x, y).
Infeasible primal problem In this case, a feasibility problem is formulated and solved in
an attempt to identify a feasible or nearly feasible point:
min_{x,α} ∑_{i=1}^{p} α_i
s.t. h(x, y^k) = 0
     g(x, y^k) ≤ α (4.9)
     x ∈ R^n
     α ≥ 0
The solution of this problem is greater than 0 if no feasible point can be found. At the
solution xk , the Lagrange multipliers λIP,k and µIP,k enable the specification of the following
Lagrange function:
LIP (x, y; λIP,k , µIP,k ) = λIP,kT h(x, y) + µIP,kT g(x, y). (4.10)
The relaxed master problem at iteration K is constructed from the evaluation of the La-
grange functions at the solution of the primal and infeasible primal problems for all previous
iterations:
min_{y, η^K} η^K
s.t. η^K ≥ L(x^k, y, λ^k, µ^k), k = 1, . . . , K (4.11)
     0 ≥ L^{IP}(x^k, y, λ^{IP,k}, µ^{IP,k}), k = 1, . . . , K
     y ∈ {0, 1}^q
This problem is an MILP with a single continuous variable and can be solved easily. η^K is a
lower bound on the solution of (4.7). Since a new constraint is added to the master problem
at each iteration, the sequence of lower bounds obtained is non-decreasing. The solution
vector y^K can be used to construct the primal problem for iteration K + 1.
To ensure that any combination y k of the binary variables is not generated twice, integer
cuts may be added to the set of constraints. Let Z k = {i : yik = 0} and N Z k = {i : yik = 1}.
Then the constraint
∑_{i∈NZ^k} y_i − ∑_{i∈Z^k} y_i ≤ |NZ^k| − 1 (4.12)
excludes the combination y^k, and only that combination, from the feasible set.
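That the cut removes exactly one point can be verified exhaustively; the particular y^k below is an arbitrary illustration:

```python
from itertools import product

def cut_holds(y, yk):
    # Integer cut (4.12): sum over NZ of y_i minus sum over Z of y_i <= |NZ| - 1
    NZ = [i for i, v in enumerate(yk) if v == 1]
    Z  = [i for i, v in enumerate(yk) if v == 0]
    return sum(y[i] for i in NZ) - sum(y[i] for i in Z) <= len(NZ) - 1

yk = (1, 0, 1)
violators = [y for y in product((0, 1), repeat=3) if not cut_holds(y, yk)]
```

The only combination that violates the cut is y^k itself, so every other binary combination remains available to the master problem.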
Figure 4.6: Progress of the upper bound (primal), the lower bound (master) and the current
best upper bound over the iterations.
Typical progress for the upper and lower bounds during a GBD run is shown in Fig. 4.6. The
algorithm can be terminated when the lower bound exceeds the current best upper bound
(i.e. the smallest upper bound) since subsequent solutions can only yield a larger objective
function.
When integer cuts are added to the relaxed master at each iteration, it is possible for
the relaxed master to become infeasible before the lower and upper bound have converged.
This means that all feasible integer combinations have been explored and that the solution
is therefore the best upper bound found so far.
min_{x,y} −2.7y + x²
s.t. g1 = − ln(1 + x) + y ≤ 0
     g2 = − ln(x − 0.57) − 1.1 + y ≤ 0 (4.13)
     0 ≤ x ≤ 2
     y ∈ {0, 1}
The feasible region and the objective function are convex as plotted in Fig. 4.7.
• Iteration 1
1. Set y (1) = 1.
2. The first primal problem is P (1) :
min_x −2.7 + x²
s.t. g1 = − ln(1 + x) + 1 ≤ 0
     g2 = − ln(x − 0.57) − 0.1 ≤ 0
     0 ≤ x ≤ 2
Figure 4.7: Feasible region of problem (4.13), bounded by g1 = 0, g2 = 0 and x = 2, with
the objective function decreasing towards the region.
Its solution is (x^(1), µ^(1)) = (1.7183, 9.3417, 0)^T with an objective function value of
0.2525. The Lagrange function is given by
L(x, y; µ^(1)) = −2.7y + x² + 9.3417 (− ln(1 + x) + y).
3. The first relaxed master problem is
min_{y, η^(1)} η^(1)
s.t. η^(1) ≥ L(x^(1), y, µ^(1))
     y ∈ {0, 1}
• Iteration 2
1. The primal problem P (2) is built with y (2) = 0, as specified by the solution of
the previous relaxed master. The solution of P (2) is (x(2) , µ(2) ) = (0.9028, 0, 0.6)T .
The objective function value is 0.815 which is greater than the previous upper
bound of 0.2525. The best upper bound is therefore 0.2525.
2. The relaxed master problem is
min_{y, η^(2)} η^(2)
s.t. η^(2) ≥ L(x^(1), y, µ^(1))
     η^(2) ≥ L(x^(2), y, µ^(2))
The solution is η (2) = 0.2525 at y (3) = 1. This is equal to the upper bound and
the solution has therefore been found.
The integer cut y ≤ 0 could have been introduced in the first relaxed master problem to
eliminate y = 1 as this possibility had already been tried. This would not have changed the
solution. However, introducing the additional integer cut −y ≤ −1 in the second relaxed
master would have resulted in an infeasible problem and hence convergence.
min_{x,y} f^x(x) + c^T y
s.t. h^x(x) + d^T y = 0
     g^x(x) + e^T y ≤ 0 (4.14)
     x ∈ R^n
     y ∈ {0, 1}^q
The general procedure in the outer-approximation algorithm is the same as in the GBD
algorithm. The main difference stems from the construction of the relaxed master problem.
If we assume that the objective function and feasible region are convex, problem (4.14) can be
expressed through the linearisation (outer-approximation) of the constraints and objective
function at an infinite number of points, as shown in Fig. 4.8. A relaxation is necessary
to make the linear master problem tractable and only a reduced number of linearisations is
considered. These linearisations are taken at the solution of the primal problems, to give the
following relaxed master problem at the K th iteration:
min_{x, y, η^K} c^T y + η^K
s.t. η^K ≥ f^x(x^k) + ∇f^x(x^k)^T (x − x^k)
     T^k [ h^x(x^k) + ∇h^x(x^k)^T (x − x^k) + d^T y ] ≤ 0    ∀k = 1, . . . , K (4.15)
     g^x(x^k) + ∇g^x(x^k)^T (x − x^k) + e^T y ≤ 0
     x ∈ R^n
     y ∈ {0, 1}^q
where T^k is the relaxation matrix, a diagonal matrix with
t^k_ii = +1 if λ^k_i > 0, −1 if λ^k_i < 0, and 0 if λ^k_i = 0, for i = 1, . . . , m,
and where λ^k_i denotes the Lagrange multiplier for the ith equality constraint at
the solution of the kth primal problem. This matrix essentially transforms equality constraints
into inequality constraints without modifying the solution of the problem. In the case where
the primal problem is infeasible, the linearisation of the constraints at the solution of the
feasibility problem is added but the objective function is not linearised.
The relaxed master problem is an MILP which involves the x and y variables as well
as η K . Its solution is a lower bound on the solution of (4.14), provided that the convexity
conditions are met. As in the GBD algorithm, the lower bounds increase monotonically as
the iterations proceed.
The algorithm terminates when the lower bound is greater than or equal to the upper bound
on the solution. Note that when the convexity conditions are met, the two bounds are exactly
equal at the end of the run. When integer cuts are used, the infeasibility of the relaxed master
also indicates convergence.
1. Select y (1) = 1.
2. The first primal problem P (1) is the same as with the GBD and therefore yields
the same upper bound of 0.2525 at x(1) = 1.7183. We linearise the constraints
and the objective function using the gradient information generated in the NLP
solver:
f^{xL}(x) = f^x(x^(1)) + 2x^(1) (x − x^(1)) = 3.4366x − 2.9526
g1^L(x, y) = g1(x^(1), y) − (1/(1 + x^(1))) (x − x^(1)) = −0.3679 − 0.3679x + y
g2^L(x, y) = g2(x^(1), y) − (1/(x^(1) − 0.57)) (x − x^(1)) = 0.2581 − 0.8709x + y
Figure 4.9: Superstructure for the production of C: process 1 converts B into C; B may be
purchased or produced from A via process 2 or process 3.
• Iteration 2
1. Solve the primal problem P (2) with y (2) = 0. The solution is the same as with the
GBD. Linearise the objective function and constraints around this point.
2. Solve the relaxed master problem:
The solution of this problem gives a lower bound of 0.2525. The algorithm can
terminate.
It is proposed to manufacture a chemical C with a process 1 that uses raw material B. B can
either be purchased or manufactured with one of two processes, 2 or 3, which use chemical A
as a raw material. The task is then to find the selection of processes and production levels
that maximise total profit. The superstructure for this problem is shown in Fig. 4.9.
Data
Iteration    1        2        3        4       5
GBD        −27.33   −23.83   −11.85   −2.72   −1.92
OA/ER       −3.71     +∞       —        —       —
Table 4.2: Progress of the lower bound for the GBD and OA algorithms
• Process 1: FC = 0.9FB
• Process 2: FB = ln(1 + FA )
• Process 3: FB = 1.2 ln(1 + FA )
Formulation Let us define three binary variables y1 , y2 and y3 . These are equal to 1 if
processes 1, 2 and 3 exist respectively, and 0 otherwise. The profit is given by
Solution outline Choose a starting point of y 1 = (1, 1, 0)T . The sequence of lower bounds
for the two algorithms is shown in Table 4.2. The optimal solution has a profit of $1920/hr
with binary variables y = (1, 0, 1)T . The corresponding superstructure is shown in Fig. 4.10.
Figure 4.10: Optimal structure: A is converted into B via process 3 and B into C via
process 1.
The behaviour of many physical systems may be expressed as the solution to an optimi-
sation problem. Out of those, it is often the case that the global solution of a problem is of
special significance. In fact, in order to correctly predict the behaviour of numerous systems
(e.g. phase equilibria, path finding, crystal prediction), it is necessary to locate the global
optimum of an optimisation problem. In other systems (e.g., financial modelling, product
design), the difference between a locally optimal solution and the global one may translate
into the loss of millions of pounds.
There are numerous alternative ways in which this problem may be approached. The first
is to create a linear model of the system one is attempting to describe. This linear model may
be reliably solved to global optimality, even for millions of variables, using a linear solver (e.g.
CPLEX, Gurobi), but the modelling accuracy may be very low. The second is to create a
nonlinear model which adequately describes the complexity, but is difficult to solve to global
optimality. To overcome this difficulty, numerous local optimisation runs may be attempted
in order to locate a reasonably good solution, or a global optimisation algorithm may be
employed. Some of these algorithms provide a theoretical guarantee of convergence to the
global solution under certain circumstances.
Neumaier [10] provided the following classification of global optimisation algorithms,
based on the degree of rigour with which they approach the problem:
1. An incomplete method uses clever intuitive heuristics for searching but has no safe-
guards against the search getting stuck in a local optimum. Examples include repeated
local optimisation from different starting points (e.g., with an interior-point algorithm),
or one of the many guided search methods such as particle swarm optimisation (PSO)
or genetic algorithms (GA).
2. An asymptotically complete method reaches a global optimum with certainty, or at
least with probability one, if allowed to run indefinitely long, but has no means of
knowing when a global optimum has been found (e.g., simulated annealing, in theory).
3. A complete method reaches a global optimum with certainty, assuming exact compu-
tations and indefinitely long run-time, and can guarantee after a finite time that an
approximate global solution has been found, within prescribed tolerances (e.g., the
αBB algorithm [6, 3], which will be discussed in this chapter).
4. A rigorous method reaches a global optimum with certainty and within given toler-
ances even in the presence of rounding errors, except in near-degenerate cases, where
the tolerances may be exceeded. Because every part of a software implementation needs
to be rigorous not only algorithmically, but also with respect to numerical calculations
(including all included software libraries), software which implements this type of algo-
rithms is very difficult to develop.
Thus, if it is necessary to achieve certainty that a global solution has been located, a complete
or rigorous method must be employed. In this chapter, we will focus on deterministic
global optimisation (DGO) algorithms that belong to the class of complete methods. Such
methods are based on a detailed theoretical analysis of the functions involved in the problem.
Because global optimisation problems are very difficult to solve, numerous heuristics (rules-
of-thumb) are typically combined with theoretical elements in order to accelerate the solution
and solve the problem in reasonable time. Thus, in this chapter we will introduce theoretical
underestimation methodologies as well as some of the most commonly used heuristic methods
in modern DGO. In the sections which follow, it is assumed that the optimisation problem
is the minimisation problem P :
P : min_{x∈X} f(x)
    s.t. g(x) ≤ 0
         h(x) = 0
         X = [x^L, x^U] (5.1)
where all functions f, g, h are general, nonlinear, twice-differentiable, bounded functions,
defined over a hyper-rectangular domain X ⊂ R^N, and N is the number of variables.

5.1 Spatial branch-and-bound methods
The first step is to derive lower and upper bounds on the objective function that are valid
over the whole range of the variable (First iteration). The two bounds are derived as follows:
(i) the upper bound f̄∗ on the value of the objective function, typically acquired by local
minimisation of the original problem, and (ii) a rigorous lower bound f̲∗ on the range of
possible values that the objective function may take within that region. The lower
bound may be calculated by constructing a convex relaxation f˘ of the original function
f : f˘(x) ≤ f (x), ∀x ∈ X. We will discuss in some detail how this can be constructed later in
the chapter. Because this new function is convex and smaller than or equal to f everywhere
in X, it is possible to extract a lower bound on the value of f by solving the corresponding
local optimisation problem.
At the first iteration, the upper bound we find is the best known upper bound on the global
solution of the problem, f^{UB}, and we set f^{UB} = f̄∗. Furthermore, we know that f̲∗ is a
valid lower bound on f for all x ∈ X and we can set the best overall lower bound f^{LB}
accordingly: f^{LB} = f̲∗. We can thus state that the value of the objective function at the
global solution, f∗, is such that f^{LB} ≤ f∗ ≤ f^{UB}.
Then, in the second iteration, one “branches on the variable”, i.e., divides the solution space X ⊂ R into two subregions, denoted in the figure by “Region 2” (x ∈ X2) and “Region 3” (x ∈ X3). This is followed by bounding steps in each subregion, where once again two types of bounds need to be calculated: (i) an upper bound, f̄*_i, which is valid for the original space X and may again be obtained by local optimisation; (ii) a lower bound for the current subregion, obtained by constructing a relaxation of the function over this specific region. For instance, for Region 2, the lower bound is derived from a convex relaxation f̆ of f over X2, i.e., f̆(x) ≤ f(x), ∀x ∈ X2. The corresponding lower bounding problem over subregion i (x ∈ Xi) is given by:

f̲*_i = min_{x∈Xi} f̆_i(x)
Once a valid upper bound and lower bound are available for each branch, the next step
is to determine whether this information allows any reduction of the solution space. In other
words, we ask the following question: “given these bounds, is there any branch Xi , i = 2, 3
where it is impossible to find a global solution?”.
To answer this, we first update the value of the best known upper bound:

f^{UB} = min{f^{UB}, f̄*_2, f̄*_3}

In the example illustrated in Figure 5.1, the upper bound has
remained the same as at Iteration 1. Considering the lower bounds in the figure, we see that
the lower bound generated in Region 3 is greater than the best upper bound. Therefore, we
can reliably infer that it is impossible to find the global solution in Region 3, and remove that
region from our search. This process of removing a region from the search tree is referred to
as fathoming that region. Furthermore, we can conclude that the lower bound on f in Region 2, f̲*_2, is a valid lower bound on the global solution for all x ∈ X, and update f^{LB} = f̲*_2.
²Because the problem is convex, the local solution is equivalent to the global one, and can be reliably located by means of a local optimisation method.
5.2. CONVEX RELAXATION OF FUNCTIONS 74
[Figure 5.1: Branch-and-bound iterations, showing the objective function, its convex relaxation (and the previous relaxation), the upper and lower bounds, and Regions 1–3.]
This process may be repeated iteratively, until the separation distance f U B −f LB becomes
smaller than a predefined tolerance ϵ, in which case the algorithm achieves ϵ−convergence.
If the branch (node) where the global minimum is located is also detected to be convex, the
algorithm can achieve full convergence, as shown in Figure 5.2.
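The loop just described (select a node, bound, branch, fathom) can be sketched in a few lines. The sketch below is illustrative only: it assumes a one-dimensional test function f(x) = x⁴ − 3x² + x on [−2, 2], and uses a natural interval extension as the lower bounding step (in place of a convex relaxation) and midpoint evaluation as the upper bounding step.

```python
import heapq

def f(x):
    return x**4 - 3*x**2 + x

def even_pow_range(a, b, n):
    # range of x**n (n even) over [a, b]
    if a >= 0:
        return a**n, b**n
    if b <= 0:
        return b**n, a**n
    return 0.0, max(a**n, b**n)

def lower_bound(a, b):
    # natural interval extension of f over [a, b]: valid, but not tight,
    # because it ignores the dependency between the terms
    lo4, _ = even_pow_range(a, b, 4)
    _, hi2 = even_pow_range(a, b, 2)
    return lo4 - 3.0*hi2 + a

def branch_and_bound(a, b, eps=1e-4):
    f_ub = f(0.5*(a + b))                   # incumbent upper bound
    pool = [(lower_bound(a, b), a, b)]      # heap keyed on lower bound (LLB rule)
    while pool and f_ub - pool[0][0] > eps:
        _, lo, hi = heapq.heappop(pool)     # node with the worst lower bound
        mid = 0.5*(lo + hi)
        f_ub = min(f_ub, f(mid))            # cheap "local" upper bound
        for c, d in ((lo, mid), (mid, hi)): # bisect (branching)
            lb = lower_bound(c, d)
            if lb < f_ub - eps:             # otherwise the child is fathomed
                heapq.heappush(pool, (lb, c, d))
    f_lb = pool[0][0] if pool else f_ub - eps
    return f_lb, f_ub

f_lb, f_ub = branch_and_bound(-2.0, 2.0)
print(f_lb, f_ub)   # brackets the global minimum, approximately -3.5139
```

Selecting the node with the lowest lower bound from a min-heap is exactly the LLB rule discussed later in the chapter; a real implementation would replace `lower_bound` with a convex relaxation solved to optimality.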
[Figure 5.2: A branch-and-bound tree, showing fathomed nodes and the optimal node.]
A set of simple rules has been developed to perform operations on intervals. Given two intervals [a1^L, a1^U] and [a2^L, a2^U], the basic interval arithmetic operations are

• [a1^L, a1^U] + [a2^L, a2^U] = [a1^L + a2^L, a1^U + a2^U]
• [a1^L, a1^U] − [a2^L, a2^U] = [a1^L − a2^U, a1^U − a2^L]
• [a1^L, a1^U] × [a2^L, a2^U] = [min{a1^L a2^L, a1^L a2^U, a1^U a2^L, a1^U a2^U}, max{a1^L a2^L, a1^L a2^U, a1^U a2^L, a1^U a2^U}]
• [a1^L, a1^U] / [a2^L, a2^U] = [a1^L, a1^U] × [1/a2^U, 1/a2^L], provided 0 ∉ [a2^L, a2^U]
Ranges for trigonometric and transcendental functions can be computed based on their known
behaviour (monotonicity properties or location of extrema). Composite functions constructed
from these simple operations can then be enclosed through recursive evaluations.
Examples
As an illustration, enclosures for two non-linear expressions are computed for (x1 , x2 ) ∈
[−1, 2] × [−1, 1].
Since sin([−1, 2]) = [sin(−1), 1] = [−0.841471, 1], we have −sin([−1, 2]) = [−1, 0.841471]. Moreover, cos([−1, 1]) = [cos(1), 1] = [0.540302, 1]. Hence,

−sin([−1, 2]) cos([−1, 1]) = [−1, 0.841471].

For the second expression, 2[−1, 1] = [−2, 2] and ([−1, 1]^2 + 1)^2 = ([0, 1] + 1)^2 = [1, 2]^2 = [1, 4]. Thus

2[−1, 1] / ([−1, 1]^2 + 1)^2 = [−2, 2] / [1, 4] = [min{−2/1, −2/4}, max{2/1, 2/4}] = [−2, 2].

Finally,

−sin([−1, 2]) cos([−1, 1]) + 2[−1, 1] / ([−1, 1]^2 + 1)^2 = [−1, 0.841471] + [−2, 2] = [−3, 2.841471].
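These recursive evaluations are easy to mechanise. The sketch below represents an interval as a pair of floats and reproduces the enclosure computed above; note that the monotonicity reasoning for sin and cos over the given sub-domains is hard-coded here, whereas a general implementation would locate the extrema automatically.

```python
import math

def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    # all four cross-products, as in the interval multiplication rule
    p = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(p), max(p))

def idiv(a, b):
    # valid only when 0 is not contained in the divisor interval
    assert b[0] > 0 or b[1] < 0
    return imul(a, (1.0/b[1], 1.0/b[0]))

x2 = (-1.0, 1.0)

# -sin([-1, 2]): sin attains its maximum 1 at pi/2 in [-1, 2], minimum at -1
neg_sin = (-1.0, -math.sin(-1.0))
# cos([-1, 1]): even function, maximum 1 at 0, minimum at the endpoints
cos_enc = (math.cos(1.0), 1.0)
term1 = imul(neg_sin, cos_enc)                # [-1, 0.841471]

num = imul((2.0, 2.0), x2)                    # 2*x2            -> [-2, 2]
x2sq = (0.0, 1.0)                             # x2**2 over [-1, 1]
den = imul(iadd(x2sq, (1.0, 1.0)),
           iadd(x2sq, (1.0, 1.0)))            # (x2**2 + 1)**2  -> [1, 4]
term2 = idiv(num, den)                        # [-2, 2]

enclosure = iadd(term1, term2)
print(enclosure)                              # approximately (-3.0, 2.841471)
```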
wB ≥ x1^L x2 + x2^L x1 − x1^L x2^L
wB ≥ x1^U x2 + x2^U x1 − x1^U x2^U    (5.5)

wB = max{x1^L x2 + x2^L x1 − x1^L x2^L, x1^U x2 + x2^U x1 − x1^U x2^U}    (5.6)
Al-Khayyal and Falk [2] later showed that this relaxation is the convex hull³ of the bilinear
expression. An illustration of the containment set that these constraints generate is given in
Figure 5.4.
Exercise Show that

x1 x2 ≥ x1^L x2 + x2^L x1 − x1^L x2^L,
x1 x2 ≥ x1^U x2 + x2^U x1 − x1^U x2^U.    (5.7)
In a similar fashion, the concave relaxation of a bilinear term may be expressed as:

wB ≤ x1^U x2 + x2^L x1 − x1^U x2^L
wB ≤ x1^L x2 + x2^U x1 − x1^L x2^U    (5.8)
Thus, a problem where all the non-linearities involve only bilinear terms may be relaxed by constructing a linear problem in which every one of the bilinear terms (e.g., term j, involving variables xi and xi′) is replaced by a new variable wBj and a matching set of linear constraints. Each wBj has a bijective relationship with the product it replaces, i.e., with xi xi′,
³The tightest possible convex set which contains the original function.
CHAPTER 5. DETERMINISTIC GLOBAL OPTIMISATION 77
Figure 5.3: Illustration of a bilinear term x1 x2.

Figure 5.4: Illustration of the McCormick containment set for a bilinear term.
and this term will be substituted by the same wBj wherever it appears in the problem.
For example, consider the problem

min_{x∈X} x1 x2 + x1 x3
s.t. x1 x2 ≤ 0
     x1 x3 ≤ 0    (5.9)
This problem has two bilinear terms, x1 x2 and x1 x3. Let wB1 and wB2 be the variables which substitute each of them respectively. Then, the convex relaxation of the original problem is:

min_{x∈X, wB} wB1 + wB2
s.t. wB1 ≤ 0
     wB2 ≤ 0
     wB1 ≥ x1^L x2 + x2^L x1 − x1^L x2^L
     wB1 ≥ x1^U x2 + x2^U x1 − x1^U x2^U
     wB2 ≥ x1^L x3 + x3^L x1 − x1^L x3^L
     wB2 ≥ x1^U x3 + x3^U x1 − x1^U x3^U    (5.10)
Although the relaxed problem is bigger, in the sense that it has more variables and constraints,
it is much easier to solve than the original one, because all constraints are linear.
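The validity of the substitution can be checked numerically. The sketch below assumes, purely for illustration, the bounds x1, x2 ∈ [−1, 1], and verifies on a grid that the two underestimators of Eq. (5.5) and the two overestimators of Eq. (5.8) do bracket the bilinear term x1 x2.

```python
import itertools

x1L, x1U = -1.0, 1.0   # assumed bounds, for illustration only
x2L, x2U = -1.0, 1.0

def under(x1, x2):
    # pointwise maximum of the two (linear) McCormick underestimators
    return max(x1L*x2 + x2L*x1 - x1L*x2L,
               x1U*x2 + x2U*x1 - x1U*x2U)

def over(x1, x2):
    # pointwise minimum of the two (linear) McCormick overestimators
    return min(x1U*x2 + x2L*x1 - x1U*x2L,
               x1L*x2 + x2U*x1 - x1L*x2U)

grid = [i/10.0 for i in range(-10, 11)]
ok = all(under(a, b) - 1e-12 <= a*b <= over(a, b) + 1e-12
         for a, b in itertools.product(grid, grid))
print(ok)   # True: x1*x2 lies inside the McCormick containment set
```

At the corners of the box the estimators are tight, e.g. under(1, 1) equals the true product 1; this is why the relaxation is the convex hull.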
Underestimating trilinear terms In a similar fashion, Maranas and Floudas [7] derived a tight relaxation for a trilinear term x1 x2 x3 by decomposing it into two bilinear terms, wB = x1 x2 and wT = wB x3:

wT ≥ x1 x2^L x3^L + x1^L x2 x3^L + x1^L x2^L x3 − 2 x1^L x2^L x3^L
wT ≥ x1 x2^U x3^U + x1^U x2 x3^L + x1^U x2^L x3 − x1^U x2^L x3^L − x1^U x2^U x3^U
wT ≥ x1 x2^L x3^L + x1^L x2 x3^U + x1^L x2^U x3 − x1^L x2^U x3^U − x1^L x2^L x3^L
wT ≥ x1 x2^U x3^L + x1^U x2 x3^U + x1^L x2^U x3 − x1^L x2^U x3^L − x1^U x2^U x3^U
wT ≥ x1 x2^L x3^U + x1^L x2 x3^L + x1^U x2^L x3 − x1^U x2^L x3^U − x1^L x2^L x3^L
wT ≥ x1 x2^L x3^U + x1^L x2 x3^U + x1^U x2^U x3 − x1^L x2^L x3^U − x1^U x2^U x3^U
wT ≥ x1 x2^U x3^L + x1^U x2 x3^L + x1^L x2^L x3 − x1^U x2^U x3^L − x1^L x2^L x3^L
wT ≥ x1 x2^U x3^U + x1^U x2 x3^U + x1^U x2^U x3 − 2 x1^U x2^U x3^U    (5.11)

where wT ∈ WT, and WT is the set of all variables which substitute trilinear terms.
Underestimating fractional terms The convex relaxation of a fractional term x1/x2 depends on the sign of the bounds on x1. For x2^L > 0, these equations are:

wF ≥ x1^L/x2 + x1/x2^U − x1^L/x2^U,                 if x1^L ≥ 0
wF ≥ x1/x2^U − x1^L x2/(x2^L x2^U) + x1^L/x2^L,     if x1^L < 0
wF ≥ x1^U/x2 + x1/x2^L − x1^U/x2^L,                 if x1^U ≥ 0
wF ≥ x1/x2^L − x1^U x2/(x2^L x2^U) + x1^U/x2^U,     if x1^U < 0    (5.12)

where wF ∈ WF, and WF is the set of all variables which substitute fractional terms.
Underestimating fractional trilinear terms For a fractional trilinear term x1 x2/x3 (one sign case, analogous to the trilinear case above, with x3^L > 0):

wFT ≥ x1 x2^L/x3^U + x1^L x2/x3^U + x1^L x2^L/x3 − 2 x1^L x2^L/x3^U
wFT ≥ x1 x2^U/x3^L + x1^U x2/x3^U + x1^U x2^L/x3 − x1^U x2^L/x3^U − x1^U x2^U/x3^L
wFT ≥ x1 x2^L/x3^U + x1^L x2/x3^L + x1^L x2^U/x3 − x1^L x2^U/x3^L − x1^L x2^L/x3^U
wFT ≥ x1 x2^U/x3^U + x1^U x2/x3^L + x1^L x2^U/x3 − x1^L x2^U/x3^U − x1^U x2^U/x3^L
wFT ≥ x1 x2^L/x3^L + x1^L x2/x3^U + x1^U x2^L/x3 − x1^U x2^L/x3^L − x1^L x2^L/x3^U
wFT ≥ x1 x2^L/x3^L + x1^L x2/x3^L + x1^U x2^U/x3 − x1^L x2^L/x3^L − x1^U x2^U/x3^L
wFT ≥ x1 x2^U/x3^U + x1^U x2/x3^U + x1^L x2^L/x3 − x1^U x2^U/x3^U − x1^L x2^L/x3^U
wFT ≥ x1 x2^U/x3^L + x1^U x2/x3^L + x1^U x2^U/x3 − 2 x1^U x2^U/x3^L    (5.13)

where wFT ∈ WFT, and WFT is the set of all variables which substitute fractional trilinear terms. The full set of constraints for the remaining cases may be found in [1].
L(x) = f(x) + Σ_{i=1}^{N} αi (xi^L − xi)(xi^U − xi)    (5.15)

where the summation term is the convexifying quadratic q(x).
The Hessian matrix of f, with elements hij(x), is

Hf(x) =
| h11(x)  h12(x)  ...  h1N(x) |
| h21(x)  h22(x)  ...  h2N(x) |
| ...........................  |
| hN1(x)  hN2(x)  ...  hNN(x) |

Then the Hessian matrix H(x) derived from L(x) is given by H(x) = Hf(x) + 2∆, where ∆ = diag(α1, ..., αN).
To ensure that H(x) remains positive semi-definite for all x ∈ [xL , xU ], and hence that L(x)
is convex over this domain, we must ensure that the minimum eigenvalue of H(x) over all x
is non-negative. First consider a given vector x and a vector α with identical elements, i.e., α = α1 = ... = αN. If we define λ^x_min(x) to be the minimum eigenvalue of Hf(x)
(please note the subscript f ) for this given x, then the corresponding minimum eigenvalue for
H(x) is λ^x_min(x) + 2α. Thus L is convex at x if and only if λ^x_min(x) + 2α ≥ 0. Now considering all x, we seek λ_min = min_{x∈[x^L,x^U]} λ^x_min(x) and need α such that λ_min + 2α ≥ 0, i.e.,

α ≥ max{0, −(1/2) λ_min}
Hence

λ^x_min(x, y) = (4x − √(16x² − 4(−12x² + 4y²))) / 2

Using interval arithmetic, we can get a lower bound on λ^x_min(x, y) over all x and y:

λ_min ≥ (4(−2) − √(16 × 3² − 4(−12 × 3² + 4 × 1²))) / 2 = −15.833
A single value of α = max(0, −(1/2)λ_min) may thus be used for all variables, where λ_min is the minimum eigenvalue over the set of scalar Hessian matrices in [Hf]. A per-variable value can instead be obtained from the following theorem:
Theorem 5.1 (Scaled Gerschgorin) For any positive vector d and a symmetric interval
matrix [Hf ], define the vector α as:
αi = max{0, −(1/2)(hii^L − Σ_{j≠i} |h|ij dj/di)}    (5.16)

where |h|ij = max{|hij^L|, |hij^U|}. Then, for all Hf ∈ [Hf], the matrix H = Hf + 2∆ with ∆ = diag(αi) is positive semi-definite. For simplicity, d = (1, ..., 1) can be chosen.
α1 = max(0, −(1/2)(−12 − 8)) = 10
α2 = max(0, −(1/2)(−6 − 8)) = 7
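The theorem is straightforward to implement. In the sketch below, the interval Hessian is stored as two matrices of element-wise lower and upper bounds; the off-diagonal bounds [−8, 8] and the diagonal upper bounds are assumed values chosen to be consistent with the worked example above (only |h|12 = |h|21 = 8 and the diagonal lower bounds −12 and −6 enter the calculation).

```python
def scaled_gerschgorin_alpha(H_lo, H_hi, d=None):
    # Eq. (5.16): alpha_i = max(0, -(h_ii^L - sum_{j != i} |h|_ij d_j/d_i)/2),
    # with |h|_ij = max(|h_ij^L|, |h_ij^U|)
    n = len(H_lo)
    if d is None:
        d = [1.0] * n       # the simple choice d = (1, ..., 1)
    alphas = []
    for i in range(n):
        off = sum(max(abs(H_lo[i][j]), abs(H_hi[i][j])) * d[j] / d[i]
                  for j in range(n) if j != i)
        alphas.append(max(0.0, -0.5 * (H_lo[i][i] - off)))
    return alphas

# interval Hessian consistent with the worked example (assumed values)
H_lo = [[-12.0, -8.0], [-8.0, -6.0]]
H_hi = [[  0.0,  8.0], [ 8.0,  0.0]]
print(scaled_gerschgorin_alpha(H_lo, H_hi))   # [10.0, 7.0]
```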
Substituting these values into Eq. (5.15) yields a valid convex relaxation. Note that using a single value of α = max(α1, α2) = 10 also gives a valid convex relaxation.
The feasible region must also be relaxed: if the region on which the relaxed problem is defined is not convex, our local solver may get stuck at a local minimum, and produce bounds that are not valid.
Because constraints define a region, constraint types are important. For instance, x2 ≤ 0
is a convex constraint, but x2 ≥ 0 (equivalently −x2 ≤ 0) is a concave one, and must be
convexified. More generally, convexified constraints ğ(x, w) ≤ 0 always define a convex
feasible region. This is not the case however for non-linear equality constraints which always
define a non-convex (non-simply connected) region. Any equality constraint h which, after
any substitutions, still has non-linear terms, must be re-written as two inequalities of opposite
signs, i.e.,
h⁺(x) = h(x) ≤ 0
h⁻(x) = −h(x) ≤ 0    (5.17)

each of which is then replaced by its convex relaxation:

h̆⁺(x) ≤ 0
h̆⁻(x) ≤ 0    (5.18)
While the αBB functional form is very general, specialised relaxations like the McCormick
constraints may contribute to making the algorithm converge much more quickly. Thus, some
types of non-convex terms are treated in a different way. In the αBB algorithm [1] formula-
tions for a variety of frequently encountered special cases of non-convex terms are considered:
(i) bilinear, (ii) trilinear, (iii) fractional, (iv) fractional trilinear, and (v) univariate concave
terms. Combining these techniques, a full convex lower bounding problem may be formulated
⁴Remember, the participating functions must be non-infinite in X, hence the objective may not have asymptotes.
5.4. HEURISTICS IN DGO 84
P̆ : min_{x∈X, w∈W} f̆(x, w)
     s.t. ğ(x, w) ≤ 0
          h^L(x, w) = 0
          h̆⁺(x, w) ≤ 0
          h̆⁻(x, w) ≤ 0
          C(x, w) ≤ 0    (5.19)
where ˘ denotes a convexified function, hL refers to equality constraints which, after any
substitutions, only have linear terms, and C(x, w) ≤ 0 represents all additional constraints
which are added to the lower bounding problem in order to build relaxations of special non-
convex terms.
LLB branching is one of the most prevalent node selection strategies in DGO. Given the
list of all nodes which might contain the global solution, known as the branching pool, the
node with the worst (lowest) lower bound is selected for branching. If the convex relaxation
guarantees better bounds when the domain becomes smaller, like αBB does, this strategy
provides two advantages: (i) the worst current estimate of the objective value is guaranteed
to improve (or at least not become worse) at every iteration, and (ii) it is very easy to check for convergence: once the node with the worst bound is within the convergence tolerance of the best known local solution (the upper bound), it is guaranteed that there are no other nodes with worse bounds (and thus outside the convergence tolerance) in the branching pool.
⁵Other, more advanced methods of subdividing nodes exist, but bisection is generally accepted as an overall balanced choice.
The least reduced axis criterion dictates that, given a node, the variable with bounds which
have been reduced the least since the solution process begun is selected for branching. This
criterion is the simplest, as it involves minimal calculations, and guarantees that the range
of all variables will keep diminishing uniformly. However, convergence using this criterion
may be very slow because it only takes the range of a variable into account: it is insensitive
to the degree to which a variable affects the overall bounds. For instance, in the expression −x1² − x2^42, with x1 ∈ [−1, 1], x2 ∈ [−0.1, 0.1], the range of x2 is ten times smaller than the range of x1; thus, according to the least reduced axis criterion, x1 will be selected for branching.
However, branching on x2 is clearly a better choice, as the lower bound would improve
massively.
Maximum separation distance The term to branch on is chosen as

t_MSD^max = arg max_j μ_t^j    (5.20)

where t_MSD^max is the term with the maximum separation distance, j indexes the terms in the problem, and μ_t^j is the maximum separation distance of term j, calculated depending on its type t. Once the term is chosen, one of its participating variables is chosen for branching using the least reduced axis criterion.
While this strategy is more costly than the least reduced axis because dedicated calcula-
tions for each term of the problem are necessary, in a great majority of cases the additional
computational cost is a worthwhile investment. This strategy is able to guide the creation of
the tree much more intelligently because it takes the degree to which a variable affects lower
bounds into account.
The maximum separation distance strategy aims to achieve a balance between the time in-
vested in selecting the branching variable and the quality of the resulting lower bounds. How-
ever, because it takes the contribution of terms into account, rather than the contributions
of the variables themselves, variable selection is not always optimal. The most non-convex
variable branching strategy estimates a measure (weight) of the overall contribution of a
variable i to the non-convexity of a relaxation, as follows:
μ^i = Σ_{t=1}^{N_T} Σ_{j=1}^{N_t} μ_t^{ij}    (5.21)
where t is a term type, NT is the number of different term types in the problem, j is the
jth term of type t, and Nt is the number of terms of type t. µij t is the measure of the
contribution of term j which contains variable i and belongs to the term group t. The sum of
those contributions, for all terms in each term type which contain variable i, yields an overall
estimate of the contribution of that variable.
This strategy provides more holistic information about the likelihood of achieving signifi-
cantly tighter bounds by branching on a particular variable, but at increased cost: expensive
interval eigenvalue calculations will need to be performed numerous times in order to extract
this information.
Strong branching is a procedure where variables are tested for their potential to improve
problem bounds. Virtual nodes are generated (two for each candidate variable), and as many
lower bounding problems are solved. The variable which produced the best lower bound is
selected for branching. If all variables are tested, rather than the most promising subset of
them, the procedure is called full strong branching. Strong branching may result in tighter
bounds, but it is very costly because of the increased number of computationally expensive
lower bounding problems which need to be solved before each branching step. Therefore,
strong branching is typically used for the first few branching steps of the tree and then less
expensive methods are chosen to guide the branching.
Feasibility-based bound tightening (FBBT) uses interval arithmetic to identify parts of the variable domain that cannot improve on the current best upper bound and simultaneously satisfy all the constraints of the problem. Based on this information, tighter bounds may be derived.
FBBT is relatively inexpensive to perform, and is thus commonly used to tighten the
bounds of every new node. However, it does not provide any guarantee that the bounds will
be tightened, and the degree of tightening may be negligible if the interval bounds are not
very tight.
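As a concrete illustration of the idea, the sketch below performs one FBBT pass on a single linear constraint a·x ≤ b: for each variable in turn, the remaining terms are replaced by their tightest possible (interval) value, which may yield an improved bound on that variable. The constraint and bounds are invented for the example.

```python
def fbbt_linear(a, b, lo, hi):
    # one FBBT pass on sum_j a_j x_j <= b with x_j in [lo_j, hi_j]
    lo, hi = list(lo), list(hi)
    n = len(a)
    for i in range(n):
        if a[i] == 0.0:
            continue
        # smallest possible value of the other terms, by interval arithmetic
        rest = sum(min(a[j]*lo[j], a[j]*hi[j]) for j in range(n) if j != i)
        bound = (b - rest) / a[i]
        if a[i] > 0.0:
            hi[i] = min(hi[i], bound)   # from a_i x_i <= b - rest
        else:
            lo[i] = max(lo[i], bound)
    return lo, hi

# x1 + x2 <= 0 with x1 in [-1, 2], x2 in [-1, 1]: since x1 <= -x2 <= 1,
# the upper bound of x1 is tightened from 2 to 1
new_lo, new_hi = fbbt_linear([1.0, 1.0], 0.0, [-1.0, -1.0], [2.0, 1.0])
print(new_lo, new_hi)   # [-1.0, -1.0] [1.0, 1.0]
```

In a full implementation the pass is repeated over all constraints until no bound changes by more than a tolerance; as noted above, the tightening can be negligible when the intervals are wide.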
Optimisation-based bound tightening (OBBT) derives updated bounds on each variable xi by minimising and maximising xi over the relaxed feasible region:

x_{i,new}^L = min_{x∈X, w∈W} xi
s.t. ğ(x, w) ≤ 0
     h^L(x, w) = 0
     h̆⁺(x, w) ≤ 0
     h̆⁻(x, w) ≤ 0
     C(x, w) ≤ 0    (5.22)

x_{i,new}^U = max_{x∈X, w∈W} xi
s.t. ğ(x, w) ≤ 0
     h^L(x, w) = 0
     h̆⁺(x, w) ≤ 0
     h̆⁻(x, w) ≤ 0
     C(x, w) ≤ 0    (5.23)
This method is significantly more expensive than FBBT, but is much more likely to result
in significant domain reduction. A common strategy is to use FBBT on every new node, and
if the problem is not found infeasible, to proceed with OBBT to achieve further domain
reduction. Nevertheless, because 2N convex optimisation problems need to be solved (two
for each variable), this procedure is applied at the first few nodes of the branch-and-bound
tree.
5.5 Available DGO Software
1. ANTIGONE
ANTIGONE [9] is a general mixed integer framework which implements algorithms for
continuous/integer global optimisation of non-linear equations. It provides numerous
features, including reformulation of user input, efficient linearisation techniques, and
specialised handling of terms of different types.
2. BARON
A general purpose solver for optimization problems with nonlinear constraints and/or
integer variables. BARON [12, 11] provides fast specialized solvers for many linearly constrained problems. It is based on branching and box reduction using convex and polyhedral relaxation and Lagrange multiplier techniques.
3. Couenne
Couenne [4] is an open source branch-and-bound package for solving MINLP problems.
Couenne implements linearization, bound reduction, and branching methods.
4. LINDO Global
Branch and bound code for global optimization with general factorable constraints,
including nondifferentiable expressions. LINDO (www.lindo.com) is based on linear
relaxations and mixed integer reformulations.
5. GlobSol
Branch and bound code for global optimization with general factorable constraints,
with rigorously guaranteed results (even round-off is accounted for correctly). Glob-
Sol [5] is based on branching and box reduction using interval analysis to verify that a
global minimizer cannot be lost.
Bibliography
[1] C. S. Adjiman et al. “A Global Optimization Method, αBB, for General Twice-Differentiable
Constrained NLPs – I. Theoretical Advances”. In: Computers and Chemical Engineer-
ing 22 (1998), pp. 1137–1158.
[2] F. A. Al-Khayyal and J. E. Falk. “Jointly constrained biconvex programming”. In:
Mathematics of Operations Research 8.2 (1983), pp. 273–286.
[3] I.P. Androulakis, C. D. Maranas, and C. A. Floudas. “αBB : a global optimization
method for general constrained nonconvex problems”. In: Journal of Global Optimiza-
tion 7 (1995), pp. 337–363.
[4] Pietro Belotti et al. “Mixed-integer nonlinear optimization”. In: Acta Numerica 22 (2013), pp. 1–131.
[5] R. Baker Kearfott. “GlobSol: History, Composition, and Advice on Use”. In: Global Op-
timization and Constraint Satisfaction: First International Workshop on Global Con-
straint Optimization and Constraint Satisfaction, COCOS 2002, Valbonne-Sophia An-
tipolis, France, October 2002. Revised Selected Papers. Ed. by Christian Bliek, Christophe
Jermann, and Arnold Neumaier. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003,
pp. 17–31. isbn: 978-3-540-39901-8. doi: 10 . 1007 / 978 - 3 - 540 - 39901 - 8 _ 2. url:
http://dx.doi.org/10.1007/978-3-540-39901-8_2.
[6] C. D. Maranas and C. A. Floudas. “A global optimization approach for Lennard-Jones
microclusters”. In: The Journal of Chemical Physics 97.10 (1992), pp. 7667–7677.
[7] C. D. Maranas and C.A. Floudas. “Finding all solutions of nonlinearly constrained
systems of equations”. In: Journal of Global Optimization 7.2 (1995), pp. 143–182.
[8] G. P. McCormick. “Computability of global solutions to factorable nonconvex programs:
Part I - Convex underestimating problems”. In: Mathematical Programming 10.1 (1976),
pp. 147–175.
[9] Ruth Misener and Christodoulos A. Floudas. “ANTIGONE: Algorithms for coNTinuous
/ Integer Global Optimization of Nonlinear Equations”. In: Journal of Global Optimiza-
tion 59.2 (2014), pp. 503–526. issn: 1573-2916. doi: 10.1007/s10898- 014- 0166- 2.
url: http://dx.doi.org/10.1007/s10898-014-0166-2.
[10] A. Neumaier. “Complete search in continuous global optimization and constraint sat-
isfaction”. In: Acta numerica 13.June 2004 (2004), pp. 271–369.
[11] N. V. Sahinidis. BARON 14.3.1: Global Optimization of Mixed-Integer Nonlinear Pro-
grams, User’s Manual. 2014.
Chapter 6
Process Operability
6.1 Introduction
In previous chapters, we have focused on economic criteria for design. Here we consider the
role of operability.
Flexibility The capability of the process to maintain feasible operation under changing
conditions.
Controllability The capability of the process for a good dynamic response and stability
in the presence of external disturbances.
Reliability The capability of the process to withstand mechanical and electrical failures.
Safety The capability of the process to maintain safe operation under all required oper-
ating conditions.
Issues of flexibility and controllability arise in normal process operation, while reliability,
safety and start-up/shut-down are relevant in abnormal situations.
• External uncertainties, such as feed composition, temperature of the cooling water, etc.
6.2 Process Flexibility
[Figure 6.1: Flowsheet of the motivating process example.]
Example Consider the process depicted in Fig. 6.1. The uncertain parameters include:
Time Scales Yet other uncertainties affect process operation in the long run. For instance,
changes in product demand or specification may affect the desired setpoints, so that the
process may have to move away from its nominal design. This happens on a much larger
time scale than the external and internal uncertainties mentioned above.
The heat and mass balances and design equations take the general form
h(d, x, z, θ) = 0 (6.1)
and the process specifications (e.g. product purity ≥ 95%) and physical constraints (e.g.
outlet temperature ≤ 400K) are given by
g(d, x, z, θ) ≤ 0 (6.2)
Since the vectors h and x have the same dimension, Eq. (6.1) can (sometimes) be used to
obtain explicit expressions for x:
x = x(d, z, θ). (6.3)
Substituting Eq. (6.3) into Eq. (6.2), we get the reduced model

f(d, z, θ) = g(d, x(d, z, θ), z, θ) ≤ 0    (6.4)

We can analyse feasible operation either in terms of the full model or in terms of the reduced model.
• Is the design feasible for operation for all θ in the given range? → Flexibility Test
A key aspect in both problems is to anticipate that the controls z can be adjusted during
operation for every θ.
Design and Synthesis of Flexible Processes In this case, the design of the process is
no longer fixed.
• Determine an optimal design d that is feasible at finite points in the uncertain parameter
space → Multiperiod Design Problem
• Determine an optimal design d that is feasible for all uncertain parameters in the
specified range → Design Problem Under Uncertainty
Consider the heat exchanger network shown in Fig 6.2. The design (network topology)
has already been fixed. The control variable z is the heat load on the cooler QC . The
state variables are the temperatures T2 , T4 , T6 and T7 . The uncertain parameters are the
temperatures T3 and T5 .
These parameters have nominal values T3N =388K and T5N =583K. The expected deviation
is ±10K and therefore the parameter ranges are 378 ≤ T3 ≤ 398 and 573 ≤ T5 ≤ 593.
[Figure 6.2: Heat exchanger network for the motivating example, showing stream temperatures, heat-capacity flowrates, and the cooler duty QC.]
Process Model
• Temperature specifications
Exchanger 1 T2 − T3 ≥ 0 (6.9)
Exchanger 2 T6 − T4 ≥ 0 (6.10)
Exchanger 3 T7 − 313 ≥ 0 (6.11)
Exchanger 3 T6 − 393 ≥ 0 (6.12)
Exchanger 3 T7 ≤ 323 (6.13)
The inequalities (6.9) to (6.12) are physical constraints based on a minimum approach tem-
perature of 0K. Eq. (6.13) is a specification.
Reduced Process Model This linear model involves four equalities and four state vari-
ables. We can therefore construct the reduced network model by eliminating the state vari-
ables. First, solve Eq. (6.5) to (6.8) for T2 , T4 , T6 , T7 . Substitute these in Eq. (6.9) to (6.13)
f1 = T3 − 0.666QC − 350 ≤ 0
f2 = −T3 − T5 + 0.5QC + 923.5 ≤ 0
f3 = −2T3 − T5 + QC + 1274 ≤ 0 (6.14)
f4 = −2T3 − T5 + QC + 1144 ≤ 0
f5 = 2T3 + T5 − QC − 1284 ≤ 0
Flexibility Test Are the inequalities in the process model satisfied for the range of uncer-
tain parameters 378 ≤ T3 ≤ 398 and 573 ≤ T5 ≤ 593, given that QC will be adjusted for each
value of T3 and T5 ?
First we need to decide how we can establish whether the inequalities can be made feasible
at fixed values of T3 and T5 by adjusting QC . One approach is to select QC to minimise the
largest constraint value, max_j f_j(QC, T3, T5). Thus we need to calculate

ψ(T3, T5) = min_{QC≥0} max_j f_j(QC, T3, T5)
This problem is the flexibility test. If ψ(T3 , T5 ) ≤ 0, a QC can be chosen to satisfy the
reduced process model (and hence the process model). If ψ(T3 , T5 ) > 0, there is no QC which
leads to feasible operation.
The flexibility test requires the analysis of ψ(T3 , T5 ) for all (T3 , T5 ) within the specified
range, i.e. the region shown in Fig. 6.3. We must ascertain whether ψ is non-positive for all
the points in the region. Since the inequalities in this problem are linear in QC , T3 and T5 ,
the maximum value of ψ can be found simply by evaluating ψ at the vertices of this region.
This corresponds to solving the following LP problem:

ψ^k = min_{QC, u} u
      s.t. u ≥ T3^k − 0.666 QC − 350
           u ≥ −T3^k − T5^k + 0.5 QC + 923.5
           u ≥ −2 T3^k − T5^k + QC + 1274    (6.17)
           u ≥ −2 T3^k − T5^k + QC + 1144
           u ≥ 2 T3^k + T5^k − QC − 1284
           QC ≥ 0
[Figure 6.3: The range of the uncertain parameters T3 and T5, with vertices 1–4.]
k    T3^k        T5^k
1    388 + 10    583 + 10
2    388 − 10    583 + 10
3    388 − 10    583 − 10
4    388 + 10    583 − 10
The results are shown in the table below. Since ψ k < 0, ∀k, the network can be operated
without any constraint violations within the specified range of parameters and has therefore
passed the flexibility test.
k    ψ^k     QC (kW)
1    −5      110
2    −5      70
3    −3.33   48.33
4    −3.33   88.33
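Because each f_j is affine in QC, the inner problem min_{QC≥0} max_j f_j can be solved exactly without an LP solver: the maximum of affine functions is piecewise linear and convex, so its minimum lies either at QC = 0 or at an intersection of two of the f_j. The sketch below exploits this to reproduce the ψ^k values in the table; vertex 4 is taken as (398, 573), i.e. the +10/−10 deviations from the nominal point.

```python
def constraints(T3, T5):
    # f_j = a_j*QC + b_j(T3, T5), from the reduced model (6.14)
    return [(-0.666,  T3 - 350.0),
            ( 0.5,   -T3 - T5 + 923.5),
            ( 1.0,   -2*T3 - T5 + 1274.0),
            ( 1.0,   -2*T3 - T5 + 1144.0),
            (-1.0,    2*T3 + T5 - 1284.0)]

def psi(T3, T5):
    # psi = min_{QC >= 0} max_j f_j(QC): evaluate the upper envelope at
    # QC = 0 and at every non-negative pairwise intersection, which covers
    # all candidate minimisers of a convex piecewise-linear function
    fs = constraints(T3, T5)
    cands = [0.0]
    for i in range(len(fs)):
        for j in range(i + 1, len(fs)):
            (ai, bi), (aj, bj) = fs[i], fs[j]
            if ai != aj:
                q = (bj - bi) / (ai - aj)
                if q >= 0.0:
                    cands.append(q)
    qc = min(cands, key=lambda q: max(a*q + b for a, b in fs))
    return max(a*qc + b for a, b in fs), qc

vertices = [(398.0, 593.0), (378.0, 593.0), (378.0, 573.0), (398.0, 573.0)]
psis = [round(psi(T3, T5)[0], 2) for T3, T5 in vertices]
print(psis)   # [-5.0, -5.0, -3.33, -3.33] -> all negative: test passed
```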
Flexibility index The goal is to determine how large the temperature deviations for T3 ,
T5 can be while still maintaining feasible operation. Thus, we are looking for the maximum
value of δ with
388 − 10δ ≤ T3 ≤ 388 + 10δ,
(6.18)
583 − 10δ ≤ T5 ≤ 583 + 10δ,
as illustrated in Fig. 6.4. The maximum value of δ which maintains feasible operation is
called the flexibility index, F. Given that the model is linear, we only need to determine
the maximum allowable value of δ in each of the vertex directions (1,2,3,4). F is then the
minimum δ over all vertex values. Thus, we must solve the following LPs:
δ^k = max_{δ, QC} δ
      s.t. f_j(QC, T3, T5) ≤ 0, j = 1, . . . , 5    (6.19)
           T3 = 388 ± 10δ
           T5 = 583 ± 10δ

where the signs of the deviations are chosen according to the direction of vertex k.
[Figure 6.4: The scaled parameter range: δ < 1, δ = 1, δ > 1.]
k    δ^k    Active constraints
1    +∞     —
2    +∞     —
3    1.5    f1, f2
4    2.0    f2, f5

The flexibility index is therefore F = min_k δ^k = 1.5.
Geometric interpretation To plot the boundary of the feasible region, we can use the information on the active constraints. At vertex 3, f1 and f2 are active and hence

f1 = T3 − 0.666 QC − 350 = 0    (6.20)
f2 = −T3 − T5 + 0.5 QC + 923.5 = 0
Figure 6.5: The region of feasible operation for the motivating example.
Figure 6.6: Region of feasible operation for the motivating example when QC = 75 kW. Note that f3 is too relaxed to be shown on the picture.
[Figure 6.7: The boundary of the region R, defined by ϕ(d, δ) = 0, in the uncertain parameter space (θ1, θ2).]
The process has a given design and is described by the reduced model
fj (z, θ) ≤ 0, j ∈ J (6.22)
that defines feasible operation. Nominal values of the uncertain parameters, θN , as well as
positive and negative expected deviations, ∆θ+ and ∆θ− , are given.
Flexibility test problem Halemane and Grossmann [2] studied whether the design is
feasible to operate within the range T = {θ|θL ≤ θ ≤ θU }, where θL = θN − ∆θ− and
θU = θN + ∆θ+ .
To account for the adjustments of the controls z, define for a given θ

ψ(θ) = min_z max_{j∈J} f_j(z, θ)

To test whether feasible operation can be achieved for all θ ∈ T, it suffices to consider the largest value of ψ(θ) over T (see Fig 6.7). We are therefore looking for

χ = max_{θ∈T} ψ(θ)
[Figure 6.8: The critical point θC is the point with the largest violation in the uncertain parameter space.]
or equivalently,

χ = max_{θ∈T} min_z max_{j∈J} f_j(z, θ)    (6.27)
If χ ≤ 0, the design is feasible for T and θC is the point with the smallest feasibility (largest
ψ). If χ > 0, the design is infeasible for T and θC is the point with the largest violation
(largest ψ), as illustrated on Fig. 6.8.
Flexibility index problem (Swaney and Grossmann [5]). Let the parameter range vary
so that
T (δ) = {θ|θN − δ∆θ− ≤ θ ≤ θN + δ∆θ+ }, δ ≥ 0. (6.28)
[Figure 6.9: The nominal point θ^N, the expected deviations ∆θ1^±, ∆θ2^±, and the critical point θ^C.]
What is the largest T(δ) that the design can tolerate? This is given by the flexibility index F such that

F = max δ  s.t. T(δ) ⊆ R    (6.29)
Thus, a constraint of the flexibility index problem is that the flexibility test must hold for the largest δ, i.e.

F = max δ
    s.t. χ = max_{θ∈T(δ)} min_z max_{j∈J} f_j(z, θ) ≤ 0    (6.30)
         T(δ) = {θ | θ^N − δ∆θ^− ≤ θ ≤ θ^N + δ∆θ^+}, δ ≥ 0
1. The critical parameter θC lies at a vertex when one of the following conditions is met:
2. At the critical point, there are usually n + 1 active constraints, where n = dim{z} (see
Fig. 6.11).
6.2. PROCESS FLEXIBILITY 102
Figure 6.10: The critical point may not be a vertex for nonconvex feasible regions
Figure 6.11: The number of active constraints at the critical point depends on the size of the
control vector
CHAPTER 6. PROCESS OPERABILITY 103
Flexibility Test
Step 1 At each vertex θ^k, solve ψ^k = min_z max_{j∈J} f_j(z, θ^k).
Step 2 Set χ to the maximum of all ψ^k's. If χ ≤ 0, the process passes the flexibility test. It fails otherwise.
• These algorithms are only suitable for a small number of uncertain parameters as there
is an exponential growth in the number of vertices (dim{θ} = p ⇒ dim{V } = 2p ).
• With the flexibility index calculations, several pieces of information can be obtained:
– The largest parameter range over which feasible operation is guaranteed, θ^N − F∆θ^− ≤ θ ≤ θ^N + F∆θ^+.
– The sensitivity of the flexibility index to design changes, ∂F/∂d_i = −Σ_{j∈J} μ_j ∂f_j/∂d_i, where d_i is the ith design variable and μ_j is the Lagrange multiplier of the jth process constraint.
In order to avoid the 2^p vertex searches and to predict non-vertex critical points, we convert the max-min-max problem into a single optimisation problem (Grossmann and Floudas [1]):

χ = max_{θ∈T} min_z max_{j∈J} f_j(z, θ)

which is a two-level optimisation problem. Since in general the inner minimisation problem has n + 1 active constraints at the solution and has n variables (θ is fixed in the inner problem), it has no degrees of freedom. If we knew which inequalities were active, we could treat the problem as a system of equations. However, the set of active constraints depends on the value of θ (Fig. 6.12). In addition, ψ(θ) is usually non-differentiable. Since ψ(θ) is the solution of an optimisation problem, we can represent it through its Kuhn–Tucker conditions.

[Figure 6.12: ψ(d, θ) as a function of θ: the set of active constraints changes from {f1, f2} to {f2, f3} as θ varies.]

We use the following form of the inner problem:
ψ(θ) = min_{u,z} u
       s.t. fj(z, θ) ≤ u, j ∈ J                                   (6.33)

The Kuhn-Tucker conditions of this problem are:
1 − Σ_{j∈J} µj = 0
Σ_{j∈J} µj ∂fj/∂zi = 0, i = 1, . . . , n
µj ≥ 0, j ∈ J                                                     (6.34)
fj(z, θ) − u ≤ 0, j ∈ J
µj (fj(z, θ) − u) = 0, j ∈ J
The main difficulty is due to the complementarity conditions, which imply a choice regarding the nature of each inequality constraint: if a constraint is active, the corresponding µj is greater than 0; otherwise, it is equal to 0. We model this decision with binary variables. For a constraint fj(z, θ) − u ≤ 0, yj = 1 if the constraint is active and yj = 0 otherwise. We also define a slack variable sj for constraint j, where sj ≥ 0 and sj = u − fj(z, θ). Then, assuming that n + 1 constraints must be active, µj [fj(z, θ) − u] = 0 can be replaced by
sj − U(1 − yj) ≤ 0, j ∈ J
µj − yj ≤ 0, j ∈ J
Σ_{j∈J} yj = n + 1                                                (6.35)
sj ≥ 0, j ∈ J
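The logic of the big-M constraints in (6.35) can be checked directly: yj = 1 forces sj = 0 (the constraint is active and µj may be positive), while yj = 0 forces µj = 0 and leaves the slack free. A minimal sketch, with a hypothetical bound U:

```python
U = 1000.0  # big-M: any valid upper bound on the slack s_j (assumed value)

def implied_bounds(y_j):
    """Upper bounds on s_j and mu_j implied by s_j <= U*(1 - y_j), mu_j <= y_j."""
    s_max = U * (1 - y_j)   # with s_j >= 0, y_j = 1 forces s_j = 0 (active)
    mu_max = float(y_j)     # with mu_j >= 0, y_j = 0 forces mu_j = 0 (inactive)
    return s_max, mu_max

assert implied_bounds(1) == (0.0, 1.0)     # active: zero slack, multiplier free
assert implied_bounds(0) == (1000.0, 0.0)  # inactive: zero multiplier, slack free
```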
where U is a valid upper bound on the slacks. Substituting these constraints into the Kuhn-Tucker conditions, we get
χ = max_{θ,u,z,µ,s,y} u
    s.t. Σ_{j∈J} µj = 1
         Σ_{j∈J} µj ∂fj/∂z = 0
         sj + fj(z, θ) − u = 0, j ∈ J
         sj − U(1 − yj) ≤ 0, j ∈ J
         µj − yj ≤ 0, j ∈ J                                       (6.36)
         Σ_{j∈J} yj = n + 1
         µj ≥ 0, j ∈ J
         sj ≥ 0, j ∈ J
         θL ≤ θ ≤ θU
         u ∈ R
         yj ∈ {0, 1}, j ∈ J
This problem is an MINLP. If the fj(z, θ) are linear in z and θ for all j ∈ J, the problem is an MILP. This formulation can be extended to handle the full process model rather than the reduced process model, which is useful when the state variables cannot easily be eliminated through manipulation of the equalities. Correlated uncertain parameters expressed as r(θ) = 0 can also be included in the formulation.
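To make formulation (6.36) concrete, the sketch below assembles and solves the MILP for a hypothetical two-constraint problem, f1 = −z + θ and f2 = z − 2θ + 1.5, with one control variable (n = 1) and θ ∈ [1, 2]; the constraint functions, the big-M value and the variable bounds are illustrative assumptions, not data from the notes.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Flexibility test MILP (6.36) for f1 = -z + theta, f2 = z - 2*theta + 1.5,
# with df1/dz = -1, df2/dz = +1, n = 1 control, theta in [1, 2].
U = 100.0  # big-M bound on the slacks (assumed)
# variable order: x = [theta, u, z, mu1, mu2, s1, s2, y1, y2]
c = np.array([0, -1, 0, 0, 0, 0, 0, 0, 0])      # maximise u -> minimise -u
A_eq = np.array([
    [0, 0, 0, 1, 1, 0, 0, 0, 0],     # mu1 + mu2 = 1
    [0, 0, 0, -1, 1, 0, 0, 0, 0],    # sum_j mu_j * dfj/dz = 0
    [1, -1, -1, 0, 0, 1, 0, 0, 0],   # s1 + f1 - u = 0
    [-2, -1, 1, 0, 0, 0, 1, 0, 0],   # s2 + f2 - u = 0  (constant 1.5 moved to RHS)
    [0, 0, 0, 0, 0, 0, 0, 1, 1],     # y1 + y2 = n + 1 = 2
])
b_eq = np.array([1.0, 0.0, 0.0, -1.5, 2.0])
A_ub = np.array([
    [0, 0, 0, 0, 0, 1, 0, U, 0],     # s1 <= U*(1 - y1)
    [0, 0, 0, 0, 0, 0, 1, 0, U],     # s2 <= U*(1 - y2)
    [0, 0, 0, 1, 0, 0, 0, -1, 0],    # mu1 <= y1
    [0, 0, 0, 0, 1, 0, 0, 0, -1],    # mu2 <= y2
])
b_ub = np.array([U, U, 0.0, 0.0])
bounds = Bounds(lb=[1, -50, -50, 0, 0, 0, 0, 0, 0],
                ub=[2, 50, 50, 1, 1, U, U, 1, 1])
integrality = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1])  # y1, y2 binary
res = milp(c, constraints=[LinearConstraint(A_eq, b_eq, b_eq),
                           LinearConstraint(A_ub, -np.inf, b_ub)],
           bounds=bounds, integrality=integrality)
chi = -res.fun  # chi = max u
print(f"chi = {chi:.3f} at theta = {res.x[0]:.3f}")
```

For this toy data the solver returns χ = 0.25 > 0 at θ = 1 (the test fails), which matches enumerating the two vertices of [1, 2] by hand: with both constraints active, u = (1.5 − θ)/2.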
The flexibility index is obtained from a similar active set formulation:

F = min_{θ,δ,z,µ,s,y} δ
    s.t. Σ_{j∈J} µj = 1
         Σ_{j∈J} µj ∂fj/∂z = 0
         sj + fj(z, θ) = 0, j ∈ J
         sj − U(1 − yj) ≤ 0, j ∈ J
         µj − yj ≤ 0, j ∈ J                                       (6.37)
         Σ_{j∈J} yj = n + 1
         θN − δ∆θ− ≤ θ ≤ θN + δ∆θ+
         µj ≥ 0, sj ≥ 0, j ∈ J
         yj ∈ {0, 1}, j ∈ J
Application of active set strategy to motivating example The flexibility test problem
is given by
max u
s.t. µ1 + µ2 + µ3 + µ4 + µ5 = 1
−0.666µ1 + 0.5µ2 + µ3 + µ4 − µ5 = 0
s1 + f1 − u = 0
s2 + f2 − u = 0
s3 + f3 − u = 0
s4 + f4 − u = 0
s5 + f5 − u = 0
1000y1 + s1 ≤ 1000
1000y2 + s2 ≤ 1000
1000y3 + s3 ≤ 1000
1000y4 + s4 ≤ 1000
1000y5 + s5 ≤ 1000
−y1 + µ1 ≤ 0
−y2 + µ2 ≤ 0
−y3 + µ3 ≤ 0
−y4 + µ4 ≤ 0
−y5 + µ5 ≤ 0
y1 + y2 + y3 + y4 + y5 = 2
378 ≤ T3 ≤ 398
573 ≤ T5 ≤ 593
µj ≥ 0, sj ≥ 0, yj ∈ {0, 1}, j = 1, . . . , 5
The solution of the MILP is u = −3.33 (as with the vertex enumeration algorithm). At
the solution, y1 = y3 = y4 = 0 and y2 = y5 = 1: f2 and f5 are the limiting constraints.
µ2 = 0.667 and µ5 = 0.333. T3C = 378K and T5C = 573K.
Solution strategy in the nonlinear case One can solve the problem as an MINLP or
use an active set strategy.
1. If we assume that the constraint functions fj are monotonic in z, we can identify
candidate active sets a priori from the following conditions:
Σ_{j∈J} µj ∂fj/∂z = 0
µj − yj = 0, j ∈ J
Σ_{j∈J} yj = n + 1.
Since the sign of the partial derivative of fj with respect to z is constant, the set of nonzero µj (and therefore of active constraints) must contain constraints with both positive and negative derivatives.
[Figure 6.13: Heat exchanger network from Saboo and Morari [4], with hot streams H1 (uncertain flowrate FH1) and H2 (2 kW/°C), cold streams C1 (3 kW/°C) and C2 (2 kW/°C), three exchangers and a cooler of duty QC, and outlet specification t3 < 50°C]
2. For each candidate active set JAa, solve

ua = max_{θ,u,z} u
     s.t. fj(z, θ) = u, j ∈ JAa                                   (6.38)
          θL ≤ θ ≤ θU

3. Set χ = max_a {ua}.
Consider the example from Saboo and Morari [4] shown in Fig. 6.13. The uncertain parameter is FH1, with 1 ≤ FH1 ≤ 1.8 kW/°C. The control variable is QC. There are four inequality constraints, f1 to f4, on the stream temperatures. After eliminating t1, t2 and t3, we apply the condition Σ_{j∈J} µj ∂fj/∂QC = 0.
Since there is one control variable, the active set is expected to contain two constraints. Given
that the coefficient of µ4 is the only negative coefficient, there are three possible combinations
all involving f4: (f1, f4), (f2, f4) and (f3, f4). In general, the number of candidate sets is smaller than the maximum number of assignments, m!/[(n + 1)!(m − n − 1)!] (where m is the number of inequalities), because of the non-negativity constraints on µ and s.
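The candidate-set count can be reproduced in a few lines: with n = 1 control variable and m = 4 inequalities whose ∂fj/∂QC signs are (+, +, +, −), as in this example, only the sets of n + 1 constraints mixing both signs qualify.

```python
from itertools import combinations
from math import comb

n = 1                                                  # one control variable
dfdz_sign = {"f1": +1, "f2": +1, "f3": +1, "f4": -1}   # signs of dfj/dQC

# A candidate active set of n + 1 constraints must mix positive and negative
# derivatives; otherwise sum_j mu_j * dfj/dz = 0 has no solution with mu_j >= 0.
candidates = [s for s in combinations(dfdz_sign, n + 1)
              if {dfdz_sign[f] for f in s} == {+1, -1}]
print(candidates)                      # the three sets, all involving f4
print(comb(len(dfdz_sign), n + 1))     # maximum number of assignments
```

This gives the three candidate sets (f1, f4), (f2, f4), (f3, f4), against a maximum of m!/[(n + 1)!(m − n − 1)!] = 6 possible assignments.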
Consider the flexibility test for 1 ≤ FH1 ≤ 1.8 kW/°C.
[Figure 6.14: Feasible region in the (FH1, QC) plane, bounded by the constraints f1 to f4]
Active constraints      ua
f1, f4                   5.108
f2, f4                 −31.67
f3, f4                  −5

Since χ = max_a ua = 5.108 > 0, the network fails the flexibility test.
Consider an example taken from Swaney and Grossmann [5] of two networks (Fig. 6.15).
There are 12 uncertain parameters: 4 flowrates, 4 inlet temperatures and 4 fouling resistances. Although the two designs meet the same specifications at the nominal values of the
parameters, the flexibility index of the first network is 0.06 and that of the second network
is 0.816.
[Figure 6.15: Two heat exchanger network designs meeting the same nominal specifications: (a) F = 0.06, (b) F = 0.816. Stream data: H1 (10 ± 1 kW/°C, 310 ± 5 °C), H2 (20 ± 2 kW/°C, 450 ± 5 °C), C3 (30 ± 3 kW/°C, 40 ± 5 °C), C4 (20 ± 2 kW/°C, 115 ± 5 °C), cooling water; target temperatures <280 °C, <50 °C, <120 °C and >290 °C]
[Figure 6.16: Two-exchanger network with hot stream H (inlet 480 K), a steam heater, and cold streams C1 (inlet 420 K, target 500 K) and C2 (inlet 385 K, target >430 K)]
Is overdesign always required for flexibility? Consider the example in Fig. 6.16. The
flowrate-heat capacities of the streams are 15kW/K for H, 30kW/K for C1 and 10kW/K
for C2 . The heat transfer coefficients are U1 = U2 =800W/m2 K. The minimum approach
temperature is 10K. Assume we overdesign the heat exchange areas A1 and A2 by 20%.
If U1 deviates by +20% and U2 by −20%, however, the operation of the network becomes
infeasible. In fact, there are two alternative ways to overdesign the heat exchange areas to
ensure operational feasibility under these circumstances:
1. Overdesign A1 by 20% and A2 by 108%.
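The infeasibility can be seen from the rating equation Q = U·A·∆Tlm: at fixed duty and driving force, the required area scales as 1/U, so a −20% deviation in U2 alone already demands 25% more area, which the 20% overdesign cannot provide. The sketch below makes this rough argument explicit; it deliberately ignores the accompanying shift in ∆Tlm across the network, which is why the actual requirement on A2 (108%) is even larger.

```python
# Required area at fixed duty Q and driving force dTlm: A = Q / (U * dTlm),
# so A_required / A_nominal = U_nominal / U_actual. (Simplification: in the
# real network dTlm also changes, making the true requirement on A2 larger.)
U_nom = 800.0            # W/m^2K, nominal heat transfer coefficient
overdesign = 0.20        # 20% extra area installed on A1 and A2

for dev in (+0.20, -0.20):
    ratio = U_nom / (U_nom * (1 + dev))  # required-area multiplier
    ok = ratio <= 1 + overdesign
    print(f"U deviation {dev:+.0%}: need {ratio - 1:+.1%} area -> "
          f"{'covered' if ok else 'NOT covered'} by 20% overdesign")
```

A +20% deviation in U reduces the area requirement, but a −20% deviation requires a factor 1/0.8 = 1.25, exceeding the 1.20 installed.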
Mathematical formulation
min_{d,z1,...,zN} c(d) + Σ_{i=1,...,N} Wi Φi(d, zi, θi)
s.t. f(d, zi, θi) ≤ 0, i = 1, . . . , N                           (6.41)
     g(d) ≤ 0
[Figure 6.17: Block angular structure of problem (6.41): the design variables d appear in every block of constraints, while the operating variables z1, . . . , zN appear only in their own period's block]
• The problem structure (block angular) can be exploited in efficient decomposition procedures (Fig 6.17).
• The formulation can be applied to design problems concerned with equipment sizing or
with synthesis.
Step 1 Select n0 points θi in the range T = {θ | θL ≤ θ ≤ θU}, for instance the nominal point and some expected critical points. Assign a weight Wi to each point θi. Set the iteration counter k = 0.
Step 2 Solve the multiperiod design problem (6.41) for the current set of points to obtain a design d.
Step 3 Perform a flexibility analysis on the resulting design (flexibility test or flexibility index).
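For the small linear example used later in these notes (f1 = −z + θ ≤ 0, f2 = z − 2θ + 2 − d ≤ 0, 1 ≤ θ ≤ 2), this iteration can be traced in a few lines: feasibility at a point θ requires some z with θ ≤ z ≤ 2θ − 2 + d, i.e. d ≥ 2 − θ. The sketch below assumes a nominal point θN = 1.5 (an illustrative choice) and a cost that increases with d.

```python
# Feasibility of design d at a point theta: exists z with f1 <= 0 and f2 <= 0,
#   i.e. theta <= z <= 2*theta - 2 + d, which holds iff d >= 2 - theta.
def d_required(theta):
    return 2.0 - theta

theta_lo, theta_hi = 1.0, 2.0
points = [1.5]                 # Step 1: nominal point only (assumed)
for k in range(10):
    d = max(d_required(t) for t in points)  # Step 2: cheapest design feasible at all points
    # Step 3: flexibility test; d_required decreases in theta,
    # so the critical point is the lower bound theta_lo
    theta_c = theta_lo
    if d_required(theta_c) <= d:
        print(f"iteration {k}: d = {d} passes the flexibility test")
        break
    points.append(theta_c)     # design fails: add the critical point and repeat
```

Designing for the nominal point alone gives d = 0.5, which fails the test at the critical point θ = 1; adding that point and re-solving gives d = 1, which passes.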
Design with Explicit Flexibility Constraints (Pistikopoulos and Grossmann [3]). The
basic idea is to perform a prior analysis on the active set and to generate explicit constraints
for flexibility. Consider the following case:
1. fj (d, z, θ), j ∈ J linear in d, z and θ.
Property 2 At the optimal solution, the Lagrangian is equal to ψa(d, θ), i.e. ψa(d, θ) = Σ_{j∈JAa} µja fj(d, z, θ).
Property 3 The critical points θCa for each active set a can be determined a priori and are independent of d (because ψa(d, θ) is linear in θ). If ∂ψa/∂θi > 0, then θiCa = θiU. If ∂ψa/∂θi < 0, then θiCa = θiL.
From these properties, the design problem reduces to an MILP problem with explicit
flexibility constraints.
min_{d,y} c = αT y + βT d
s.t. Σ_{j∈JAa} µja fj(d, z, θCa) ≤ 0, a = 1, . . . , NAS
     di − U yi ≤ 0, i = 1, . . . , r                              (6.44)
     d ≥ 0
     y ∈ {0, 1}r
The design procedure can be outlined as follows.
Step 1 Identify the NAS active sets JAa and obtain µja, j ∈ JAa, by solving the linear equations
Σ_{j∈JAa} µj = 1
Σ_{j∈JAa} µj ∂fj/∂z = 0
Step 2 Set ψa(d, θ) = Σ_{j∈JAa} µja fj(d, z, θ) and determine θCa by analysing the sign of ∂ψa/∂θ.
Step 3 Set up and solve the MILP (6.44).
Example Consider the constraints
f1 = −z + θ ≤ 0
f2 = z − 2θ + 2 − d ≤ 0
with 1 ≤ θ ≤ 2.
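For this example, the explicit flexibility constraint can be derived directly: with ∂f1/∂z = −1 and ∂f2/∂z = +1, the only candidate active set is {f1, f2}; solving Σµj = 1 and Σµj ∂fj/∂z = 0 gives µ1 = µ2 = 1/2, z cancels from ψ(d, θ), ∂ψ/∂θ = −1/2 < 0 selects θC = θL = 1 (Property 3), and ψ(d, θC) ≤ 0 reduces to d ≥ 1. The same computation as a sketch:

```python
import numpy as np

# Active set {f1, f2}: solve  mu1 + mu2 = 1  and  -mu1 + mu2 = 0  (dfj/dz = -1, +1)
mu = np.linalg.solve(np.array([[1.0, 1.0], [-1.0, 1.0]]), np.array([1.0, 0.0]))

# psi(d, theta) = mu1*(-z + theta) + mu2*(z - 2*theta + 2 - d); z cancels out
dpsi_dtheta = mu[0] * 1.0 + mu[1] * (-2.0)   # coefficient of theta in psi
theta_c = 1.0 if dpsi_dtheta < 0 else 2.0    # Property 3: critical point at a bound
# psi(d, theta_c) = mu1*theta_c + mu2*(-2*theta_c + 2) - mu2*d <= 0  ->  d >= d_min
d_min = (mu[0] * theta_c + mu[1] * (-2.0 * theta_c + 2.0)) / mu[1]
print(mu, theta_c, d_min)
```

This reproduces µ1 = µ2 = 0.5, θC = 1 and the explicit flexibility constraint d ≥ 1.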
The procedure can be extended to nonlinear constraints: the design formulation becomes an MINLP of the form

min_{d,y} c = αT y + βT d
s.t. Σ_{j∈JAa} µja,l fj(d, z, θa,l) ≤ 0, a = 1, . . . , NAS, l = 1, . . . , L
     di ≥ 0, i = 1, . . . , r                                     (6.45)
     di − U yi ≤ 0, i = 1, . . . , r
     yi ∈ {0, 1}, i = 1, . . . , r
where L is the number of points at which each active set is evaluated. At each iteration, the critical point of each active set for the current design is obtained from

χa(d) = max_{u,z,θ} u
        s.t. fj(d, z, θ) = u, j ∈ JAa
             θL ≤ θ ≤ θU
Step 3 Set up and solve the MINLP for d. Set L = L + 1 and return to step 2.
1. Obtain a parametric curve for cost vs. flexibility index by formulating the design
problem in terms of F (see Fig 6.18).
2. Assuming probability distribution functions for the uncertain parameters θ, one can
maximise the expected revenue and determine the optimal flexibility (see Fig 6.19).
Figure 6.19: Typical expected revenue, profit and cost vs. flexibility F curves, with the optimum at F*
Bibliography

[1] I.E. Grossmann and C.A. Floudas. "Active constraint strategy for flexibility analysis in chemical processes". In: Computers & Chemical Engineering 11.6 (1987), pp. 675–693. issn: 0098-1354. doi: 10.1016/0098-1354(87)87011-4.
[2] K.P. Halemane and I.E. Grossmann. "Optimal process design under uncertainty". In: AIChE Journal 29.3 (1983), pp. 425–433. issn: 1547-5905. doi: 10.1002/aic.690290312.
[3] E.N. Pistikopoulos and I.E. Grossmann. "Optimal retrofit design for improving process flexibility in linear systems". In: Computers & Chemical Engineering 12.7 (1988), pp. 719–731. issn: 0098-1354. doi: 10.1016/0098-1354(88)80010-3.
[4] A.K. Saboo and M. Morari. "Design of resilient processing plants - IV". In: Chemical Engineering Science 39.3 (1984), pp. 579–592. issn: 0009-2509. doi: 10.1016/0009-2509(84)80054-8.
[5] R.E. Swaney and I.E. Grossmann. "An index for operational flexibility in chemical process design. Part I: Formulation and theory". In: AIChE Journal 31.4 (1985), pp. 621–630. issn: 1547-5905. doi: 10.1002/aic.690310412.