
Advanced Process Optimisation

Claire S. Adjiman

Department of Chemical Engineering


Centre for Process Systems Engineering
Imperial College London
London SW7 2AZ

January 2017
Chapter 1

Introduction to Advanced Process Optimisation
1.1 Decision-making in Chemical Engineering


1.1.1 Decision-making as an Optimisation Problem
Engineers are concerned with designing and operating processes, products and other complex
systems that best meet a plethora of criteria and help to achieve the goals of sustainable
development. This requires making choices between different options, often in the face of
uncertainty. These are highly complex problems with many facets, and delicate trade-offs
between different choices. Some examples of decision-making at different levels include:

Equipment design Find the optimum size and operating conditions of a unit operation
to achieve desired production (including quality constraints). This may include detailed
design aspects such as the shape of mixer blades.

Process synthesis Process design is the set of activities which take us from the decision
to develop a process — in order to increase company profits — to its construction and
operation. Synthesis is the conversion of an abstract description into a more concrete
one through the consideration of alternatives. Thus, process synthesis is one of the
many aspects of process design. It may be defined as “the systematic development of
process flowsheet(s) that transform the available raw materials into the desired products
and which meet the specified performance criteria of (a) maximum profit or minimum
cost, (b) energy efficiency, (c) good operability [...]” ([2]).

Parameter estimation Given a model (a set of mathematical equations) to represent a
physical reality (e.g., a mixture, a reactor, a heat exchanger), find the parameters that
give the best match of existing experimental data. Here, “best” may include measures of
statistical significance/confidence, as well as goodness of fit. Related activities include
experiment design and model identifiability.

Optimal control Identify the set of control actions that achieve the target
process behaviour in the best way.

Molecular design Identify the structure of the molecule(s) that make it possible to
maximise performance.

Due to the complexity of these problems, it is important to develop a deep understanding
of decision-making and to adopt/develop systematic tools for their solution. There are
many approaches to this, including heuristics, hierarchical decision-making (e.g., [1]) and
mathematical optimisation.
Mathematical optimisation plays an important role in decision-making. It can help to
develop a better understanding of the design problem because the translation of a sometimes
ambiguous problem into mathematical language helps to pose the problem and narrow its
scope. It can also help to identify good (ideally best) solutions and to identify alternatives
— it would be extremely tedious and time-consuming to do this by hand.

1.1.2 A Process Synthesis Example


Two feed streams must be preheated from 300°C to 480°C before entering a reactor. The
effluent stream must be cooled from 500°C to 400°C. What is the best heat exchanger network
configuration, given steam, hot water and cooling water as available utilities?

Criterion: Minimise cost = annualised capital cost + annual operating cost.

1.1.2.1 Assessing Some Alternative Flowsheets

Three alternatives are considered in Figs 1.1 to 1.3. The first represents the base case which
makes use of the steam and cooling water utilities. It has a total annualised cost of $90220/yr,
with an investment cost of only $5520/yr and utilities of $84700/yr. In the second, the need
for cooling water has been eliminated and the need for steam reduced by transferring heat
from the effluent to the feed streams. The capital increases slightly to $5880/yr and the
utilities decrease to $36720/yr. This corresponds to a reduction of over 50% in total cost. The third
alternative results from splitting the effluent stream. In this case, only the hot water utility is
required. The investment cost more than doubles to $12200/yr, but the utilities fall further
to $5900/yr. This corresponds to an 80% reduction from the base case.
This example shows that integration can bring about great benefits. However, the number
of alternatives is very large. How can we identify the optimal process configuration reliably?
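The cost comparison above is simple arithmetic; a few lines of Python, using only the figures quoted in the text, confirm the totals and the (rounded) percentage reductions:

```python
# Annualised cost comparison of the three heat exchanger network alternatives.
# All figures ($/yr) are taken directly from the text: (capital, utilities).
alternatives = {
    1: (5520, 84700),   # base case: steam + cooling water
    2: (5880, 36720),   # feed/effluent heat integration
    3: (12200, 5900),   # split effluent, hot water utility only
}

totals = {k: capital + utilities for k, (capital, utilities) in alternatives.items()}
base = totals[1]
reductions = {k: 1 - total / base for k, total in totals.items()}

for k in sorted(alternatives):
    print(f"Alternative {k}: ${totals[k]}/yr, {reductions[k]:.1%} below the base case")
```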


Figure 1.1: Alternative 1, with an annualised cost of $90220/yr.




Figure 1.2: Alternative 2, with an annualised cost of $42600/yr.


Figure 1.3: Alternative 3, with an annualised cost of $18100/yr.

1.1.2.2 Algorithmic Approach

Provided that sufficiently reliable models of the process to be optimised are available, the
process synthesis problem may be formulated as an optimisation problem. The following
three steps are taken:

1. Postulate a superstructure of alternatives, i.e. a structure which embeds all possible
designs simultaneously.

2. Formulate an optimisation model for the superstructure.

3. Solve the optimisation problem to “extract” the optimum configuration.

The solution of the optimisation problem gives the optimum configuration as well as the
optimum operating parameters. In other words, both discrete and continuous decisions are
made simultaneously.
The systematic framework of the algorithmic approach accounts for nonlinear interactions
(capital cost, raw material costs, . . .). However, solution of such problems is limited by
currently available optimisation technology — which is constantly improving.

1.1.3 A Capacity Planning Example


We need to determine the daily capacity required for two plants FA and FB which will be
able to manufacture two products, P1 and P2 . The forecasted maximum demand for P1 is

U1 (kg/yr) and that for P2 is U2 (kg/yr). The profit for each kg of product sold depends
on the product value and the plant running costs which are a function of plant location and
production rate. The profit is given by Sij = S(Mij , Fi , Pj ), where Mij is the production rate
for product Pj at plant Fi in kg/day, i = {A, B} and j = {1, 2}.

1.1.3.1 Decisions to be Made

What should the capacity of each plant be in kg/day to maximise profits and not exceed
market demand?

1.1.3.2 Methodology/Issues

• What factors should be considered?

• What are the variables to be optimised?

• How do we use the above information to mathematically formulate the optimisation
problem?

Step 1 Define the variables (decisions)


tA1 , tA2 , tB1 , tB2 : number of days per year each plant produces each material.
MA1 , MA2 , MB1 , MB2 : production rate for each product in each plant (kg/day).

Step 2 Identify decision criterion. Objective function: annual profit


f (Mij , tij ) = tA1 MA1 SA1 + tA2 MA2 SA2 + tB1 MB1 SB1 + tB2 MB2 SB2

Step 3 Model the system through equality and inequality constraints.


A year has 365 days:
tA1 + tA2 = 365
tB1 + tB2 = 365
Runtime (tA1 , tA2 , tB1 , tB2 ) must always be non-negative:

tAj ≥ 0 j = 1, 2
tBj ≥ 0 j = 1, 2

Production rate (MA1 , MA2 , MB1 , MB2 ) must always be non-negative:

MAj ≥ 0 j = 1, 2
MBj ≥ 0 j = 1, 2

Limitation on total amount of product 1:

tA1 MA1 + tB1 MB1 ≤ U1

Limitation on total amount of product 2:

tA2 MA2 + tB2 MB2 ≤ U2



Figure 1.4: Interaction between levels of optimisation

1.1.3.3 Summary: Mathematical Formulation of the Optimal Scheduling Problem

max over MA1, MA2, MB1, MB2, tA1, tA2, tB1, tB2 of
    tA1 MA1 SA1 + tA2 MA2 SA2 + tB1 MB1 SB1 + tB2 MB2 SB2
s.t. tA1 + tA2 = 365
     tB1 + tB2 = 365
     tAj ≥ 0, j = 1, 2
     tBj ≥ 0, j = 1, 2
     MAj ≥ 0, j = 1, 2
     MBj ≥ 0, j = 1, 2
     tA1 MA1 + tB1 MB1 ≤ U1
     tA2 MA2 + tB2 MB2 ≤ U2
The outcome of the above optimisation problem is the pair of solution vectors M and t.
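As a minimal numerical sketch of this formulation, the snippet below evaluates the objective and checks the constraints for a candidate plan. The profit margins Sij (£/kg) and demand bounds U1, U2 (kg/yr) are hypothetical values invented for illustration; the text leaves them symbolic.

```python
# Feasibility check and objective evaluation for the capacity planning model.
# S_ij and U_j below are hypothetical, chosen only to illustrate the formulation.
S = {("A", 1): 0.5, ("A", 2): 0.3, ("B", 1): 0.4, ("B", 2): 0.6}
U = {1: 2_000_000, 2: 3_000_000}

def is_feasible(t, M):
    """Check the equality and inequality constraints of the formulation."""
    days_ok = all(abs(t[i, 1] + t[i, 2] - 365) < 1e-9 for i in "AB")
    signs_ok = all(v >= 0 for v in t.values()) and all(v >= 0 for v in M.values())
    demand_ok = all(
        t["A", j] * M["A", j] + t["B", j] * M["B", j] <= U[j] for j in (1, 2)
    )
    return days_ok and signs_ok and demand_ok

def annual_profit(t, M):
    """Objective function: sum of t_ij * M_ij * S_ij over plants and products."""
    return sum(t[k] * M[k] * S[k] for k in S)

# A hand-picked feasible operating plan (days/yr and kg/day).
t = {("A", 1): 200, ("A", 2): 165, ("B", 1): 100, ("B", 2): 265}
M = {("A", 1): 5000, ("A", 2): 4000, ("B", 1): 6000, ("B", 2): 3000}
print(is_feasible(t, M), annual_profit(t, M))
```

Solving the problem (rather than evaluating one candidate) requires an NLP solver, since the products tij·Mij make the constraints and objective bilinear.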

1.1.4 Scope of Optimisation


Process optimisation involves two types of decisions: discrete decisions (e.g., whether a
unit exists, or a chemical species appears in the process) and continuous decisions (e.g.,
the size and operating conditions for a unit, amount of a species).
Optimisation can take place at many levels in a company ranging from a complex com-
bination of plants and distribution facilities down to individual plants, combination of units,
individual pieces of equipment, subsystems in a piece of equipment.
In a typical industrial company, there are three main levels of optimisation:

1. Management

2. Process design and equipment specification

3. Plant operations

As shown in Fig. 1.4, these levels are not independent. However, at each level, optimisation
is used to make different types of decisions:

1. Management: project evaluation, product selection, new plant construction, corporate
budget.

2. Process Design and Equipment Specification: choice of a process, choice of nominal
operating conditions, configuration of plant, optimum size of units.

3. Plant Operations: allocation of raw materials, utility consumption, scheduling.

These differences are reflected in different time scales and different model granularities.
Furthermore, all of these activities require models and parameters, and therefore model
building activities. This often requires the solution of optimisation problems (e.g, parameter
estimation, experiment design).
Better solutions can usually be found by increasing:

• the number of decisions we consider simultaneously (i.e. the boundaries of the problems)

• the level of detail we use.

1.2 Formulating an Optimisation Problem


1.2.1 Models for Decision-Making
When a process design problem is framed as an optimisation problem, the process is repre-
sented by some equations derived from first principles or correlated from experimental data.
Depending on the complexity of the process model and the required level of detail, models
may require

• the solution of a system of linear equations,

• the solution of a system of nonlinear equations,

• the solution of a system of differential algebraic equations,

• the solution of a system of partial differential algebraic equations.

We will use algebraic models in this course, i.e., systems of linear or nonlinear equations.
The mathematical representation of superstructures is also an issue we will con-
sider. We will see how alternatives can be expressed through integer and binary variables.

1.2.2 The Nature of Optimisation


Typically, a single performance criterion is chosen to guide the decisions, such as minimum
cost. There is also growing interest in multi-objective optimisation, where several targets
are to be minimised/maximised but one does not wish to make an a priori choice about the
relative importance or weighting of these targets.
The goal of optimisation is to find the values of the variables in the process that yield
the best value of the performance criterion. When costs are minimised, the optimum is
usually a trade-off between capital and operating costs. Rigorous optimisation techniques
(mathematical programming) are concerned with selecting the best among a set of many
solutions by efficient numerical methods.

1.2.3 Superstructures and Optimisation


A set of alternative designs can be represented through a model which combines the physics
of the system with logical relations, all expressed as algebraic equations. To determine the
optimal discrete and continuous decisions, we must solve the following general problem:

Minimise/Maximise design criterion


subject to
Physical constraints
Logical relations

We will study methods to formulate and solve optimisation problems with binary and con-
tinuous variables in a later chapter. To start with, we focus on problems with continuous
variables only.
Bibliography

[1] J. M. Douglas. Conceptual Design of Chemical Processes. New York: McGraw Hill,
1988.
[2] C. A. Floudas. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applica-
tions. Oxford: Oxford University Press, 1995.
Chapter 2

Nonlinear Optimisation: The Basics

2.1 Motivation: Parameter estimation


The first task in decision-making in engineering is to build a model of the system. This
requires

• the selection of an appropriate form of mathematical equations which normally contains
some adjustable parameters

• the design and realisation of some experiments.

• a parameter estimation phase, whose goal is the following

Find values of the model parameters which give the best fit between the model and the
experimental results

It is important to carry out a statistical analysis of the results to ensure that the model
is meaningful. This is however outside the scope of this course and will not be described
here.

Parameter estimation itself requires the following inputs:

• a postulated mathematical model,

• measured variables (experimental data),

• controlled variables,

• a measure of the quality of the fit.

2.1.1 A simple example


A model providing the diffusion coefficient of para-hydroxybenzoic acid in water as a function
of temperature is needed. The following experimental data are available.


Figure 2.1: Experimental data and fitted line for diffusion coefficient of para-hydroxybenzoic
acid in water vs. temperature.

Experiment   T (°C)   D × 10⁶ (cm²/s)

1    15.0    5.52
2    17.4    6.14
3    19.9    6.65
4    30.5    8.58
5    40.0   10.33

D is the measured variable and T is the controlled variable.


Based on the plot of these data (see Fig. 2.1), it seems that a straight line would provide
a good fit. This is represented mathematically as

D(T ) = θ1 T + θ2 (2.1)

where θ1 and θ2 are two parameters that we need to estimate.


We define a measure of the quality of the fit (least-squares) as


∑_{k=1}^{5} (D(Tk) − Dk)²        (2.2)

where Dk is the measured value of the diffusion coefficients in the kth experiment, Tk is the
measured value of the temperature in the kth experiment and D(Tk ) is obtained by evaluating
Eq. (2.1) at Tk .
Then, the best fit according to this measure is obtained by solving the following optimisation
problem

min_{θ1,θ2} ∑_{k=1}^{5} (D(Tk) − Dk)²
s.t.  D(Tk) = θ1 Tk + θ2,  k = 1, . . . , 5        (2.3)
      θ1 ∈ R, θ2 ∈ R
The smaller the quality measure, the better the fit. This type of problem is called a least-
squares problem. The solution of this problem gives θ1 = 0.1889 and θ2 = 2.805 with a “quality
measure” of 0.0249. This corresponds to the line plotted in Fig. 2.1.
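For a straight-line model the least-squares problem (2.3) has a closed-form solution via the normal equations. The following sketch reproduces the fitted values reported above from the tabulated data:

```python
# Least-squares fit of D(T) = theta1*T + theta2 to the diffusion data
# (closed-form normal equations for a straight-line fit).
T = [15.0, 17.4, 19.9, 30.5, 40.0]      # controlled variable, degC
D = [5.52, 6.14, 6.65, 8.58, 10.33]     # measured variable, 1e-6 cm^2/s

n = len(T)
T_mean = sum(T) / n
D_mean = sum(D) / n

# Slope and intercept minimising sum_k (theta1*T_k + theta2 - D_k)^2
theta1 = sum((t - T_mean) * (d - D_mean) for t, d in zip(T, D)) / sum(
    (t - T_mean) ** 2 for t in T
)
theta2 = D_mean - theta1 * T_mean

# "Quality measure": sum of squared residuals at the optimum
sse = sum((theta1 * t + theta2 - d) ** 2 for t, d in zip(T, D))
print(f"theta1 = {theta1:.4f}, theta2 = {theta2:.3f}, SSE = {sse:.4f}")
```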

• How was this solution obtained?

• Is 0.0249 really the smallest value of the quality measure?

• How can we deal with larger and more complex problems?

2.1.2 General formulation


Let
• θ represent the vector of parameters which appear in the model,

• x represent the vector of controlled variables,

• y represent the vector of measured variables,

• ŷ represent the measured data points,

• h(y, θ; x) = 0 represent the set of model equations,

• f represent the measure of the quality of the fit.


Then, the problem of estimating the parameters can be posed as
min_{θ,y} f(y; ŷ^k)
s.t.  hk(y, θ; xk) = 0,  k = 1, . . . , m        (2.4)
      θ ∈ Rᵖ, y ∈ Rⁿ
where m is the number of experiments, n is the number of measured variables, p is the number
of model parameters. The vector ŷ k represents the values measured in the kth experiment.
The problem has the following characteristics
• It is multi-dimensional.

• It involves constraints.

• Its objective function is nonlinear.

• Its constraints, the model equations h(y, θ) = 0, may be nonlinear.


The rest of this chapter is devoted to the solution of problems of this type.

2.2 Some Features of Optimisation Problems


2.2.1 General formulation

Minimise f (x) Objective function


subject to h(x) = 0 Equality Constraints
g(x) ≤ 0 Inequality Constraints
x ∈ X ⊆ Rn
The inequality constraints can take the form of bound constraints such as xL − x ≤ 0
and x − xU ≤ 0.

[Sketch: a two-dimensional feasible region bounded by a linear equality constraint, linear inequality constraints and nonlinear inequality constraints, with the optimal solutions of min x1² + x2² and max x1² + x2² marked.]

Figure 2.2: General features of a two-dimensional optimisation problem

2.2.2 Definitions
1. Degrees of freedom Let DIM{h} = m and DIM{x} = n. If n > m then there
are n − m degrees of freedom. The number of degrees of freedom is the number of
decision variables. We want to select values of the decision variables to optimise a
scalar objective function.

2. Feasible Solution Values of the variable vector that satisfy equality and inequality
constraints. Any x ∈ Rn : h(x) = 0, g(x) ≤ 0.

3. Feasible Region The set of feasible solutions, F = {x ∈ Rn |h(x) = 0, g(x) ≤ 0}.

4. Optimal Solution A feasible solution that provides the optimal value for the objective
function. For a minimisation problem, an optimal solution is any x∗ ∈ F : f (x∗ ) ≤
f (x), ∀x ∈ F .

Figure 2.2 illustrates these concepts for a problem with two variables.

2.3 Classification of Optimisation Problems


2.3.1 Problem structure
By convention, we frame optimisation problems as minimisations.

min_x f(x)
s.t. h(x) = 0        (2.5)
     g(x) ≤ 0

The mathematical structure of the objective function and the constraints affects the way in
which the problem can be solved. The main classes of optimisation problems are:

Linear Programming (LP) f (x) is a linear function; h(x) and g(x) are linear func-
tions.

Non-linear Programming (NLP) At least one function in the problem (objective or
constraint) is non-linear. There exist many subclasses of NLPs for which specific solution
techniques exist.

• Are there any constraints? If not, the problem is an unconstrained NLP. Oth-
erwise, it is a constrained NLP.
• Are the functions continuous? differentiable? twice-differentiable?
• Are the non-linearities arbitrary or do they follow certain patterns? See for in-
stance linearly constrained quadratic programming where f (x) = xT Qx and h(x)
and g(x) are linear functions.

Mixed-integer (Linear or Non-linear) Programming Problems (MILP, MINLP)


A subset of the variable vector is confined to binary or integer variables, i.e. for some
p < n, xi ∈ R, i = 1, . . . , p and xi ∈ N, i = p + 1, . . . , n. We will study this type of
problem in the next chapter.

2.3.2 Examples

2.3.2.1 Optimal Design of a Cooler

A stream with flowrate 75,000 kg/yr must be cooled from T0 = 350°C to Tf = 120°C. The
available utility is cooling water, which enters the cooler at t0 = 30°C and must leave at a
maximum of 60°C (tmax). Its cost is cW (in £/kg). The cooler has a heat transfer coefficient
U (in J/m² K) and an annualised cost cA (in £/(m² °C)). Design a cooler with minimum
annualised cost.

1. Identify the decision variables.

2. Write the formulation in general form (using symbols for the data). It is good practice
to retain general equations as much as possible.

3. How many degrees of freedom are there?

4. How can this problem be classified?

• Variables: Q, A, W , tW → Trade-off between A and W (why?)




Figure 2.3: Optimal design of a cooler

• Formulation:

min_{Q,A,W,tW} C = cA A + cW W

s.t.  Q − F CP (T0 − Tf) = 0
      Q − W CPW (tW − t0) = 0
      Q − U A [(T0 − tW) − (Tf − t0)] / ln[(T0 − tW)/(Tf − t0)] = 0     (equality constraints)
      tW − tmax ≤ 0
      t0 − tW ≤ 0
      Q, W, A ≥ 0                                                       (inequality constraints)
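The trade-off between the area A and the water flow W can be made concrete by eliminating Q from the equality constraints and evaluating two candidate outlet temperatures tW. The process temperatures are from the text; the heat load Q, coefficient U and water heat capacity CPW below are hypothetical placeholders, chosen only to show the trend:

```python
import math

# Trade-off between exchanger area A and cooling water flow W in the cooler design.
T0, Tf = 350.0, 120.0   # process stream in/out, degC (from the text)
t0, tmax = 30.0, 60.0   # cooling water in / maximum out, degC (from the text)
Q = 1.0e6               # heat load (hypothetical value)
U = 500.0               # heat transfer coefficient (hypothetical value)
CPW = 4.2               # cooling water heat capacity (hypothetical value)

def area_and_water(tW):
    """Return (A, W) for a chosen cooling water outlet temperature tW."""
    dT1, dT2 = T0 - tW, Tf - t0                  # terminal temperature differences
    lmtd = (dT1 - dT2) / math.log(dT1 / dT2)     # log-mean temperature difference
    A = Q / (U * lmtd)                           # from Q = U * A * LMTD
    W = Q / (CPW * (tW - t0))                    # from Q = W * CPW * (tW - t0)
    return A, W

A40, W40 = area_and_water(40.0)
A60, W60 = area_and_water(60.0)
# Raising tW saves water (smaller W) but shrinks the temperature driving force,
# so a larger area A is needed: hence the capital/operating trade-off.
print(A40, W40)
print(A60, W60)
```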

2.4 Basic Concepts of Optimisation


Issues to be addressed:

• How are optima (minima) characterised mathematically?

• How do we identify a point that satisfies these conditions?

• Is this point unique?

2.4.1 Local and Global Minima


Recall the definition of an optimum solution x∗ . In fact, if the property f (x∗ ) ≤ f (x)
is satisfied for all x in the feasible region, x∗ is a global minimum. If this condition is
satisfied for all x in a neighbourhood of x∗ , x∗ is a local minimum, as shown in Fig. 2.4.
Finally, if the inequality holds strictly, x∗ is called a strong minimum, while it is called a
weak minimum otherwise.

2.4.2 Convex and Concave Functions


2.4.2.1 Definition

A function f (x) is convex over a region R (Fig. 2.5) if and only if for any two different
values x1 , x2 lying in the region R,

f (αx1 + (1 − α)x2 ) ≤ αf (x1 ) + (1 − α)f (x2 ), ∀α ∈ [0, 1]. (2.6)




Figure 2.4: Types of minima


Figure 2.5: Examples of convex functions

A function f (x) is concave over R (Fig 2.6) if and only if, for any two points x1 , x2 lying
in the region R,

f (αx1 + (1 − α)x2 ) ≥ αf (x1 ) + (1 − α)f (x2 ), ∀α ∈ [0, 1]. (2.7)
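Definitions (2.6) and (2.7) can be spot-checked numerically by sampling pairs of points and values of α. This is only a heuristic test, not a proof, but it illustrates the defining inequality:

```python
import random

# Numerical spot-check of the convexity definition (2.6): sample pairs
# (x1, x2) and alpha in [0, 1] on the interval [-2, 2] and test the inequality.
def satisfies_convexity(f, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = rng.uniform(-2.0, 2.0)
        x2 = rng.uniform(-2.0, 2.0)
        a = rng.uniform(0.0, 1.0)
        # Small tolerance guards against floating-point round-off
        if f(a * x1 + (1 - a) * x2) > a * f(x1) + (1 - a) * f(x2) + 1e-12:
            return False    # inequality violated: f is not convex on the region
    return True

print(satisfies_convexity(lambda x: 2 * x**2))   # convex
print(satisfies_convexity(lambda x: x**3))       # not convex on [-2, 2]
```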

2.4.2.2 The role of convexity in optimisation

Convexity is a useful concept in optimisation because any local minimum of a convex
function is also a global minimum. Thus, once we know how to characterise and identify such a point, we do

Figure 2.6: Example of a concave function



not need to worry about its uniqueness. It should be noted that convexity is only a sufficient
condition for the uniqueness of the minimum. Consider the following classes of functions:

Quasi-convex functions A function f (x) is quasi-convex over R if

f (αx1 + (1 − α)x2 ) ≤ max{f (x1 ), f (x2 )}, ∀α ∈ [0, 1], ∀x1 , x2 ∈ R (2.8)

Can you show that any convex function is quasi-convex?

Pseudo-convex functions A differentiable function f(x) is pseudo-convex over the domain
R if for every x1, x2 ∈ R such that f(x1) < f(x2), we have

(x1 − x2 )T ∇f (x2 ) ≤ 0 (2.9)

All differentiable convex functions are pseudo-convex.

Invex functions A differentiable function f (x) is invex over R if there exists a vector
function η(x1 , x2 ) such that

f (x1 ) − f (x2 ) ≥ η(x1 , x2 )T ∇f (x2 ), ∀x1 , x2 ∈ R (2.10)

All differentiable convex functions are invex. What function η(x1 , x2 ) satisfies the above
condition for all differentiable convex functions?
A differentiable function is invex if and only if every one of its stationary points is a
global minimum.
Floudas [2] provides a discussion of quasi- and pseudo-convexity. Many other generalisations
of convexity have been proposed in the literature.

2.4.2.3 Testing for Convexity

From now on we focus on twice-continuously differentiable functions unless otherwise stated.


The second-order derivatives of a function are closely linked to its convexity properties.

Examples

• The function f (x) = 2x2 is strictly convex. Note that f ′′ (x) = 4 > 0.

• The function f (x) = 2x2 − x3 is not convex. f ′′ (x) = 4 − 6x may be positive or negative
depending on the value of x.

Definition The matrix H(x) of second-order derivatives of f (x) is called its Hessian matrix,
H(x) = ∇2 f (x). A Hessian matrix is always symmetric.

Example Let f(x1, x2) = h11 x1² + h12 x1x2 + h22 x2². Then,

H(x1, x2) = ∇²f(x1, x2) = [ ∂²f/∂x1²     ∂²f/∂x1∂x2 ]  =  [ 2h11   h12  ]
                          [ ∂²f/∂x1∂x2   ∂²f/∂x2²   ]     [ h12    2h22 ]

Necessary and sufficient conditions for convexity For univariate functions,

• f(x) is convex ⇔ f″(x) ≥ 0

• f(x) is strictly convex ⇐ f″(x) > 0 (this condition is sufficient but not necessary: f(x) = x⁴ is strictly convex although f″(0) = 0).

For multivariate functions, f(x) is convex ⇔ zᵀH(x)z ≥ 0, ∀x, ∀z, and f(x) is strictly convex ⇐ zᵀH(x)z > 0, ∀x, ∀z ≠ 0.
Can you prove this?

Properties of Hessian matrices


• If zᵀH(x)z > 0 for all z ≠ 0, H(x) is said to be positive definite.

• H(x) is positive definite if and only if all its eigenvalues are positive.

Example Is f(x1, x2) = 2x1² + 2x1x2 + 1.5x2² + 7x1 + 8x2 + 24 convex in R²?

1. Calculate its Hessian matrix:

∂²f/∂x1² = 4;  ∂²f/∂x2² = 3;  ∂²f/∂x1∂x2 = 2  ⇔  H(x1, x2) = [ 4  2 ]
                                                              [ 2  3 ]

2. Calculate its eigenvalues:

det[H − λI] = det [ 4 − λ     2   ] = 0  ⇔  (4 − λ)(3 − λ) − 4 = 0  ⇔  λ² − 7λ + 8 = 0
                  [   2     3 − λ ]

Therefore, λ1 = 5.56 > 0 and λ2 = 1.44 > 0.

f(x) is strictly convex.
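The same eigenvalue test can be scripted for any 2×2 symmetric Hessian by solving the characteristic polynomial directly:

```python
import math

# Convexity test for f(x1, x2) = 2*x1^2 + 2*x1*x2 + 1.5*x2^2 + 7*x1 + 8*x2 + 24
# via the eigenvalues of its (constant) Hessian [[4, 2], [2, 3]].
def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] from the characteristic polynomial
    lambda^2 - (a + d)*lambda + (a*d - b*b) = 0 (quadratic formula)."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)   # always real for a symmetric matrix
    return (tr + disc) / 2, (tr - disc) / 2

lam1, lam2 = eigenvalues_2x2_symmetric(4.0, 2.0, 3.0)
print(lam1, lam2, lam1 > 0 and lam2 > 0)   # both positive => strictly convex
```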

2.4.3 Convex Optimisation Problems


An optimisation problem is convex (and therefore has a unique minimum) if the objective
function and the feasible region are convex. How do we test convexity for a region?

Definition A region R is convex if and only if for any x1 , x2 ∈ R, the point X = αx1 +
(1 − α)x2 is such that X ∈ R, ∀α ∈ [0, 1].

Convex Feasible Region Recall that the feasible region is defined by F = {x ∈ Rn |h(x) =
0, g(x) ≤ 0}.
Sufficient condition for convexity: If the equality constraints h(x) are linear and the
inequality constraints g(x) are convex functions then F is convex.
The nature of the search region has an important bearing on the potential for obtaining
suitable results in optimisation, as shown in Fig. 2.8. A problem with a convex objective
function and a convex feasible region has a unique global minimum.


Figure 2.7: Examples of (a) convex and (b) nonconvex regions

2.5 Unconstrained Optimisation


How do we locate a minimum of the following problem?

min_{x∈Rⁿ} f(x)        (2.11)

2.5.1 Necessary and Sufficient Conditions for a Minimum


2.5.1.1 Necessary Conditions

If f(x) has an extremum at x∗, then x∗ is a stationary point:

∇f(x∗) = 0,  i.e.  ( ∂f/∂x1, . . . , ∂f/∂xn )ᵀ = 0        (2.12)

2.5.1.2 Sufficient Conditions

If the Hessian matrix H(x) of f (x) is such that xT H(x∗ )x > 0, ∀x ̸= 0, the stationary point
x∗ is a strong local minimum.

Examples
1. min f (x) = x4
Stationary point: ∇f (x) = 4x3 = 0 ⇔ x∗ = 0.
Hessian matrix: H(x∗ ) = 12x2 |x=0 = 0.
Although the Hessian matrix is not strictly positive definite, x∗ = 0 is a minimum.
Why can we not guarantee that it is a minimum from the analysis?

2. min_{x1,x2∈R} f(x1, x2) = 4 + 4.5x1 − 4x2 + x1² + 2x2² − 2x1x2 + x1⁴ − 2x1²x2

Stationary point:

∇f(x1, x2) = 0  ⇔  ∂f/∂x1 = 4.5 + 2x1 − 2x2 + 4x1³ − 4x1x2 = 0
                   ∂f/∂x2 = −4 + 4x2 − 2x1 − 2x1² = 0
Solving this system of equations with Newton’s method, we obtain the following results:

[Three sketches: (i) a feasible region whose constrained minimum differs from the unconstrained minimum; (ii) a feasible region containing the unconstrained minimum, so the constrained and unconstrained minima coincide; (iii) a nonconvex feasible region giving rise to both a local and a global constrained minimum.]

Figure 2.8: The effect of the nonconvexity of the feasible region



Stationary point    f(x1, x2)    Eigenvalues

(1.941, 3.854)       0.9855      37.03, 0.97
(−1.053, 1.028)     −0.5134      10.5, 3.5
(0.612, 1.493)       2.83        7.0, −2.56

Based on the eigenvalues of the Hessian matrix, the first two points are minima and
the third point is a saddle point. To determine which minimum is global, we need
to compare the values of the objective function at each of the minima. We have no
guarantee that we have identified all minima.
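The table above was obtained by applying Newton's method to the stationarity conditions. A minimal sketch (analytic gradient and Hessian, 2×2 Newton step by Cramer's rule) recovers the first stationary point from a nearby starting guess:

```python
# Newton's method applied to grad f = 0 for
# f(x1, x2) = 4 + 4.5*x1 - 4*x2 + x1^2 + 2*x2^2 - 2*x1*x2 + x1^4 - 2*x1^2*x2.
def f(x1, x2):
    return 4 + 4.5*x1 - 4*x2 + x1**2 + 2*x2**2 - 2*x1*x2 + x1**4 - 2*x1**2*x2

def grad(x1, x2):
    return (4.5 + 2*x1 - 2*x2 + 4*x1**3 - 4*x1*x2,
            -4 + 4*x2 - 2*x1 - 2*x1**2)

def hess(x1, x2):
    return ((2 + 12*x1**2 - 4*x2, -2 - 4*x1),
            (-2 - 4*x1, 4.0))

def newton(x1, x2, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        g1, g2 = grad(x1, x2)
        if abs(g1) < tol and abs(g2) < tol:
            break
        (h11, h12), (h21, h22) = hess(x1, x2)
        det = h11*h22 - h12*h21
        # Newton step: solve H * dx = -g for the 2x2 system by Cramer's rule
        x1 -= ( h22*g1 - h12*g2) / det
        x2 -= (-h21*g1 + h11*g2) / det
    return x1, x2

x1, x2 = newton(2.0, 4.0)    # starting guess near the first stationary point
print(round(x1, 3), round(x2, 3), round(f(x1, x2), 4))
```

Starting the iteration from other initial guesses recovers the other rows of the table; as the text notes, there is no guarantee that all stationary points have been found.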

2.5.2 Algorithms for Unconstrained Multivariable Optimisation


Automated solution techniques must be

• Efficient, since solution procedures are iterative.

• Robust, i.e. able to get a solution, since a general non-linear function is unpredictable
in its behaviour, and may have local minima and saddle points.

Most iterative procedures alternate between two phases:

1. Choosing a search direction,

2. Minimising in that direction to some extent to find a new point xk+1 = xk + ∆xk .

Local optimisation algorithms require an initial starting point and a convergence criterion
for termination as input, in addition to the problem statement.
Available techniques differ mainly in the way they generate search directions.

1. Direct Methods do not require derivatives and rely solely on function evaluations.
They are simple to understand and execute but are inefficient and lack robustness. They
include random search, grid search, univariate search, the simplex method, conjugate
search directions and Powell’s method. They are useful for simple low-dimensional
problems.

2. Indirect Methods use derivatives in determining the search direction for optimisation.
First-order methods such as the steepest descent use first-order derivatives only while
second-order methods such as Newton’s method also use second-order derivatives.

2.5.2.1 Steepest Descent (or Gradient) Method

Basic idea The gradient ∇f (x) of a function f (x) at a point x̃ is a vector at that point
that gives the (local) direction of the greatest increase in f (x) and is orthogonal (normal)
to the contour of f at x̃ (see Fig. 2.9). In order to minimise f (x), we move continuously in
the opposite direction: the search direction s is the opposite of the gradient: s = −∇f (x̃).
The search direction is followed continuously until we arrive at a stationary point. Note that
in this method, the negative of the gradient gives the direction for minimisation but not the


Figure 2.9: The steepest descent method moves in the opposite direction to the gradient

Figure 2.10: Effect of the magnitude of the search direction

magnitude of the step to be taken. At the kth iteration of the steepest descent, the transition
from point xk to another point xk+1 is given by:
xk+1 = xk + λk sk = xk − λk ∇f (xk ) (2.13)
where λk is the scalar that determines the step length in the direction of steepest descent
−∇f (xk ). How big should λk be for optimal performance of the algorithm?
• If λ is small, the path followed is continuous but may need too many iterations;

• If λ is large we may produce an increase in the objective function, as shown in Fig. 2.10.

Choosing a value of λ can be achieved in several ways:


1. Pre-specify step size λ to be a constant value to be used for every iteration.

2. Employ a one-dimensional search along the negative of the gradient to select the step size.
Note that
f(xk+1) = f(xk − λk∇f(xk)).
Let F(λk) ≡ f(xk+1). Solving the one-dimensional minimisation problem min_{λk} F(λk)
enables the generation of an “optimum” step size.

Algorithmic procedure

Step 1 Choose an initial point x0 . Set the iteration counter k = 0. Set the convergence
tolerance to ϵ;

Step 2 Calculate (analytically or numerically) the partial first-order derivatives of f(x):

∂f(x)/∂xj ,  j = 1, . . . , n;

Step 3 Calculate the search vector

sk = −∇f (xk );

Step 4 Set the step size λk by minimising F (λk ) numerically or using the pre-assigned
step size;

Step 5 Compute the next point xk+1 using the relation xk+1 = xk + λk sk ;

Step 6 Check for convergence.

• If ∥∇f(xk+1)∥ ≤ ϵ, convergence has been achieved: terminate the algorithm.

• If not, set k = k + 1 and return to Step 2.

Remark Termination can occur at any type of stationary point: a local minimum or a
saddle point. Examine the Hessian matrix of the objective function f (x) to characterise the
stationary point. If it is positive-definite, a minimum has been found. Otherwise, a saddle
point has been found and the search must be continued. To move away from the saddle point,
we employ a non-gradient method. The minimisation may then continue as before.

Performance of the steepest descent method While it is very simple, this approach
is sensitive to the scaling of f (x). Convergence is very slow, leading to poor performance.
Solve the following problem:
min  x1² + x2²
x1 ,x2

with starting point: x(0) = (1, 2)T .


Then change the scaling of the problem by defining new variables y1 = x1 , y2 = x2 /2 and
solve the rescaled problem with the same (rescaled) starting point.

1. Starting point: x(0) = (1, 2)T .


2. Calculate gradients: ∇f(x1, x2) = (2x1, 2x2)T , hence ∇f(1, 2) = (2, 4)T .

3. Search vector: s = (−2, −4)T .

4. Compute λ(0): Minimise F(λ(0)) = f(x(1)) = f(x(0) − λ(0)∇f(x(0))) = (1 − 2λ(0))² + (2 − 4λ(0))². Thus, F(λ(0)) = 5 − 20λ(0) + 20(λ(0))². This function has a minimum at λ(0) = 1/2.

Figure 2.11: Pictorial representation of the example problem. Observe that s = −∇f(1, 2) = (−2, −4)T is a vector pointing towards the optimum (0, 0).

5. Compute the next point: x(1) = x(0) − 1/2∇f (x(0) ) = (0, 0)T .

The solution is found in one iteration only (although one more iteration is needed to verify convergence). Now change the scaling of the problem by defining new variables y1 = x1, y2 = x2/2 and solve
min f(y1, y2) = y1² + 4y2²

1. Starting point: y (0) = (1, 1)T .

2. Gradient: ∇f (1, 1) = (2, 8)T .

3. Calculate λ(0): min F(λ(0)) = (1 − 2λ(0))² + 4(1 − 8λ(0))². Minimum at λ(0) = 0.1308.

4. New point: y(1) = (0.738, −0.046)T , which corresponds to x = (0.738, −0.092)T .

The steepest descent method is very scale dependent.
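The two runs above can be reproduced with a short script. This is a minimal sketch (function and variable names are my own), using a ternary search over λ ∈ [0, 1] as the one-dimensional minimisation:

```python
def line_search(F, lo=0.0, hi=1.0, iters=80):
    """Ternary search for the minimiser of a unimodal F on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if F(m1) < F(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    """Return (minimiser, iteration count) for min f(x)."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 <= tol:
            return x, k
        F = lambda lam: f([xi - lam * gi for xi, gi in zip(x, g)])
        lam = line_search(F)                       # "optimum" step size
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x, max_iter

# Original problem: one line search lands exactly on the optimum.
f1 = lambda x: x[0] ** 2 + x[1] ** 2
g1 = lambda x: [2 * x[0], 2 * x[1]]
x, k1 = steepest_descent(f1, g1, [1.0, 2.0])

# Rescaled problem (y2 = x2/2): the same method now needs many iterations.
f2 = lambda y: y[0] ** 2 + 4 * y[1] ** 2
g2 = lambda y: [2 * y[0], 8 * y[1]]
y, k2 = steepest_descent(f2, g2, [1.0, 1.0])
```

On the original problem the line search returns λ ≈ 1/2 and the method terminates after a single step; on the rescaled problem several dozen iterations are needed, illustrating the scale dependence.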

2.5.2.2 Second Order Indirect Methods: Newton’s Method

Basic Idea Recall that for x∗ to be a stationary point of the function f (x), a necessary
condition is ∇f (x∗ ) = 0. This is a set of n algebraic equations in n unknowns which can be
solved for x∗ using numerical techniques such as Newton’s method.

Solving systems of nonlinear equations Let F(x) ≡ ∇f (x). To solve a system of


equations of the form F(x) = 0, an iterative method relying on the Jacobian J(x) of the
system (matrix of first-order derivatives of the functions) can be used. The first-order Taylor’s
expansion of this system at a point xk yields
F(x) ≈ F(xk) + [∇F(xk)]T (x − xk) = F(xk) + [J(xk)]T (x − xk)    (2.14)

At the solution x∗, this can be rewritten as
F(x∗) = 0 ≈ F(xk) + [J(xk)]T (x∗ − xk)    (2.15)
and re-arranging,
x∗ ≈ xk − [J(xk)]−1 F(xk)    (2.16)

Thus, from a guess xk, a new guess is determined by computing
xk+1 = xk − [J(xk)]−1 F(xk)    (2.17)
If the Jacobian can be computed analytically, this approach is referred to as Newton's method. If an approximation is used, it is a quasi-Newton method, such as Broyden's method.
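The iteration of Eq. (2.17) is easily sketched for a 2×2 system, solving the linear step equation by Cramer's rule (a minimal illustration with my own names; the test system is an arbitrary choice):

```python
def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method for a 2x2 nonlinear system F(x) = 0 with analytic Jacobian J."""
    x1, x2 = x0
    for _ in range(max_iter):
        f1, f2 = F(x1, x2)
        if max(abs(f1), abs(f2)) < tol:
            break
        (a, b), (c, d) = J(x1, x2)       # J = [[a, b], [c, d]]
        det = a * d - b * c
        # Solve J * dx = -F by Cramer's rule
        dx1 = (-f1 * d + f2 * b) / det
        dx2 = (-f2 * a + f1 * c) / det
        x1, x2 = x1 + dx1, x2 + dx2
    return x1, x2

# Example: intersection of the circle x1^2 + x2^2 = 4 with the line x1 = x2.
F = lambda x1, x2: (x1 ** 2 + x2 ** 2 - 4.0, x1 - x2)
J = lambda x1, x2: ((2 * x1, 2 * x2), (1.0, -1.0))
root = newton_system(F, J, (2.0, 1.0))   # converges to (sqrt(2), sqrt(2))
```

The convergence is quadratic near the solution: after the first iteration the linear equation is satisfied exactly and each further step roughly doubles the number of correct digits.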

Relationship between Newton’s method and the optimisation problem The Ja-
cobian of the system F(x) = 0 is the Hessian matrix of the original objective function.
According to Eq. (2.17), the search direction is given by −[J(xk)]−1 F(xk) or, equivalently, sk = −[H(xk)]−1 ∇f(xk).
How does this compare to the first-order methods? The search direction of the steepest
descent can be interpreted as being orthogonal to a linear approximation (or tangent) of the
objective function at point xk . Now suppose we make a quadratic approximation of f (x) at
xk:
f(x) ≈ f(xk) + [∇f(xk)]T ∆xk + ½ (∆xk)T H(xk) ∆xk    (2.18)
where H(xk ) is the Hessian matrix of f (x) evaluated at xk and ∆xk = x − xk . This
approximation takes into account the curvature of f (x) at xk and is used in second-order
methods to determine a search direction.

Step size selection In Eq. (2.17), a step size of 1 is effectively used. A more general
equation is
xk+1 = xk − λk [H(xk)]−1 ∇f(xk)    (2.19)
• If f (x) is quadratic, Newton’s method requires only one step to reach a minimum and
λk = 1 can be used.

• For general non-linear functions, λk can be set equal to 1 or determined by minimising


F (λk ) = f (xk+1 ). However, it is not necessary to find the value of λ which minimises
F (λ). One may accept any λ which produces a nonzero decrease in the objective
function f (x). Using Taylor’s expansion, we find
f(x(1)) = f(x(0) + λ(0)∆x(0)) = f(x(0)) + λ(0)[∇f(x(0))]T ∆x(0) + . . .
f(x(1)) − f(x(0)) ≈ λ(0)[∇f(x(0))]T ∆x(0)    (2.20)

The value of λ(0) must be such that f (x(1) ) − f (x(0) ) < 0. Thus, we must have
λ(0)[∇f(x(0))]T ∆x(0) < 0    (2.21)

Any λ(0) that satisfies Eq. (2.21) can be used.



Examples

• min f(x) = 4x1² + x2² − 2x1x2 starting from x(0) = (1, 1)T .
1. ∇f(x) = (8x1 − 2x2, 2x2 − 2x1)T , hence ∇f(x(0)) = (6, 0)T .
2. H(x) = [8  −2; −2  2], hence [H(x)]−1 = [1/6  1/6; 1/6  2/3].

3. Using λ = 1, the step is given by

∆x(0) = −H −1 ∇f (x(0) ) = (−1, −1)T .

4. The new guess is then x(1) = x(0) + ∆x(0) = (1, 1)T + (−1, −1)T = (0, 0)T and
f (x(1) ) = 0. It can be checked that at this point ∇f (x(1) ) = 0 and the Hessian
matrix is positive definite.

• min f(x) = ½x1²x2² + x1² + x2² + 2x1 + x2 starting from x(0) = (1, 1)T .
1. ∇f(x) = (x1x2² + 2x1 + 2, x1²x2 + 2x2 + 1)T , hence ∇f(x(0)) = (5, 4)T .
2. H(x) = [x2² + 2  2x1x2; 2x1x2  x1² + 2], hence H(x(0)) = [3  2; 2  3].
3. The step direction is given by
−[H(x(0))]−1 ∇f(x(0)) = (−7/5, −2/5)T .

4. Use minimisation to get the step size:
F(λ(0)) = ½(1 − 7λ(0)/5)²(1 − 2λ(0)/5)² + (1 − 7λ(0)/5)² + (1 − 2λ(0)/5)² + 2(1 − 7λ(0)/5) + (1 − 2λ(0)/5).
The one-dimensional minimisation problem min_λ(0) F(λ(0)) has a minimum at λ(0) = 1.6. (Note that a line search may return a step λ > 1, as it does here.)
5. The step is therefore given by

∆x(0) = 1.6(−7/5, −2/5)T = (−2.24, −0.64)T .

6. The new guess is then x(1) = x(0) + ∆x(0) = (−1.24, 0.36)T and f (x(1) ) = −0.353.
Note that the value of f (x(0) ) was 5.5.

Algorithmic procedure Generalise your experience by proposing an algorithmic procedure


for Newton’s method.

Step 1 Choose an initial point x0 . Set the iteration counter k = 0. Set the convergence
tolerance to ϵ. Choose λ.

Step 2 Calculate f (xk ), ∇f (xk ) and H(xk ).

Step 3 Solve for ∆xk by solving the system of linear equations

H(xk )∆xk = −λ∇f (xk )

Step 4 Calculate xk+1 = xk + ∆xk . If f (xk+1 ) > f (xk ), reduce the step length λ and
go back to Step 3.

Step 5 Test for convergence:

• If ∥∇f (xk+1 )∥ < ϵ, terminate.


• Otherwise, set k = k + 1 and return to Step 2.
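The procedure can be sketched as follows for the second worked example above (a minimal implementation with my own names; Step 3's linear system is solved by Cramer's rule and Step 4's step-length reduction is done by halving):

```python
def f(x1, x2):
    # f(x) = 0.5*x1^2*x2^2 + x1^2 + x2^2 + 2*x1 + x2
    return 0.5 * x1 ** 2 * x2 ** 2 + x1 ** 2 + x2 ** 2 + 2 * x1 + x2

def grad(x1, x2):
    return (x1 * x2 ** 2 + 2 * x1 + 2, x1 ** 2 * x2 + 2 * x2 + 1)

def hess(x1, x2):
    return ((x2 ** 2 + 2, 2 * x1 * x2), (2 * x1 * x2, x1 ** 2 + 2))

def newton_min(x1, x2, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        g1, g2 = grad(x1, x2)
        if (g1 * g1 + g2 * g2) ** 0.5 < tol:         # Step 5: convergence test
            break
        (a, b), (c, d) = hess(x1, x2)
        det = a * d - b * c
        dx1 = (-g1 * d + g2 * b) / det               # Step 3: solve H dx = -grad
        dx2 = (-g2 * a + g1 * c) / det
        lam = 1.0                                    # Step 4: damp if f increases
        while f(x1 + lam * dx1, x2 + lam * dx2) > f(x1, x2) and lam > 1e-10:
            lam *= 0.5
        x1, x2 = x1 + lam * dx1, x2 + lam * dx2
    return x1, x2

xs = newton_min(1.0, 1.0)
```

Starting from (1, 1), the first computed step is (−7/5, −2/5), exactly as in the example; the iteration then settles at a stationary point near (−0.94, −0.35) with f ≈ −1.17, and the Hessian there is positive-definite.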

Performance of Newton’s method The convergence of this method is quadratic in the neighbourhood of the solution.

Other second-order methods Quasi-Newton methods rely on an approximation H̃(x) of


H(x) instead of the exact Hessian matrix. H̃(x) must be symmetric. In addition, if the initial
Hessian matrix H̃(x0 ) is positive-definite, the approximation must also be positive-definite.
The approximation is constructed by various combinations of the first derivatives as they are
cheaper to evaluate than the exact Hessian matrix. For instance, the BFGS update is given
by
H̃k+1 = H̃k + (∆gk ∆gkT)/(∆gkT ∆xk) − (H̃k ∆xk ∆xkT H̃k)/(∆xkT H̃k ∆xk)    (2.22)
where ∆gk = ∇f(xk+1) − ∇f(xk) and ∆xk = xk+1 − xk.
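A useful sanity check on Eq. (2.22) is the secant condition H̃k+1∆xk = ∆gk, which the update satisfies exactly. A minimal sketch (helper names are mine), using gradient data from the quadratic f = x1² + 4x2²:

```python
def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def outer(u, v):
    return [[u[0] * v[0], u[0] * v[1]], [u[1] * v[0], u[1] * v[1]]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def bfgs_update(H, dx, dg):
    """One BFGS update of the Hessian approximation, Eq. (2.22)."""
    Hdx = matvec(H, dx)
    num1, den1 = outer(dg, dg), dot(dg, dx)
    num2, den2 = outer(Hdx, Hdx), dot(dx, Hdx)
    return [[H[i][j] + num1[i][j] / den1 - num2[i][j] / den2
             for j in range(2)] for i in range(2)]

H0 = [[1.0, 0.0], [0.0, 1.0]]   # initial approximation: identity
dx = [1.0, 2.0]                 # step x^{k+1} - x^k
dg = [2.0, 16.0]                # gradient change of f = x1^2 + 4*x2^2 along dx
H1 = bfgs_update(H0, dx, dg)
```

Since the curvature dot(dg, dx) is positive here, H1 remains symmetric positive-definite, and matvec(H1, dx) reproduces dg exactly.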

Unified treatment All methods can be viewed as evolving from Newton’s method. For
Newton’s method, H is exact and convergence is quadratic. For Quasi-Newton methods,
H = H̃ and convergence is superlinear. For the steepest descent method, H = H −1 = I and
convergence is linear. There is a trade-off between the cost of Hessian matrix evaluations and
the rate of convergence.

2.6 Nonlinearly Constrained Optimisation


We now focus on the following class of problems:

min f(x)
x
s.t. h(x) = 0
g(x) ≤ 0    (2.23)
x ∈ Rn

Figure 2.12: Pictorial representation of the example problem (the unit circle centred at (3, 2)).

where f (x) is a scalar function, h(x) is a vector of size m, g(x) is a vector of size p and
n > m.
Since we know how to solve unconstrained optimisation problems, it would be useful to
transform the constrained optimisation problem (2.23) into an unconstrained problem.

2.6.1 From Constrained to Unconstrained Optimisation


2.6.1.1 Feasible Path Approach

Consider problems with equality constraints only:

min f(x)
x
s.t. h(x) = 0    (2.24)
x ∈ Rn

We can obtain an unconstrained problem by using the m equality constraints to eliminate m of the variables; the constraints then become redundant. The resulting problem can then be solved using
Newton’s method or steepest descent. By construction, all the original equality constraints
are satisfied at every iteration, and a feasible path is followed.

Example Consider the problem of determining the point on a unit circle centered at (3,2)
which is closest to the origin (0,0).

min f(x1, x2) = x1² + x2²
s.t. (x1 − 3)² + (x2 − 2)² = 1

The equality can be re-arranged as x2 = 2 − √(1 − (x1 − 3)²) and used to eliminate x2 from the objective function. The problem becomes:
min  x1² + [2 − √(1 − (x1 − 3)²)]² ,
x1

a single variable unconstrained optimisation problem.
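The resulting one-variable problem can be minimised with any one-dimensional method; a quick sketch using a ternary search over x1 ∈ [2, 4] (the interval where the square root is real):

```python
from math import sqrt

def phi(x1):
    """Reduced objective after eliminating x2 = 2 - sqrt(1 - (x1 - 3)**2)."""
    x2 = 2.0 - sqrt(1.0 - (x1 - 3.0) ** 2)
    return x1 ** 2 + x2 ** 2

lo, hi = 2.0, 4.0
for _ in range(200):                    # ternary search on a unimodal function
    m1 = lo + (hi - lo) / 3.0
    m2 = hi - (hi - lo) / 3.0
    if phi(m1) < phi(m2):
        hi = m2
    else:
        lo = m1
x1 = 0.5 * (lo + hi)
x2 = 2.0 - sqrt(1.0 - (x1 - 3.0) ** 2)
```

This returns x ≈ (2.17, 1.44) with f ≈ 6.79, the point on the circle closest to the origin.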



Disadvantages This approach is difficult to use in practice on large problems because

• elimination of the variables may not be possible analytically. Using Newton’s method to
solve the equalities numerically is expensive and may be prone to numerical problems.

• elimination of the variables does not necessarily yield a unique solution. In the example, for instance, the equality is equivalent to x2 = 2 ± √(1 − (x1 − 3)²). It may not be easy to choose the correct solution.

2.6.1.2 Penalty Functions

Another approach is to form the following function:


ϕ(x) = f(x) + K Σ_{i=1}^{m} [hi(x)]² ,  K > 0.    (2.25)

Note that provided all the constraints are satisfied (i.e. h(x) = 0), this is the same as the
original objective function f (x). On the other hand, if hi (x) = 0 is violated for some i, ϕ(x)
is larger than f (x). Thus, by choosing a very large K, even small constraint violations are
severely penalised. We then expect that the solution of the unconstrained problem

min ϕ(x)
x

will satisfy the constraint h(x) = 0. Chapter 6 of [3] provides an in-depth discussion of
penalty methods and their application to problems with inequality constraints.

Disadvantages A major issue with penalty function approaches is that there is no good
way to choose a value for K.

• If K is too small, the penalty term may not be sufficient to ensure that all equalities
are satisfied. This phenomenon can be attributed to the presence of local minima.

• The larger K becomes, the more ill-conditioned the unconstrained optimisation problem is.
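The trade-off can be illustrated numerically on the circle example. The sketch below (names are my own) minimises ϕ(x) by steepest descent with a backtracking line search for two values of K; the constraint violation shrinks roughly in proportion to 1/K:

```python
def make_phi(K):
    def phi(x1, x2):
        c = (x1 - 3.0) ** 2 + (x2 - 2.0) ** 2 - 1.0   # h(x): constraint residual
        return x1 * x1 + x2 * x2 + K * c * c
    def grad(x1, x2):
        c = (x1 - 3.0) ** 2 + (x2 - 2.0) ** 2 - 1.0
        return (2.0 * x1 + 4.0 * K * c * (x1 - 3.0),
                2.0 * x2 + 4.0 * K * c * (x2 - 2.0))
    return phi, grad

def minimise(phi, grad, x1, x2, tol=1e-5, max_iter=50000):
    lam = 1.0
    for _ in range(max_iter):
        g1, g2 = grad(x1, x2)
        if (g1 * g1 + g2 * g2) ** 0.5 < tol:
            break
        lam = min(2.0 * lam, 1.0)        # warm-started backtracking line search
        while phi(x1 - lam * g1, x2 - lam * g2) >= phi(x1, x2) and lam > 1e-14:
            lam *= 0.5
        x1, x2 = x1 - lam * g1, x2 - lam * g2
    return x1, x2

points, violations = {}, {}
for K in (10.0, 100.0):
    phi, grad = make_phi(K)
    x1, x2 = minimise(phi, grad, 2.0, 1.0)
    points[K] = (x1, x2)
    violations[K] = abs((x1 - 3.0) ** 2 + (x2 - 2.0) ** 2 - 1.0)
```

Both runs land near the constrained solution (2.17, 1.44); the violation drops from about 0.13 at K = 10 to about 0.013 at K = 100, while the Hessian of ϕ becomes increasingly ill-conditioned.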

2.6.2 Optimality Conditions for Constrained Optimisation

As seen in the previous section, transforming constrained problems into unconstrained problems may result in numerical difficulties. In order to explore alternative approaches, we need
to understand what characterises a minimum in a constrained problem.

2.6.2.1 Equality Constrained Problems

Let us first explore the effect of equality constraints on the minimum.



Example Consider again our earlier example problem:


min f(x1, x2) = x1² + x2²
x1 ,x2
s.t. (x1 − 3)² + (x2 − 2)² = 1
If we ignore the constraint, x is not limited to the unit circle centered at (3,2) and the
minimum is found at x = (0, 0)T , where f (x) = 0. However, when x is confined to the circle,
the solution moves to x = (2.17, 1.44)T where f (x) = 6.789. At that point, the gradients
of f (x) are given by ∇f (x) = (4.34, 2.88)T . The optimality conditions for this constrained
example are clearly different from the unconstrained case.
Suppose that point x̃ on the circle is the minimum. Then, the effect of small steps δx on
the objective function is
f(x̃ + δx) = (x̃1 + δx1)² + (x̃2 + δx2)² = x̃1² + x̃2² + 2x̃1δx1 + 2x̃2δx2 + H.O.T.    (2.26)
The presence of the equality constraint means that not all δx are legal as we must stay on
the circle. Hence, we must have
(x̃1 + δx1 − 3)² + (x̃2 + δx2 − 2)² = 1
i.e. (x̃1 − 3)² + (x̃2 − 2)² + 2(x̃1 − 3)δx1 + 2(x̃2 − 2)δx2 + H.O.T. = 1    (2.27)
Since by definition x̃ is on the circle, and neglecting H.O.T., we must have
(x̃1 − 3)δx1 + (x̃2 − 2)δx2 = 0 (2.28)
Putting together the first-order linear expansions of the objective function and the constraint, we get:
[ 2x̃1  2x̃2 ; 2(x̃1 − 3)  2(x̃2 − 2) ] (δx1, δx2)T = (δf, 0)T    (2.29)
This is a linear system of equations. If the matrix on the LHS is non-singular, then we can solve
this system to find a step δx = (δx1 , δx2 )T that may achieve any change in f – including
a decrease. But, since x̃ is a minimum, a decrease is not possible and the matrix must be
singular.
Singularity implies that the matrix rows are linearly dependent, i.e. that there are multipliers λ1 and λ2 such that:
λ1 x̃1 + λ2 (x̃1 − 3) = 0
λ1 x̃2 + λ2 (x̃2 − 2) = 0    (2.30)
where λ1 and λ2 are not both zero.
Assuming that λ1 is not zero, and defining λ = λ2 /λ1 , we can divide by λ1 :
x̃1 + λ(x̃1 − 3) = 0
(2.31)
x̃2 + λ(x̃2 − 2) = 0
Summarising, the necessary conditions for optimality and feasibility are:
x̃1 + λ(x̃1 − 3) = 0
x̃2 + λ(x̃2 − 2) = 0 (2.32)
(x̃1 − 3)2 + (x̃2 − 2)2 = 1
This is a system of three equations in three unknowns, x̃1 , x̃2 and λ. The solution of the
above system gives: x̃1 = 2.17, x̃2 = 1.44 and λ = 2.61.
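For this example the system (2.32) can in fact be solved in closed form: the first two equations give x̃1 − 3 = −3/(1 + λ) and x̃2 − 2 = −2/(1 + λ), so the circle equation forces (1 + λ)² = 13. A quick numerical check:

```python
from math import sqrt

lam = sqrt(13.0) - 1.0           # (1 + lam)^2 = 13, taking the root with lam > 0
x1 = 3.0 * lam / (1.0 + lam)     # from x1 + lam*(x1 - 3) = 0
x2 = 2.0 * lam / (1.0 + lam)     # from x2 + lam*(x2 - 2) = 0

# Residuals of the three necessary conditions in (2.32)
r1 = x1 + lam * (x1 - 3.0)
r2 = x2 + lam * (x2 - 2.0)
r3 = (x1 - 3.0) ** 2 + (x2 - 2.0) ** 2 - 1.0
```

This reproduces x̃ ≈ (2.17, 1.44) and λ ≈ 2.61; the other root, λ = −√13 − 1, corresponds to the point on the far side of the circle.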

Necessary conditions We consider the problem


min f (x)
x
s.t. h(x) = 0 (2.33)
x ∈ Rn
Theorem 2.1 If f (x) has a constrained extremum at x̃ such that h(x̃) = 0, then the gradi-
ents ∇f (x̃) and ∇hi (x̃), i = 1, . . . , m are linearly dependent and satisfy

∇f(x̃) + Σ_{i=1}^{m} λi ∇hi(x̃) = 0    (2.34)

Thus the necessary conditions for optimality and feasibility are



∇f(x̃) + Σ_{i=1}^{m} λi ∇hi(x̃) = 0  and  h(x̃) = 0    (2.35)

This is a system of n + m equations in n + m unknowns.

Lagrange function Define the Lagrange function L(x, λ) as



L(x, λ) = f(x) + Σ_{i=1}^{m} λi hi(x)    (2.36)

The stationary points of this function occur at ∇L(x, λ) = 0, or
∂L/∂x = 0  ⇔  ∇f(x) + Σ_{i=1}^{m} λi ∇hi(x) = 0
∂L/∂λ = 0  ⇔  h(x) = 0    (2.37)
Since these are the necessary optimality conditions for the constrained problem (2.33), the
Lagrangian L has a stationary point at the minimum of the constrained problem. Note that
a stationary point of L is not always a constrained minimum.
λi is referred to as the Lagrange multiplier for constraint i.

2.6.2.2 General Problems – The Kuhn-Tucker Conditions

Example Let us return to the example considered earlier and recall that the solution was
(2.17, 1.44)T . We now add the inequality constraint x1 ≥ 2.1, so that the problem formulation
becomes:
min f(x) = x1² + x2²
s.t. (x1 − 3)² + (x2 − 2)² = 1    (2.38)
2.1 − x1 ≤ 0
Obviously, this does not alter the solution since x̃1 = 2.17 is already greater than 2.1. In this
case, the inequality constraint is inactive and could be ignored. On the other hand, if we
demand that x1 ≥ 2.5, this will affect the solution, since x̃1 violates this requirement. From
Fig. 2.13, it is obvious that the minimum is now at point A at which x1 = 2.5. In this case
the constraint x1 ≥ 2.5 is, in fact, satisfied as an equality x1 = 2.5 and is said to be active.
For optimisation purposes, active inequality constraints could in effect be treated as equalities and assigned a Lagrange multiplier. Let us consider this issue for another example
problem.

Figure 2.13: The constraint x1 ≥ 2.5 shifts the minimum to point A.


Figure 2.14: Pictorial representation of the example problem, showing the curve x1x2 = 1, the point x̃ and a step δx.

Example Consider the following problem, also shown in Fig. 2.14,

min f(x) = x1² + x2²
s.t. g(x) = −x1x2 ≤ −1    (2.39)
x ∈ R²

It is obvious from the figure that the constraint g(x) will in fact be active at the solution,
i.e. x1 x2 = 1.
Now, consider a point x̃ on the hyperbola x1 x2 = 1. As before, we consider small steps
δx from the minimum x̃. However, in this case the step does not have to take us to another
point on the curve x1 x2 = 1. Then,
f(x̃ + δx) = f(x̃) + 2x̃1δx1 + 2x̃2δx2 + H.O.T.
−(x̃1 + δx1)(x̃2 + δx2) ≤ −1    (2.40)

The second equation can be re-arranged by noting that x̃1x̃2 = 1 and neglecting second-order terms. We then have
f(x̃ + δx) = f(x̃) + 2x̃1δx1 + 2x̃2δx2
−x̃2δx1 − x̃1δx2 ≤ 0    (2.41)

Denoting the value of −x̃2 δx1 − x̃1 δx2 by δg, we obtain


[ 2x̃1  2x̃2 ; −x̃2  −x̃1 ] (δx1, δx2)T = (δf, δg)T    (2.42)

As before, for x̃ to be a minimum, the rows of the matrix on LHS must be linearly dependent.
Therefore, there must exist µ1 and µ2, with at least one non-zero, such that:
2µ1x̃1 − µ2x̃2 = 0
2µ1x̃2 − µ2x̃1 = 0    (2.43)

Assuming that µ1 is not zero and setting µ = µ2 /µ1 , we get


2x̃1 − µx̃2 = 0
2x̃2 − µx̃1 = 0    (2.44)

Thus, δf = 2x̃1δx1 + 2x̃2δx2 = µx̃2δx1 + µx̃1δx2 = −µδg. From the definition of δg and Eq. (2.41), we must have δg ≤ 0, i.e. −δg ≥ 0. Hence, if µ were negative, a feasible step with δg < 0 would produce a decrease in f, which is incompatible with the premise that x̃ is a minimum. Therefore, µ must be non-negative.

Active inequality constraints are treated just like equality constraints with the
additional restriction that their Lagrange multipliers must be non-negative.

Properties of the Lagrange multipliers for inequality constraints At an optimal solution of problem (2.23) the Lagrangian must satisfy:
∂f/∂xj + Σ_{i=1}^{m} λi ∂hi/∂xj + Σ_{i∈AS} µi ∂gi/∂xj = 0,  j = 1, . . . , n    (2.45)
where AS is the set of active inequality constraints.


The inactive inequality constraints are ignored. The additional restriction µi ≥ 0 must
be imposed for every active inequality constraint. However, it is not known a priori which
inequalities are active and which are inactive. Therefore, it is convenient to introduce a
Lagrange multiplier for every inequality constraint, writing

∂f/∂xj + Σ_{i=1}^{m} λi ∂hi/∂xj + Σ_{i=1}^{p} µi ∂gi/∂xj = 0,  j = 1, . . . , n    (2.46)

and simply stating that µi ≥ 0 if gi is active and µi = 0 if gi is inactive. These conditions on


the µ’s may be written compactly as

Σ_{i=1}^{p} µi gi(x) = 0;  µi ≥ 0,  i = 1, . . . , p.    (2.47)

Since µi ≥ 0 and gi(x) ≤ 0, every term in the summation is non-positive, so each term must vanish for the sum to be zero. For an active constraint gi, gi(x) = 0 by definition. Each inactive constraint gi is such that gi(x) < 0; for such constraints, the summation can only be equal to 0 if the corresponding µi is equal to 0.
This condition is called the complementarity condition as µi and gi (x) are comple-
mentary in the sense that at least one of them is zero for each constraint.

Kuhn-Tucker conditions The above observations can only be generalised to problems of


the form (2.23) which satisfy certain criteria. In particular, the feasible region must have
certain properties which are referred to as constraint qualification. One type of constraint
qualification states that the active inequality constraints and the equality constraints should
be linearly independent. Many other forms have been proposed (see Chapter 5 of [1]).

• Kuhn-Tucker necessary conditions: Consider a problem of type (2.23) in which the


constraints are linearly independent. Let x∗ be a feasible point for this problem. If x∗
is a local solution of (2.23), there exists a point (x∗ , λ∗ , µ∗ ) which satisfies the Kuhn-
Tucker conditions below:
Lagrange conditions:

∇x L(x∗ , λ∗ , µ∗ ) = ∇x f (x∗ ) + λ∗T ∇x h(x∗ ) + µ∗T ∇x g(x∗ ) = 0 (2.48)

Complementarity conditions:
µ∗T g(x∗ ) = 0
(2.49)
µ∗ ≥ 0

• Kuhn-Tucker sufficient conditions: Consider x∗ , a feasible point for problem (2.23), for
which the Kuhn-Tucker necessary conditions hold. Define the Hessian matrix of the
restricted Lagrangian as
∇²L(x∗) = ∇²f(x∗) + Σi λ∗i ∇²hi(x∗) + Σ_{i∈I} µ∗i ∇²gi(x∗),

where I is the set of active inequality constraints (i.e. I = {i : gi (x∗ ) = 0}) . Define
the set I + of strongly active inequality constraints as I + = {i ∈ I : µ∗i > 0}, and the
set I 0 of weakly active inequality constraints as I 0 = {i ∈ I : µ∗i = 0}. The cone of
feasible directions is then the set C such that
C = {x ≠ 0 : ∇gi(x∗)T x = 0 for i ∈ I+,
∇gi(x∗)T x ≤ 0 for i ∈ I0,
∇hi(x∗)T x = 0 for i = 1, . . . , m}.

If the Hessian of the restricted Lagrangian is positive-definite at the Kuhn-Tucker point


for all feasible directions, i.e.

xT ∇2 L(x∗ )x > 0, ∀x ∈ C

then x∗ is a strict local solution of (2.23).

Note that if f (x) is convex and the feasible region is convex, any point which satisfies the
Kuhn-Tucker necessary conditions is the global minimum of the problem.

Example Consider the problem
min f(x) = 4x1² + 5x2²
s.t. h(x) = 2x1 + 3x2 − 6 = 0
The Lagrangian is L(x, λ) = 4x1² + 5x2² + λ(2x1 + 3x2 − 6). The necessary conditions are ∇L = 0:
∂L/∂x1 = 0  ⇔  8x1 + 2λ = 0
∂L/∂x2 = 0  ⇔  10x2 + 3λ = 0
∂L/∂λ = 0  ⇔  2x1 + 3x2 − 6 = 0

This is a system of 3 linear equations in 3 unknowns, with solution x∗1 = 1.071, x∗2 = 1.286 and λ∗ = −4.286. Since the objective function and feasible region are convex, this Kuhn-Tucker point is the unique solution to the problem, and f(x∗) = 12.857.
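The system can be solved by elimination: the first two conditions give x1 = −λ/4 and x2 = −3λ/10, and substitution into the constraint yields −(7/5)λ = 6. A quick check:

```python
# Eliminate x1 = -lam/4 and x2 = -3*lam/10, then enforce 2*x1 + 3*x2 = 6:
# 2*(-lam/4) + 3*(-3*lam/10) = -(7/5)*lam = 6
lam = -30.0 / 7.0
x1 = -lam / 4.0
x2 = -3.0 * lam / 10.0
fval = 4.0 * x1 ** 2 + 5.0 * x2 ** 2
```

This gives x∗ = (1.071, 1.286), λ∗ = −4.286 and f(x∗) = 12.857, confirming the values above.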

2.6.2.3 Finding Kuhn-Tucker points: Iterative active set strategy

Basic idea The central difficulty in finding Kuhn-Tucker points is determining which inequality constraints should be active. In order to address this issue, an iterative strategy which postulates a different set of active constraints at every iteration can be used.

Algorithmic procedure
Step 1 Let the set of active inequalities JA = ∅. Set µi = 0, i = 1, . . . , p.

Step 2 Find x∗, λ∗, µ∗ such that
∇f(x∗) + Σ_{i=1}^{m} λ∗i ∇hi(x∗) + Σ_{i∈JA} µ∗i ∇gi(x∗) = 0
hi(x∗) = 0,  i = 1, . . . , m
gi(x∗) = 0,  i ∈ JA    (2.50)

Step 3 Termination test

• If gi (x∗ ) ≤ 0 and µ∗i ≥ 0 for all i = 1, . . . , p, a solution has been found: terminate.
• Otherwise, add any violated inequality constraint (gi (x∗ ) > 0) to the set JA , set
µi ≥ 0, and remove from JA the inequality with the most negative multiplier (if
any) and set the multiplier of this inequality to 0. Return to step 2.

Example
min x1² + x2²
s.t. g1 (x) = 1 − x1 ≤ 0
g2 (x) = x1 − 3 ≤ 0
g3 (x) = 2 − x2 ≤ 0
g4 (x) = x2 − 4 ≤ 0
Step 1 Assume there are no active inequality constraints JA = ∅ and µi = 0, i = 1, 2, 3, 4.

Step 2 Find x∗ such that ∇f(x∗) + Σ_{i∈JA} µi ∇gi(x∗) = 0. This is equivalent to
∂f/∂x1 + 0 = 0  ⇔  2x∗1 = 0
∂f/∂x2 + 0 = 0  ⇔  2x∗2 = 0
i.e. x∗ = (0, 0)T .    (2.51)
Note that this is the solution of the unconstrained problem.

Step 3 Test for violated inequalities: g(x∗) = (1, −3, 2, −4)T . Hence g1 and g3 are violated at x∗. Set JA = {1, 3}, µ2 = µ4 = 0 and return to Step 2.

Step 2 Find x∗, µ∗1 and µ∗3 such that
∇f(x∗) + µ∗1 ∇g1(x∗) + µ∗3 ∇g3(x∗) = 0
g1(x∗) = 0
g3(x∗) = 0
i.e.
∂f/∂x1 + µ∗1 ∂g1/∂x1 + µ∗3 ∂g3/∂x1 = 0
∂f/∂x2 + µ∗1 ∂g1/∂x2 + µ∗3 ∂g3/∂x2 = 0
g1(x∗) = 0
g3(x∗) = 0
which gives
2x∗1 + µ∗1(−1) + µ∗3(0) = 0
2x∗2 + µ∗1(0) + µ∗3(−1) = 0
1 − x∗1 = 0
2 − x∗2 = 0
The solution to this linear system is x∗ = (1, 2)T , µ∗1 = 2 and µ∗3 = 4.

Step 3 Test for violated inequalities: g(x∗) = (0, −2, 0, −2)T . All inequalities are satisfied. Test for non-negativity of the Lagrange multipliers for inequalities: all µi's are non-negative. The global optimum solution has been found (note that the problem is convex): x∗ = (1, 2)T and f(x∗) = 5.

If Step 2 requires the solution of a nonlinear system of equations, Newton’s method can
be used after choosing some initial values for x, λ and µ.
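Because each inequality here involves a single variable, Step 2's subproblem can be solved coordinate-wise, which makes the whole strategy easy to sketch (a minimal implementation, names my own; constraints are stored as g(x) = coeff · x[var] + const ≤ 0):

```python
# Each constraint is (coeff, const, var): g(x) = coeff * x[var] + const <= 0.
cons = [(-1.0, 1.0, 0),    # g1 = 1 - x1 <= 0
        ( 1.0, -3.0, 0),   # g2 = x1 - 3 <= 0
        (-1.0, 2.0, 1),    # g3 = 2 - x2 <= 0
        ( 1.0, -4.0, 1)]   # g4 = x2 - 4 <= 0

def active_set(cons, n=2, max_iter=20):
    JA = set()                                # Step 1: no active inequalities
    for _ in range(max_iter):
        x = [0.0] * n                         # Step 2: minimise sum(x_i^2) with
        mu = [0.0] * len(cons)                # the constraints in JA as equalities
        for i in JA:
            coeff, const, j = cons[i]
            x[j] = -const / coeff             # g_i(x) = 0  =>  x_j = -const/coeff
        for i in JA:
            coeff, const, j = cons[i]
            mu[i] = -2.0 * x[j] / coeff       # stationarity: 2 x_j + mu_i coeff = 0
        violated = [i for i, (coeff, const, j) in enumerate(cons)
                    if i not in JA and coeff * x[j] + const > 1e-12]
        negative = [i for i in JA if mu[i] < -1e-12]
        if not violated and not negative:     # Step 3: Kuhn-Tucker point found
            return x, mu
        JA |= set(violated)                   # add violated inequalities
        if negative:                          # drop most negative multiplier
            JA.discard(min(negative, key=lambda i: mu[i]))
    raise RuntimeError("active set strategy did not converge")

x, mu = active_set(cons)
```

The first pass returns the unconstrained minimum (0, 0); g1 and g3 are then added to JA, and the second pass yields x∗ = (1, 2) with µ∗ = (2, 0, 4, 0), exactly as in the worked example.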

2.6.3 Algorithms for Constrained Optimisation


2.6.3.1 Successive Quadratic Programming, SQP

What is Quadratic Programming (QP)? QP is a subset of the NLP class of optimisation problems. QP problems fit the form

min f(x) = cT x + ½ xT Qx
x
s.t. Ax ≥ b (2.52)
x≥0

Some efficient techniques for solving such problems have been developed (active set strategies,
Simplex-based methods, . . .).

Basic Idea Build a QP approximation of problem (2.23) at the current point and solve the
resulting (easier) subproblem. The approximation is constructed by expanding the Kuhn-
Tucker necessary conditions in a Taylor series around the current point and taking a Newton
step.
Further details and an algorithmic procedure are given in Section 2.8.1.


Figure 2.15: Linearisation of nonconvex constraints may lead to the elimination of the global
solution

Comments on SQP methods


1. It is very common to solve general problems by transforming them to simpler and more
structured problems. The QP subproblem is to SQP methods what linear equations
are to nonlinear equation solvers.

2. SQP algorithms are “infeasible path” algorithms: the equality and inequality constraints are converged simultaneously while minimising f and are only satisfied at the solution.

3. The rate of convergence is quadratic if the exact Hessian of the Lagrangian is used, and superlinear if the BFGS approximation is used.

4. Some problems may arise during the solution of the QP as its feasible region may be
empty. This can happen when nonconvex constraints are linearised, as shown in Fig.
2.15.

2.6.3.2 Generalised Reduced Gradient Method, GRG

This approach was developed by several groups, such as Abadie and Carpenter, Lasdon and
Warren, Murtagh and Saunders.

Basic Idea Eliminate variables to reduce the dimensionality of the problem and apply
Newton’s method in the reduced space.
Further details on this method and an algorithmic procedure are given in Section 2.8.2.

Comments on GRG methods


1. The algorithms always remain feasible in the linear constraints.

2. Sparsity can be very effectively exploited.

3. When linear inequalities are involved, they can be transformed to linear equalities by
introducing slack variables.

2.6.3.3 Comparison of Algorithms

SQP

• Works best for highly nonlinear problems.


• Requires few iterations in general.
• Can handle large scale problems provided a good QP solver is available.
• Works well with black box models.

MINOS

• Works best for mostly linear and large scale problems (say 6000 variables and 5000
constraints).
• Requires a moderate number of iterations as sparsity is very well exploited through
LP technology.
• Very complex to program as analytical functions are required to exploit all features
(write f (x) as Ax + f N (x)).

GRG2/GINO

• Can be slow for nonlinear constraints.


• Requires a large number of iterations.
• Somewhat complex to program.

2.7 Large Problems: Flowsheet Optimisation


There are two main approaches to flowsheet modelling environments: sequential modular
and equation oriented. The sequential modular approach is geared towards units: there is a
separate model for each unit (or module) and each is considered in turn during a simulation.
In the equation oriented approach, there is a single integrated model for the entire flowsheet
and all equations are handled simultaneously.

2.7.1 Sequential Modular Approach


The sequential approach can lead to convergence problems in the presence of recycle streams.
Let u denote the vector of decision variables, x the vector of tear variables and r(u, x) = 0 the set of tear equations. Feasible or infeasible path techniques can be used.

Feasible path We need to solve the following optimisation problem

min f (u)
(2.53)
s.t. g(u) ≤ 0

by converging the process simulation at each iteration. This is the approach followed in
ASPEN.
Infeasible path In this case, we solve the problem

min f (u, x)
s.t. r(u, x) = 0 (2.54)
g(u, x) ≤ 0

by converging and optimising simultaneously. This is the approach taken by Flowtran.


The infeasible path approach appears preferable. The main difficulty arises from the gradients, which are typically obtained by numerical perturbation and are therefore prone to error.

2.7.2 Equation Oriented Approach


We solve a problem of the form
min f (x)
x
s.t. h(x) = 0 (2.55)
g(x) ≤ 0
where the number of degrees of freedom is small. It can be solved through simultaneous
solution using MINOS. This works best on simplified models and is the approach taken by
SPEEDUP. Alternatively, a decomposition scheme with SQP can be used, such as those proposed by Berna, Locke and Westerberg and implemented in ASCEND-II, or the scheme of Varantharajan and Biegler.

2.8 Additional information


2.8.1 Successive Quadratic Programming
2.8.1.1 QP subproblem for equality constrained NLPs

Consider the problem


min f (x)
x
s.t. h(x) = 0 (2.56)
x ∈ Rn
The Kuhn-Tucker conditions for this problem are ∇L(x, λ) = 0, i.e.
∇x L(x, λ) = ∇f(x) + λT ∇h(x) = 0
∇λ L(x, λ) = h(x) = 0    (2.57)

Recall that the Newton step for such a system of nonlinear equations is given by sk = −[∇²L(xk, λk)]−1 ∇L(xk, λk). The gradient of the Lagrangian is given by
∇L(x, λ) = ( ∇x L(x, λ), h(x) )T = ( ∇f(x) + λT ∇h(x), h(x) )T
and the Hessian matrix of the Lagrangian is given by
∇²L(x, λ) = [ ∇xx L(x, λ)  ∇h(x) ; ∇h(x)T  0 ].
Substituting this into the Newton step definition, with sk = (∆xk , ∆λk )T , we obtain
∇f(xk) + ∇xx L(xk, λk)∆xk + λT ∇h(xk) = 0
h(xk) + ∇h(xk)T ∆xk = 0    (2.58)
The above system is the first-order Taylor expansion of the Kuhn-Tucker conditions around
point (xk , λk ). It also corresponds to the Kuhn-Tucker conditions of the quadratic program:
min_x  ∇f(xk)T (x − xk) + ½ (x − xk)T ∇xx L(xk, λk)(x − xk)
s.t. h(xk) + ∇h(xk)T (x − xk) = 0    (2.59)
The Newton step can therefore be calculated by solving QP (2.59). The solution x∗ of this
QP gives the search direction ∆xk = x∗ − xk and the multipliers of the equality constraints
give λk+1 .

2.8.1.2 QP subproblem for NLPs with inequality constraints

The quadratic approximation for general problems of form (2.23) is obtained by applying
the same transformations to the objective and equality constraints as in problem (2.59) and
by linearising the inequalities around the current point. Thus the general QP subproblem is
given by
min_x  ∇f(xk)T (x − xk) + ½ (x − xk)T ∇xx L(xk, λk)(x − xk)
s.t. h(xk) + ∇h(xk)T (x − xk) = 0
g(xk) + ∇g(xk)T (x − xk) ≤ 0    (2.60)

where the Lagrangian is now L(x, λ, µ) = f (x) + λT h(x) + µT g(x) and the second-order
derivatives with respect to x are given by ∇xx L(x, λ, µ) = ∇xx f (x) + λT ∇xx h(x) +
µT ∇xx g(x). Since ∇xx L requires information on the second-order derivatives of f , h and
g, it is very costly to evaluate. In practice, a Quasi-Newton update formula is usually used.

2.8.1.3 Algorithmic procedure: the Wilson-Han-Powell SQP

Step 1 Set iteration counter k = 0, initialise estimate of the Hessian matrix B 0 = I and
give initial guess x0 . Set the step size α. Set convergence tolerances, ϵ1 and ϵ2 .

Step 2 Evaluate functions f (xk ), h(xk ), g(xk ) and their gradients ∇f (xk ), ∇h(xk ) and
∇g(xk ).

Step 3 Solve QP subproblem


min_d  ∇f(xk)T d + ½ dT Bk d
s.t. h(xk ) + ∇h(xk )T d = 0
g(xk ) + ∇g(xk )T d ≤ 0
to obtain the search direction d = ∆xk and the multipliers λk+1 and µk+1 .

Step 4 Check convergence:

• If dT d ≤ ϵ1 and ∇L(xk+1 , λk+1 , µk+1 )T ∇L(xk+1 , λk+1 , µk+1 ) ≤ ϵ2 , terminate.


The solution is xk .
• Else, set xk+1 = xk + αd, update Bk+1 using the BFGS formula, set k = k + 1 and return to Step 2.
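If SciPy is available, its SLSQP routine (an SQP method in the Wilson-Han-Powell family) can be applied directly, for instance to the shifted problem of Eq. (2.38) with the bound tightened to x1 ≥ 2.5:

```python
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + x[1] ** 2
constraints = [
    # SciPy convention: 'eq' means fun(x) == 0, 'ineq' means fun(x) >= 0
    {"type": "eq", "fun": lambda x: (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2 - 1.0},
    {"type": "ineq", "fun": lambda x: x[0] - 2.5},
]
res = minimize(f, x0=[2.6, 1.2], method="SLSQP", constraints=constraints)
```

The bound is active at the solution, so the solver returns x∗ ≈ (2.5, 2 − √0.75) ≈ (2.5, 1.134), point A of Fig. 2.13.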
2.8.1.4 Infeasible QPs

The problem of infeasible QP which may arise through linearisation for nonconvex problems
can be overcome by adding a new variable u to the QP as follows:

min_{d,u}  ∇f(xk)T d + ½ dT Bk d + Ku
s.t. h(xk ) + ∇h(xk )T d = 0
g(xk ) + ∇g(xk )T d ≤ u
u≥0

where K is a large positive scalar. The effect of this variable is to allow the linearised
inequality constraints to be violated, although this comes at a large cost in the objective
function. Note additionally that this approach offers no guarantee that the optimum solution
will not be cut off.

2.8.2 Reduced gradient methods


2.8.2.1 GRG for linearly constrained problems

Consider the problem


min f (x)
x (2.61)
s.t. Ax = b

where A is an m × n matrix (m < n) and dim{x} = n. The variable set can be partitioned into two subsets, x = (y, u)T , where y is a vector of m basic (dependent) variables and u is a vector of n − m superbasic (independent) variables. Similarly, the matrix A can be partitioned into A = [B|C], where B is a square m × m non-singular matrix and C is an m × (n − m) matrix. Then the equality constraints become
( )
y
Ax = [B|C] = b. (2.62)
u
Consider the step ∆xk = (∆y k , ∆uk )T = (y − y k , u − uk )T , which satisfies the linear
equalities of Eq. (2.62), so that

     B∆y k + C∆uk = 0  ⇒  ∆y k = −B −1 C∆uk .                              (2.63)

We can now eliminate the y variables from the problem:

     ∆xk = [ −B −1 C∆uk ] = [ −B −1 C ] ∆uk = Z∆uk                         (2.64)
           [     ∆uk    ]   [    I    ]

where
     Z ≡ [ −B −1 C ]
         [    I    ]                                                       (2.65)
is called the transformation matrix. Consider the second-order expansion of the objective
function f (x) at xk

     f (x) ≈ f (xk ) + ∇f (xk )T ∆xk + ½ ∆xkT ∇2 f (xk )∆xk                (2.66)

Then, if we substitute Eq. (2.64) into Eq. (2.66),

     f (x) ≈ f (xk ) + [∇f (xk )T Z] ∆uk + ½ ∆ukT [Z T ∇2 f (xk )Z] ∆uk ≡ F (∆uk )   (2.67)
Eq. (2.67) is an expansion of f in the reduced space of u. Let us define:

1. the Reduced Gradient, gRT = ∇f (xk )T Z;

2. the Reduced Hessian, HR = Z T ∇2 f (xk )Z.

Then, the original problem can be approximated by

     min  F (∆uk ) = f (xk ) + gRT ∆uk + ½ ∆ukT HR ∆uk                     (2.68)
     ∆uk

an unconstrained optimisation problem. The necessary conditions for a minimum are

     ∂F (∆uk )/∂∆uk = 0  ⇒  ∆uk = −HR−1 gR                                 (2.69)

The above expression for ∆uk is of the same form as the usual Newton step, but it is calculated
in the reduced space.

Computing the Newton step ∆uk   To compute the reduced gradient gRT = ∇f (xk )T Z,
we need B −1 since it appears in the definition of Z (Eq. (2.65)). Rewrite

     gRT = ∇f (xk )T Z = [ ∇y f (xk )T | ∇u f (xk )T ] [ −B −1 C ]
                                                       [    I    ]         (2.70)

Therefore
     gRT = −∇y f (xk )T B −1 C + ∇u f (xk )T                               (2.71)

This is a common expression for reduced gradients. It would be useful to avoid computing
B −1 . To do so, we define λT = −∇y f (xk )T B −1 . Then,

λT B = (B T λ)T = −∇y f (xk )T .

Re-arranging, we can formulate a system of linear equations:

B T λ = −∇y f (xk ) (2.72)

and solve it for λ (and therefore −∇y f (xk )T B −1 ). The reduced gradient then becomes

     gRT = λT C + ∇u f (xk )T .
It can be shown that λ is the vector of Lagrange multipliers for the original problem (2.61).
Recall that the Kuhn-Tucker conditions for this problem require that

∇f (x) + AT λ = 0.

Therefore,
     [ ∇y f (xk ) ]   [ B T ]     [ 0 ]
     [ ∇u f (xk ) ] + [ C T ] λ = [ 0 ] .

Hence, ∇y f (xk ) + B T λ = 0 and B T λ = −∇y f (xk ): the vector λ satisfies the KT necessary
conditions.
The reduced Hessian matrix HR = Z T ∇2 f (xk )Z can be estimated with a quasi-Newton
method (e.g. BFGS) using information on changes in gRT .

Eq. (2.69) can then be solved as linear system of equations:

HR ∆uk = −gR (2.73)

Algorithmic procedure for linearly constrained problems

Step 1 Select a feasible starting point for Ax = b. This can be obtained through Phase
I of an LP algorithm. Set up the sets of basic (y) and superbasic (u) variables. Set
HR0 = I. Partition the matrix A = [B|C]. Set the iteration counter k = 0 and the
convergence tolerances ϵ1 and ϵ2 .

Step 2 Compute f (xk ) and ∇f (xk ) = (∇y f (xk ), ∇u f (xk ))T .

Step 3 Solve B T λ = −∇y f (xk ). Calculate gRT = λT C + ∇u f (xk )T . Calculate the step
∆uk by solving HR ∆uk = −gR . Obtain ∆y k by solving B∆y k = −C∆uk . Set
∆xk = (∆y k , ∆uk )T .

Step 4 Perform a line search for xk+1 = xk + α∆xk , with the aim of minimising f (xk+1 )
in the direction given by ∆xk . Typically, α = 0.3.

Step 5 Check convergence

• If gRT gR < ϵ1 and ∆ukT ∆uk < ϵ2 , terminate.

• Else, update HR with BFGS, set k = k + 1 and return to step 2.
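The procedure above can be sketched in a few lines of Python. Here the reduced-space minimisation (Steps 3 to 5 and the explicit HR update) is delegated to SciPy's BFGS routine, which stands in for the notes' hand-coded loop, and the test problem is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize

def grg_linear(f, grad_f, A, b, m, u0):
    """Sketch of GRG for min f(x) s.t. Ax = b, with A = [B|C], B non-singular m x m."""
    A = np.asarray(A, dtype=float)
    B, C = A[:, :m], A[:, m:]

    def x_of(u):
        # basic variables from the equalities (Eq. (2.62)): B y = b - C u
        y = np.linalg.solve(B, np.asarray(b, dtype=float) - C @ u)
        return np.concatenate([y, u])

    def F(u):                      # objective in the reduced space of u
        return f(x_of(u))

    def reduced_grad(u):           # Eq. (2.71), with lam from B^T lam = -grad_y f
        g = grad_f(x_of(u))
        lam = np.linalg.solve(B.T, -g[:m])
        return C.T @ lam + g[m:]

    res = minimize(F, u0, jac=reduced_grad, method="BFGS")
    return x_of(res.x)

# Illustrative problem: min (x1-1)^2 + (x2-2)^2 + (x3-3)^2  s.t.  x1 + x2 + x3 = 3
target = np.array([1.0, 2.0, 3.0])
x_opt = grg_linear(
    f=lambda x: np.sum((x - target) ** 2),
    grad_f=lambda x: 2 * (x - target),
    A=[[1.0, 1.0, 1.0]], b=[3.0], m=1, u0=np.zeros(2),
)
print(x_opt)  # converges to approximately [0, 1, 2]
```

Because the basic variables are eliminated through the equalities, every iterate satisfies Ax = b exactly, which is the defining feature of the reduced-gradient approach.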

2.8.2.2 Extension to nonlinearly constrained problems

Given a general problem of form (2.23), redefine it as a problem with equality constraints
only by introducing slack variables:

min f (x)
x,σ
s.t. r(x) = 0 (2.74)
xL ≤ x ≤ xU
σ ∈ Rp
Figure 2.16: Newton correction step to maintain feasibility in GRG2

where r(x) = (h(x), g(x) + σ 2 )T , the square being taken componentwise.
One procedure which has been proposed by Lasdon and Warren (GRG2, GINO) to handle
this type of problem is based on the following approximation:

     min  f (x)
     s.t. J(xk )x = J(xk )xk                                               (2.75)

where J(xk ) is the Jacobian of r(x) at xk . Note that the solution of the above problem is
not necessarily a feasible point for the original problem. After each approximation is solved,
a Newton correction step is used to ensure feasibility, as shown in Fig. 2.16. This method is
therefore a feasible path method.
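The correction step of Fig. 2.16 can be sketched as a least-norm Newton iteration on r(x) = 0. The circle constraint used to exercise it below is an illustrative assumption, not an example from the notes.

```python
import numpy as np

def newton_correction(r, J, x, tol=1e-10, max_iter=20):
    """Return x to the feasible manifold r(x) = 0 via minimum-norm Newton steps."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        rx = np.atleast_1d(r(x))
        if np.linalg.norm(rx) < tol:
            break
        Jx = np.atleast_2d(J(x))
        # minimum-norm solution of the linearised system J dx = -r
        dx = -Jx.T @ np.linalg.solve(Jx @ Jx.T, rx)
        x = x + dx
    return x

# Example: pull a point back onto the unit circle r(x) = x1^2 + x2^2 - 1 = 0
x_feas = newton_correction(
    r=lambda x: x @ x - 1.0,
    J=lambda x: 2.0 * x,
    x=[1.2, 0.9],
)
print(x_feas @ x_feas)  # approximately 1.0
```

The iteration converges quadratically near the manifold, so a few steps normally suffice to restore feasibility after each GRG2 move.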
Bibliography

[1] M.S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and
Algorithms. 2nd ed. New York: John Wiley and Sons, 1993.
[2] C. A. Floudas. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applica-
tions. Oxford: Oxford University Press, 1995.
[3] G. V. Reklaitis, A. Ravindran, and K. M. Ragsdell. Engineering Optimization: Methods
and Applications. New York: John Wiley, 1983.
Chapter 3

Modelling discrete alternatives

Many of the decisions made in process synthesis are discrete in nature (the number of trays
in a distillation column, the use of a given unit type). If we are to use an algorithmic
approach to process synthesis, we therefore need to represent these discrete choices
mathematically and learn how to solve the resulting optimisation problems. In this chapter,
we consider the issue of modelling discrete decisions.
There are three main steps in this process:

1. Identification of all possible/relevant alternatives.

2. Definition of discrete decision variables.

3. Derivation of algebraic equations representing all alternatives.

3.1 Building superstructures


The first step in the process is best represented graphically.

For homogeneous systems such as heat exchanger networks, distillation sequences and
utility plants, systematic representations can be developed. Fig. 3.1 shows a network of
distillation columns for a four component system with sharp splits (Andrecovich and Wester-
berg, 1985). All possible column sequences are considered in this diagram. Figure 3.2 shows
a superstructure for a CSTR/PFR system. The reactors may be used alone, in parallel, in
series, or in any other configuration, and recycles may be included.

For heterogeneous systems such as overall plant flowsheets, alternatives must generally
be specified by the user. For instance, different types of reactors may be proposed, several
recycle structures may be postulated, a few feedstock choices may be specified, . . ..

In general we must rely on process knowledge, heuristics and thermodynamics to draw up


a superstructure.
Figure 3.1: Network of distillation columns with sharp splits representing all alternative
routes (Andrecovich and Westerberg, 1985)


Figure 3.2: Superstructure for a CSTR/PFR system



Caveats

• Great care must be taken when formulating the superstructure, as the optimum found
is dictated by the assumptions made at this stage: only alternatives embedded within
the superstructure can be selected as the optimal solution!

• The performance of the optimisation algorithms used to optimise the structure is highly
dependent upon the mathematical representation of the superstructure. While it is pos-
sible to automate the derivation of a mathematical representation, obtaining a “good”
formulation still demands experience and an understanding of the optimisation algo-
rithms.

3.2 Superstructure formulation


Once a set of alternatives has been identified, a mathematical representation of this super-
structure must be formulated.

3.2.1 Continuous representation


One possibility is to derive a process model that includes all possible alternatives in a con-
tinuous manner. When this process is optimised, the large cost of selecting all the units
should force zero values for some of the flowrates at the optimal solution. The units with null
inlet and outlet flowrates can then be eliminated from the process. In this case, the discrete
decisions (should we use a CSTR, a PFR or both) are made implicitly.
There may however be some decisions which are difficult to quantify and incorporate in
the objective function using this implicit approach. In particular, one may want to impose a
minimum size on any unit, when it exists. If the minimum volume of a reactor is Vmin , we
want the optimal reactor volume V to satisfy either of these equations:

V = 0 OR V ≥ Vmin (3.1)

How can this be expressed mathematically?

3.2.2 Some definitions


Propositional logic Statements such as Eq. (3.1) can be expressed through proposi-
tional logic by relating a set of r clauses Pi , i = 1, . . . , r through the symbols shown in
the table below.

Symbol Meaning
∨ OR
⊕ EXCLUSIVE OR
∧ AND
¬ NOT

• The proposition P1 ⇒ P2 can be written as ¬P1 ∨ P2 .


• The reactor example can be expressed as (V = 0) ⊕ (V ≥ Vmin ).

Algebraic equations Propositional logic expressions can be transformed into algebraic
equations. Algebraic equations are mathematical expressions which involve only alge-
braic operators, i.e. +, −, ×, ÷, trigonometric functions, transcendental functions.
They can be equalities or inequalities. This requires the introduction of binary variables.

Binary variables are variables which can only take on the values 0 and 1. By convention,
they are usually denoted by y. Often, a binary variable y is associated with a process
unit as follows:

{
0 if unit does not exist
y= (3.2)
1 if unit exists

If such a variable is defined in the reactor example, Eq. (3.1) can be rewritten as the following
algebraic equation:
Vmin y ≤ V ≤ M y (3.3)

where M is a very large number (or an upper bound on reactor volume, if one is known).
When the reactor does not exist, y = 0 and Eq. (3.3) becomes 0 ≤ V ≤ 0, i.e. V = 0. When
the reactor exists, y = 1 and Eq. (3.3) becomes Vmin ≤ V ≤ M .
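The behaviour of Eq. (3.3) can be checked directly by enumeration; Vmin = 10 and M = 1000 below are illustrative values, not data from the notes.

```python
def bigM_feasible(V, y, Vmin=10.0, M=1000.0):
    """Check Eq. (3.3): Vmin*y <= V <= M*y, with y a binary variable."""
    return Vmin * y <= V <= M * y

# y = 0 forces V = 0; y = 1 allows any V in [Vmin, M]
assert bigM_feasible(0.0, 0) and not bigM_feasible(5.0, 0)
assert bigM_feasible(50.0, 1) and not bigM_feasible(5.0, 1)
print("big-M logic verified")
```

The single inequality pair thus reproduces the disjunction (V = 0) ⊕ (V ≥ Vmin) without any logical operators.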
It is worth noting that some recent optimisation strategies can handle propositional logic
expressions directly. For further information, refer to the disjunctive programming literature
[1, 2].

3.2.3 From Propositional Logic to Algebraic Equations


3.2.3.1 Some basic transformations
Proposition Mathematical representation
¬P1 1 − y1 = 1 or y1 = 0
P1 ∨ P2 ∨ P3 y1 + y2 + y3 ≥ 1
P1 ∧ P2 ∧ P3 y1 ≥ 1, y2 ≥ 1, y3 ≥ 1
¬P1 ∨ P2 1 − y1 + y2 ≥ 1 or y1 − y2 ≤ 0
P1 ⇔ P2 or (¬P1 ∨ P2 ) ∧ (¬P2 ∨ P1 ) y1 − y2 ≤ 0, y2 − y1 ≤ 0 or y1 = y2
P1 ⊕ P2 ⊕ P3 y1 + y2 + y3 = 1
P1 ∨ P2 ⇒ P3 or (¬P1 ∨ P3 ) ∧ (¬P2 ∨ P3 ) y1 − y3 ≤ 0, y2 − y3 ≤ 0

3.2.3.2 A systematic procedure

Any propositional logic statement can be transformed into an equivalent algebraic equation
with binary variables using a systematic procedure.

Step 1 Introduce a binary variable yi for each clause Pi . If Pi is true, yi = 1. If Pi is


false, yi = 0.

Step 2 Transform the logical expressions into an equivalent conjunctive normal form
Q1 ∧ Q2 ∧ . . . ∧ Qn , where the Qj ’s (j = 1, . . . , n) are clauses that depend on the
Pi ’s. Each of the n Q’s must be true independently.

1. Replace the implication operator ⇒ by its equivalent propositional logic form:


(P1 ⇒ P2 ) ⇔ (¬P1 ∨ P2 ).
2. Apply de Morgan’s theorem to put any negation inwards:

¬(P1 ∧ P2 ) ⇔ (¬P1 ∨ ¬P2 )


¬(P1 ∨ P2 ) ⇔ (¬P1 ∧ ¬P2 )

3. Factor out the OR operator over the AND operator using:

((P1 ∧ P2 ) ∨ P3 ) ⇔ ((P1 ∨ P3 ) ∧ (P2 ∨ P3 ))

Step 3 Transform each Q into an equivalent algebraic equation or set of algebraic equa-
tions.

Illustration Transform ((P1 ∧ P2 ) ∨ P3 ) ⇒ (P4 ∨ P5 ) into algebraic equations.

Step 1 Assign y1 , . . . , y5 to P1 , . . . , P5 , respectively.

Step 2 Conjunctive normal form.

1. Transform the implication operator:

¬ ((P1 ∧ P2 ) ∨ P3 ) ∨ (P4 ∨ P5 ).

2. Apply de Morgan’s theorem:

(¬(P1 ∧ P2 ) ∧ ¬P3 ) ∨ (P4 ∨ P5 ),

((¬P1 ∨ ¬P2 ) ∧ ¬P3 ) ∨ (P4 ∨ P5 ).

3. Factor out OR operator:

((¬P1 ∨ ¬P2 ) ∨ (P4 ∨ P5 )) ∧ (¬P3 ∨ (P4 ∨ P5 )) ,

(¬P1 ∨ ¬P2 ∨ P4 ∨ P5 ) ∧ (¬P3 ∨ P4 ∨ P5 ) ,

where the first clause is Q1 and the second is Q2 .

Step 3 The equivalent algebraic equation for Q1 is 1 − y1 + 1 − y2 + y4 + y5 ≥ 1. For Q2 ,


it is 1 − y3 + y4 + y5 ≥ 1. If both these equations are satisfied, the original statement
is satisfied.
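The equivalence between the original statement and the two algebraic inequalities can be verified exhaustively over all 2⁵ = 32 truth assignments:

```python
from itertools import product

def original(p1, p2, p3, p4, p5):
    # ((P1 and P2) or P3) implies (P4 or P5)
    return (not ((p1 and p2) or p3)) or (p4 or p5)

def algebraic(y1, y2, y3, y4, y5):
    q1 = (1 - y1) + (1 - y2) + y4 + y5 >= 1   # Q1
    q2 = (1 - y3) + y4 + y5 >= 1              # Q2
    return q1 and q2

# check all 32 binary combinations
assert all(
    original(*map(bool, y)) == algebraic(*y)
    for y in product([0, 1], repeat=5)
)
print("all 32 cases agree")
```

This kind of brute-force check is a useful sanity test whenever a logical statement is translated into binary-variable constraints by hand.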

3.2.3.3 Common examples

The systematic procedure can be applied to some frequently arising situations in process
synthesis.

1. Mutually exclusive alternatives: Select only one out of m: P1 ⊕ P2 ⊕ · · · ⊕ Pm , or
y1 + y2 + · · · + ym = 1.

2. Non-exclusive alternatives: Select at most one out of m: y1 + y2 + · · · + ym ≤ 1.

Figure 3.3: Plot of a linear cost function defined over a discontinuous domain

3. Contingent decisions: Pk ⇒ Pj (but Pj does not necessarily imply Pk ): ¬Pk ∨ Pj , or


yk − yj ≤ 0.

4. Either/or constraints: f1 (x) ≤ 0 or f2 (x) ≤ 0.

• Propositional logic: (f1 (x) ≤ 0) ∨ (f2 (x) ≤ 0), or


• Algebraic equation:
f1 (x) ≤ M y1
f2 (x) ≤ M (1 − y1 )
where M is a large number. Note these constraints are referred to as big-M
constraints.

5. Activation and de-activation of equality constraints. If unit i exists, hi (x) = 0. Other-


wise, hi (x) is irrelevant (i.e. −∞ ≤ hi (x) ≤ +∞) — We define the non-negative slack
variables s+ and s− and write
hi (x) + s+ − s− = 0
s+ + s− ≤ U (1 − yi )
s+ , s− ≥ 0
where U is a large positive scalar.

6. Activation and de-activation of inequality constraints. If unit i exists, gi (x) ≤ 0.


Otherwise, gi (x) is irrelevant (i.e. gi (x) ≤ +∞) — We define the non-negative slack
variable s and write
gi (x) ≤ s
s ≤ U (1 − yi )
s≥0
where U is a large positive scalar.

7. Discontinuous domains: x = 0 or L ≤ x ≤ U . This is the same as the reactor example


— (x = 0) ∨ (L ≤ x ≤ U ), or Ly ≤ x ≤ U y.

8. Discontinuous functions: Fixed charge cost model with a discontinuous domain (see
Fig. 3.3):
          C = α + βx   if L ≤ x ≤ U
          C = 0        if x = 0

• Propositional logic: ((L ≤ x ≤ U ) ∧ (C = α + βx)) ∨ (x = 0 ∧ C = 0), or


• Algebraic equations:
C = αy + βx
Ly ≤ x ≤ U y
Note that when the lower bound L is equal to zero, the combination y = 1, x = 0 will not
be chosen even though it satisfies the equations. This is because it cannot correspond to a
minimum of the objective function.

Figure 3.4: Piecewise linear continuous cost function

Figure 3.5: The superstructure for a pump network with three pump types

Exercise Construct the mathematical representation of a unit with a piecewise linear cost
function of the continuous capacity variable x, as shown in Fig. 3.4.

3.2.4 Integer Variables


Another type of decision which can arise in process synthesis is the integer decision. Consider
the example of a pump network where three different types of pumps can be selected from
a manufacturer [3]. For each pump type, we would like to know how many parallel lines we
should have and how many pumps should be installed in series (see Fig. 3.5). This problem
involves binary decisions (whether pumps of a given type should be selected at all) and integer
decisions (how many pumps in series, how many lines in parallel). We have two options in
formulating this problem mathematically:

• we could introduce integer variables (nsi ∈ {0, 1, 2, 3}, the number of pumps of type i
in series)

• we could introduce a set of binary variables to be combined equivalently.

Once again, there exists a systematic way to express an integer variable through a set of
binary variables. Consider the integer variable n such that n ∈ {0, 1, . . . , Un }. Define K such
that
     K − 1 ≤ log2 Un < K                                                   (3.4)
and K binary variables, yk , k = 1, . . . , K. Then, the following equations can be used to define
n in terms of binary variables:

     n = ∑_{k=1}^{K} 2^(k−1) yk
     n ≤ Un                                                                (3.5)
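The expansion can be checked by enumerating all combinations of the binary variables; Un = 6 is an illustrative choice, for which Eq. (3.4) gives K = 3.

```python
import math
from itertools import product

Un = 6
# Eq. (3.4): K - 1 <= log2(Un) < K, i.e. K = floor(log2 Un) + 1 = 3 here
K = math.floor(math.log2(Un)) + 1
# Eq. (3.5): n = sum_k 2^(k-1) y_k (enumerate gives k-1 directly as the index)
reachable = {
    sum(2 ** k * y for k, y in enumerate(ys))
    for ys in product([0, 1], repeat=K)
}
feasible = {n for n in reachable if n <= Un}   # enforce n <= Un
print(sorted(feasible))  # → [0, 1, 2, 3, 4, 5, 6]
```

Every integer in {0, 1, . . . , Un} is reachable, and the constraint n ≤ Un cuts off the spare combinations (here n = 7).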
Bibliography

[1] E. Balas. “Disjunctive Programming and a Hierarchy of Relaxations for Discrete Opti-
mization Problems”. In: SIAM Journal on Algebraic and Discrete Methods 6.3 (1985),
pp. 466–486.
[2] A. Vecchietti and I.E. Grossmann. “LOGMIP: A Disjunctive 0-1 Non-Linear Optimizer
for Process System Models”. In: Computers and Chemical Engineering 23.4-5 (1999),
pp. 555–565.
[3] T. Westerlund, F. Pettersson, and I. E. Grossmann. “Optimization of Pump Configu-
ration Problems as a MINLP Problem”. In: Computers and Chemical Engineering 18.9
(1994), pp. 845–858.
Chapter 4

Mixed Integer Programming

The general form of mixed-integer problems is given by:

min f (x, y)
x,y
s.t. h(x, y) = 0
g(x, y) ≤ 0 (4.1)
x ∈ Rn
y ∈ {0, 1}q

The x variable vector represents the continuous decisions (flowrates, equipment sizes, pres-
sure, temperature, heat duties) and the y variables represent the existence or non-existence
of process units. Problems of form (4.1) are referred to as MINLPs (mixed-integer nonlinear
programs) when at least one function in the problem is nonlinear, and MILPs (mixed-integer
linear programs) when all functions are linear.

4.1 Mixed-Integer Linear Programming (MILP)


MILPs are very commonly used in industry and are especially useful for planning, scheduling
and assignment problems. A number of very efficient algorithms exist (including
commercial software such as CPLEX or XPRESS) which can solve problems with millions
of binary variables. MILP algorithms are guaranteed to identify the best solution of the
problem (provided enough time and memory are available). The general form of an MILP
problem is
     min  cTx x + cTy y
     x,y
     s.t. Ax + By ≤ d                                                      (4.2)
          x ≥ 0
          y ∈ {0, 1}q

4.1.1 Brute Force Approach


This approach is based on the complete enumeration of all combinations of the binary vari-
ables. Once these variables are fixed, the remaining problem is an LP which can be solved
through the Simplex method or an interior-point method, and the global solution can be

q      2    5    10      20        50         100
2^q    4    32   1024    ≈ 10^6    ≈ 10^15    ≈ 10^30

Table 4.1: Evolution of the number of combinations with increasing number of binary vari-
ables

found by comparing the solutions of the LPs. There are 2q combinations to be tested. The
combinatorial explosion in the number of solutions to be tested is highlighted in Table 4.1.
Considering 50 binary variables, and assuming that it takes only 10 ms to solve each LP,
it would take over 350,000 years to try all combinations, . . .

4.1.2 Relaxation and Rounding Approach


4.1.2.1 LP Relaxation

Let us “relax” the MILP by removing the integrality condition on the y variables. They
are now allowed to vary continuously between 0 and 1 and the resulting problem is an LP.
Note that because we have less stringent conditions on the problem, the solution we will
obtain cannot be greater than the solution of the original MILP. In some special cases, the
solution of the LP is equal to that of the MILP. A sufficient condition for this to occur is for
the matrix B in problem (4.2) to be totally unimodular (i.e., every square non-singular
submatrix of B has a determinant equal to ±1). However, for general unstructured MILPs,
some of the y variables will be non-integer at the solution of the relaxed LP. This is usually
the case in process synthesis.

Exercise Consider an assignment problem where m jobs must be distributed over m ma-
chines. The cost of the assignment is determined by cost coefficients Cij corresponding to
the cost of carrying out job i on machine j. We define the binary variable yij as equal to 1
if job i is assigned to machine j and 0 otherwise. The problem can be formulated as

     min  ∑_{i=1}^{m} ∑_{j=1}^{m} Cij yij
     s.t. ∑_{i=1}^{m} yij = 1,   j = 1, . . . , m
          ∑_{j=1}^{m} yij = 1,   i = 1, . . . , m                          (4.3)
          yij ∈ {0, 1},   i = 1, . . . , m,  j = 1, . . . , m

The first set of equality constraints ensures that one and only one job is assigned to each
machine. The second set ensures that one and only one machine is assigned to each job. Show
that the solution of the relaxed problem is the same as that of the MILP for m = 3.
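The claim can be checked numerically for an illustrative 3 × 3 cost matrix (the data below are invented): the LP relaxation, solved with SciPy's linprog, returns a purely 0-1 solution.

```python
import numpy as np
from scipy.optimize import linprog

# illustrative cost matrix: C[i, j] = cost of carrying out job i on machine j
C = np.array([[1.0, 5.0, 9.0],
              [6.0, 2.0, 8.0],
              [7.0, 4.0, 3.0]])
m = C.shape[0]

# variables y_ij flattened row-major: index i*m + j
A_eq = np.zeros((2 * m, m * m))
for i in range(m):
    A_eq[i, i * m:(i + 1) * m] = 1.0   # job i assigned to exactly one machine
for j in range(m):
    A_eq[m + j, j::m] = 1.0            # machine j carries out exactly one job
b_eq = np.ones(2 * m)

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
y = res.x.reshape(m, m)
print(np.round(y, 6), res.fun)  # the relaxed solution is already integer
```

For this cost matrix the optimal assignment is the diagonal one with total cost 6, and no variable is fractional at the LP optimum.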

4.1.2.2 Rounding Scheme

In the general case where the solution of the relaxed problem is non-integer, one may apply
a rounding scheme such as rounding the solution to the nearest integer. However, this may
result in a sub-optimal solution, or even in an infeasible combination!

Example Consider the following small problem:

     min  −1.2y1 − y2
     s.t. y1 + y2 = 1
          1.2y1 + 0.5y2 ≤ 1                                                (4.4)
          (y1 , y2 ) ∈ {0, 1}2

The solution of the relaxed problem is (y1 , y2 ) = (0.714, 0.286)T with an objective function
value of −1.143. Rounding to the nearest integer yields (y1 , y2 ) = (1, 0)T , a combination
which violates the inequality constraint. The optimal solution is in fact (y1 , y2 ) = (0, 1)T
(the opposite of what was found) and the corresponding objective function value is −1.
Note that f∗MILP > f∗LP .
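Both the relaxed solution and the failure of rounding can be checked directly with SciPy's linprog:

```python
from scipy.optimize import linprog

# LP relaxation of problem (4.4)
res = linprog(c=[-1.2, -1.0],
              A_ub=[[1.2, 0.5]], b_ub=[1.0],
              A_eq=[[1.0, 1.0]], b_eq=[1.0],
              bounds=(0, 1))
y1, y2 = res.x
print(round(y1, 3), round(y2, 3), round(res.fun, 3))  # → 0.714 0.286 -1.143

# rounding to the nearest integer gives (1, 0), which violates the inequality
assert 1.2 * round(y1) + 0.5 * round(y2) > 1.0
```

The exact relaxed optimum is (5/7, 2/7) with objective −8/7 ≈ −1.143, confirming both the fractional solution and the infeasibility of the rounded point.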

4.1.3 Branch-and-Bound Techniques


The main idea is to use a divide and conquer strategy to decision making by generating
partial solutions to the problem and eliminating unpromising regions of the solution space.
In this manner, the exhaustive enumeration of all 0-1 combinations proposed in the brute-
force approach can be avoided, and one is guaranteed to find the solution.

4.1.3.1 LP Relaxations and their Properties

Consider an LP relaxation of problem (4.2) where the binary variables in some set Ji have
been fixed (dim{Ji } < q) and the remaining binary variables are allowed to vary between 0
and 1. This subproblem Pi is given by

min cTx x + cTy y


s.t. Ax + By ≤ d
x≥0 (Pi )
0≤y≤1
yk fixed, k ∈ Ji

Consider another LP relaxation Pj derived from Pi where the set of variables Jj that have
been fixed includes Ji (Ji ⊂ Jj and dim{Jj } ≤ q). Pj is given by

min cTx x + cTy y


s.t. Ax + By ≤ d
x≥0 (Pj )
0≤y≤1
yk fixed, k ∈ Jj

We define f ∗ as the objective function value at the solution of problem (4.2). fi∗ and fj∗
denote the objective function values at the solution of problems (Pi ) and (Pj ) respectively.
Then, the following properties hold:

1. If Pi is infeasible, then Pj is infeasible.

2. If Pj is feasible, then Pi is feasible.



3. If Pj is feasible, fj∗ ≥ fi∗ .

4. If y is integer at the solution of Pj , fj∗ ≥ f ∗ .

Branch-and-bound algorithms take advantage of these properties to efficiently explore the


solution space.

4.1.3.2 Basic Methodology

The branch-and-bound tree shown in Fig. 4.1 describes the process.

At the first iteration of the B&B algorithm for the MILP, problem P0 , the full LP
relaxation of the original problem is solved.

• If all binary variables are integer at its solution, with objective value f0∗ , the
optimal solution of (4.2) has been found.
• Otherwise, f0∗ constitutes a lower bound on the problem.

At the second iteration, two subproblems of P0 , or nodes, are created by fixing one of
the binary variables, y1 .

• In the first subproblem, P1,0 , y1 is set to zero.
• In the second, P1,1 , it is set to one.
• All other binary variables are still relaxed.

The solutions of these two subproblems give two new tighter lower bounds, f1,0∗ and f1,1∗
respectively.

• The smallest known lower bound on the problem is given by fmin = min{f1,0∗ , f1,1∗ }.

• If the solution of any of the relaxations is integer, it provides an upper bound on


the original problem, U .
• If U − fmin < ϵ, the solution has been found.
• If one of the lower bounds obtained is greater than U , the corresponding node
cannot lead to the optimal solution and does not need to be considered any further.
• Any node with a lower bound less than U is added to a list of nodes that need to
be further analysed.

We now select one node from the list and use it to create two nodes by fixing an-
other binary variable. This enables us to generate two new tighter lower bounds, to
update the upper bound, test for convergence and to eliminate any node with lower
bound greater than the upper bound. All remaining nodes are added to the list to be
analysed further.

This procedure is repeated until the convergence test is passed.


Figure 4.1: Branch-and-bound tree

4.1.3.3 Algorithmic Options

Branching In the branching step, the binary variable(s) to be fixed at the new children
nodes is (are) selected. While most algorithms only fix one additional binary variable at a
time, more than one can be fixed. Ideally, we would like to fix the variable that will lead to
the greatest increase in the value of the lower bound as this speeds up convergence. However,
there is no way to detect this variable a priori. The following strategies are used in practice:

• Choose a variable randomly, or choose the first variable that has not yet been fixed
from the list of binary variables.

• Choose the most-fractional variable, i.e. the binary variable whose value at the solution
of the relaxation of the parent node is closest to 0.5. Thus, if the solution at the root
node is given by y = (1, 0.8, 0.5)T , select y3 for branching.

• Choose the variable with the greatest sensitivity.

Node selection A number of alternative criteria can be used to decide which node should
be selected for further analysis from the list of open nodes.

• Newest bound rule (depth first, LIFO): expand the most recently created node, as
shown in Fig. 4.2. Features of this rule are:

– Capability to generate a relatively large number of feasible solutions.


– Relatively small storage requirements.

• Best bound rule (“breadth first”, priority rule): Expand node with the smallest lower
bound, as shown in Fig. 4.3. Features of this rule include:

– Relatively larger storage requirement.


Figure 4.2: Order of exploration of nodes using the newest bound rule

Figure 4.3: Order of node exploration using the best bound rule

– Expansion of two nodes at each iteration.

In commercial codes, combinations of these rules are used. For instance, one may use a
depth-first approach, but follow the child node with the lowest bound rather than the left or
right branch specifically.

4.1.3.4 Performance of B&B Algorithms

Worst-case performance The maximum number of nodes that can be explored in a


branch-and-bound tree is 2^(q+1) − 1. Note that this is greater than the number of combinations
of the binary variables and that there is no guarantee that the branch-and-bound algorithm
will terminate before the entire tree is explored. However, this worst-case behaviour is rare.

Importance of formulation As with all optimisation problems, formulation can have a


great impact on performance. It is important to make the formulation as tight as possible.
For instance, if big-M constraints are used, M should be as small as possible. A suitable
M can often be found based on physical considerations. A valid constraint on the objective
function of the form f (x, y) ≤ f̄ , where f̄ is a known scalar upper bound, can also help.
The problem statistics (number of variables and constraints) also affect performance.
The computational requirements depend most strongly on the number of binary variables.
The number of constraints is second in importance, followed by the number of continuous
variables.
A measure of the tightness of a formulation is given by the integrality gap, the difference
between the values of the objective function at the solution of the MILP and its LP relaxation.

Efficient solution of subproblems By construction, the B&B algorithm requires the


solution of a set of very similar LP problems. This can be exploited to increase the efficiency
of the algorithm as the information gathered when solving the parent node can be used to
speed up the solution of its subproblems.

4.1.3.5 An Example Problem

Consider the following problem:

min f (y) = 3y1 + y2 + 1.5y3 + 2.4y4


s.t. 5y1 + 4y2 + 3y3 + 6y4 ≥ 9
2y1 + y2 + 4y3 + 2y4 ≥ 3
(4.5)
y2 − y4 ≤ 0
y1 + 2y2 + 7y3 ≥ 8
(y1 , y2 , y3 , y4 ) ∈ {0, 1}4

We solve this problem using the best bound rule and branching on the most fractional variable.
The procedure is illustrated in Fig. 4.4.

Iteration 1 Create children nodes by fixing y4 .

Iteration 2 Select node 2 and create children nodes by fixing y2 .


Figure 4.4: Solution procedure for the MILP example problem

Iteration 3 Select node 4 and create children nodes by fixing y3 . A feasible integer
solution is found for y3 = 1 (node 6), giving an upper bound of 4.9. Nodes 1 and 5 are
infeasible, and the lower bound at node 3 exceeds this upper bound, so all remaining open
nodes can be fathomed. Node 6 itself has an integer solution: the optimal solution has
been found!

A total of seven subproblems have been solved. Exhaustive enumeration would have required
the solution of sixteen problems.
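Since there are only 2⁴ = 16 combinations here, the branch-and-bound answer can be confirmed by brute force:

```python
from itertools import product

def feasible(y1, y2, y3, y4):
    """Check the four constraints of problem (4.5)."""
    return (5*y1 + 4*y2 + 3*y3 + 6*y4 >= 9
            and 2*y1 + y2 + 4*y3 + 2*y4 >= 3
            and y2 - y4 <= 0
            and y1 + 2*y2 + 7*y3 >= 8)

# enumerate all 16 binary combinations and keep the cheapest feasible one
best = min(
    (3*y[0] + y[1] + 1.5*y[2] + 2.4*y[3], y)
    for y in product([0, 1], repeat=4)
    if feasible(*y)
)
print(best)
```

This confirms an optimal objective of 4.9 at (y1, y2, y3, y4) = (0, 1, 1, 1); only three of the sixteen combinations are feasible at all.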

4.1.4 Cutting-Plane Algorithms


Cutting plane algorithms start from an LP relaxation of the original problem. This relaxation
is then successively tightened by adding (linear) constraints which narrow down the feasible
region: these are called cutting planes. After a sufficient number of cutting planes have been
introduced, the solution of the relaxed problem becomes integer and therefore equal to the
solution of the original problem.
Fig. 4.5 illustrates this process for a two-variable integer problem. The feasible region is
delimited by the original constraints and consists only of integer combinations of y1 and y2
(black dots). When this problem is relaxed, a non-integer solution is found for the LP. It
satisfies the original constraints apart from integrality. We can now impose another constraint
which removes the first solution from the feasible region. This gives us a second non-integer
solution. We can continue this process until an integer solution is found.

Comments on Cutting Plane Algorithms

• Unlike B&B algorithms, no feasible solution can be identified before the optimal solution
has been found.

• In addition, finite convergence of the algorithm is difficult to prove: thus, if the solution
procedure is very slow, we may end up with no solution at all. For B&B algorithms, at
least one feasible solution can usually be identified in reasonable time.

• There is no single strategy to construct cutting planes. Some methods are described in
Discrete Optimization by Parker and Rardin (Academic Press, 1988).

Figure 4.5: Two iterations of a cutting plane algorithm

• Cutting plane algorithms have not been as popular as Branch and Bound algorithms
but their popularity is growing. It is possible to combine both approaches.

4.2 Mixed-Integer Non-Linear Programming (MINLP)


4.2.1 General formulation
Many physical phenomena and hence processing units are best described with non-linear
models. Thus, the structural optimisation of general process flowsheets, or even networks of
specialised units such as pumps can be most usefully represented through the general MINLP
form:
     min  f (x, y)
     s.t. h(x, y) = 0
          g(x, y) ≤ 0                                                      (4.6)
          x ∈ Rn
          y ∈ {0, 1}q
In solving problems of this type, two difficulties must be overcome: the combinatorial nature
of the problem which arises from the presence of binary variables, and, when the non-linear
functions are nonconvex, the presence of local minima.

4.2.2 Branch-and-Bound Approaches


The principle is the same as with branch-and-bound techniques for MILP problems. The
main difference is that the relaxation at each node is an NLP rather than an LP. If this NLP
is nonconvex, the solution of the relaxation can only be guaranteed to be a lower bound on the
MINLP if its global minimum can be identified. Otherwise, the B&B algorithm may converge

to a local solution. In terms of efficiency, it is not as easy to update the NLPs at each node
as it is to update LPs and more effort is therefore needed at each node. Finally, note that
B&B algorithms can be used to solve problems with integer variables without reformulation.

4.2.3 The Generalised Benders Decomposition (GBD)


This algorithm was proposed by Geoffrion (1972). It can be applied to a subclass of problem (4.6) represented by:

min_{x,y}  f_x(x) + x^T A y + c^T y
s.t.       h_x(x) + x^T B y + d^T y = 0
           g_x(x) + x^T C y + e^T y ≤ 0           (4.7)
           x ∈ R^n
           y ∈ {0, 1}^q

The participation of the binary variables is limited to mixed-bilinear and linear terms. This is however not restrictive, as any problem of form (4.6) can be transformed into a problem of form (4.7).
The GBD algorithm is based on three main principles:

Partitioning of the variable set The y variables are referred to as complicating vari-
ables and handled differently from the x variables. In Geoffrion’s original work, the
complicating y variables were not restricted to binary variables, but could also be con-
tinuous. Thus, this algorithm can also be used to handle bilinear nonconvexities in a
rigorous manner.

Decomposition of the problem The problem is solved by considering two types of


derived problems: a primal problem which provides an upper bound on the MINLP,
and a master problem which provides a lower bound on the MINLP.

Iterative refinement By using the information provided by any given primal and master
problems, new primal and master problems can be constructed in such a way that the
bounds become tighter and convergence can be achieved within a finite number of
iterations.

4.2.3.1 The Primal Problem

The kth primal problem is obtained by fixing the binary variables in (4.7) to some combination y^k. Let us define f(x, y) = f_x(x) + x^T A y + c^T y, h(x, y) = h_x(x) + x^T B y + d^T y and g(x, y) = g_x(x) + x^T C y + e^T y. The resulting primal problem is given by

min_x  f(x, y^k)
s.t.   h(x, y^k) = 0                              (P^k)
       g(x, y^k) ≤ 0
       x ∈ R^n

This NLP provides an upper bound on (4.7). When solving the primal problem P^k, two situations can be encountered: the NLP is feasible or infeasible.

Feasible primal problem In this case, the solution of the primal problem yields an upper bound f̄^k on the MINLP, the values of the continuous variables at that solution, x^k, and the values of the optimal Lagrange multipliers at that solution, λ^k and µ^k. A Lagrange function can then be formulated as

L(x, y; λ^k, µ^k) = f(x, y) + λ^{kT} h(x, y) + µ^{kT} g(x, y)        (4.8)

Infeasible primal problem In this case, a feasibility problem is formulated and solved in an attempt to identify a feasible or nearly feasible point:

min_{x,α}  Σ_{i=1}^{p} α_i
s.t.       h(x, y^k) = 0
           g(x, y^k) ≤ α                          (4.9)
           x ∈ R^n
           α ≥ 0

The optimal objective value of this problem is strictly greater than 0 if no feasible point can be found. At the solution x^k, the Lagrange multipliers λ^{IP,k} and µ^{IP,k} enable the specification of the following Lagrange function:

L^{IP}(x, y; λ^{IP,k}, µ^{IP,k}) = λ^{IP,kT} h(x, y) + µ^{IP,kT} g(x, y).        (4.10)

4.2.3.2 The Master Problem

The relaxed master problem at iteration K is constructed from the evaluation of the Lagrange functions at the solutions of the primal and infeasible primal problems for all previous iterations:

min_{y, η^K}  η^K
s.t.  η^K ≥ L(x^k, y; λ^k, µ^k),                    k = 1, . . . , K        (4.11)
      0 ≥ L^{IP}(x^k, y; λ^{IP,k}, µ^{IP,k}),       k = 1, . . . , K
      y ∈ {0, 1}^q

This problem is an MILP with a single continuous variable and can be solved easily. η^K is a lower bound on the solution of (4.7). Since a new constraint is added to the master problem at each iteration, the sequence of lower bounds obtained is non-decreasing. The solution vector y^K can be used to construct the primal problem for iteration K + 1.

4.2.3.3 Integer Cuts

To ensure that no combination y^k of the binary variables is generated twice, integer cuts may be added to the set of constraints. Let Z^k = {i : y_i^k = 0} and NZ^k = {i : y_i^k = 1}. Then the constraint

Σ_{i∈NZ^k} y_i − Σ_{i∈Z^k} y_i ≤ |NZ^k| − 1,        (4.12)

where |NZ^k| denotes the cardinality of NZ^k, makes y^k an infeasible solution.
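As an illustration (a minimal sketch in Python; the helper names are ours, not taken from any solver library), the cut (4.12) can be built from y^k and checked by enumeration — it excludes exactly the combination y^k and no other binary point:

```python
from itertools import product

def integer_cut(yk):
    """Coefficients and right-hand side of cut (4.12) excluding yk:
    +1 on indices in NZ^k (y_i^k = 1), -1 on indices in Z^k (y_i^k = 0),
    with right-hand side |NZ^k| - 1."""
    coeffs = [1 if yi == 1 else -1 for yi in yk]
    return coeffs, sum(yk) - 1

def satisfies(cut, y):
    coeffs, rhs = cut
    return sum(a * yi for a, yi in zip(coeffs, y)) <= rhs

yk = (1, 0, 1)
cut = integer_cut(yk)
assert not satisfies(cut, yk)  # y^k itself is cut off
# every other binary combination remains feasible
assert all(satisfies(cut, y) for y in product((0, 1), repeat=3) if y != yk)
```

At y = y^k the left-hand side equals |NZ^k|, violating the cut by exactly one; flipping any single component decreases the left-hand side by one, which is why no other point is excluded.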



[Figure: objective function value against iteration number; the upper bounds from the primal problems and the current best upper bound decrease stepwise, while the lower bounds from the master problems increase.]

Figure 4.6: Progress of the bounds in the GBD algorithm

4.2.3.4 Termination Criteria

Typical progress of the upper and lower bounds during a GBD run is shown in Fig. 4.6. The algorithm can be terminated when the lower bound reaches or exceeds the current best upper bound (i.e., the smallest upper bound found so far), since subsequent solutions can only yield a larger objective function value.
When integer cuts are added to the relaxed master at each iteration, it is possible for
the relaxed master to become infeasible before the lower and upper bound have converged.
This means that all feasible integer combinations have been explored and that the solution
is therefore the best upper bound found so far.

4.2.3.5 An Example Problem

Consider the problem:

min_{x,y}  −2.7y + x²
s.t.       g1 = −ln(1 + x) + y ≤ 0
           g2 = −ln(x − 0.57) − 1.1 + y ≤ 0       (4.13)
           0 ≤ x ≤ 2
           y ∈ {0, 1}

The feasible region and the objective function are convex, as plotted in Fig. 4.7.

• Iteration 1

1. Set y^(1) = 1.
2. The first primal problem is P^(1):

   min_x  −2.7 + x²
   s.t.   g1 = −ln(1 + x) + 1 ≤ 0
          g2 = −ln(x − 0.57) − 0.1 ≤ 0
          0 ≤ x ≤ 2

[Figure: the feasible region in the (y, x) plane, bounded by the curves g1 = 0 and g2 = 0 and the bound x = 2, with the objective f(x, y) decreasing towards the region.]

Figure 4.7: Illustration of the GBD example problem

Its solution is x^(1) = 1.7183, µ^(1) = (9.3417, 0)^T, with an objective function value of 0.2525. The Lagrange function is given by

   L(x^(1), y, µ^(1)) = −2.7y + 2.9525 + 9.3417(−1 + y)

3. The master problem is

   min_{y, η^(1)}  η^(1)
   s.t.            η^(1) ≥ L(x^(1), y, µ^(1))

Its solution gives a lower bound of η^(1) = −6.3892 at y^(2) = 0.

• Iteration 2

1. The primal problem P^(2) is built with y^(2) = 0, as specified by the solution of the previous relaxed master. The solution of P^(2) is x^(2) = 0.9028, µ^(2) = (0, 0.6)^T. The objective function value is 0.815, which is greater than the previous upper bound of 0.2525. The best upper bound therefore remains 0.2525.
2. The relaxed master problem is

   min_{y, η^(2)}  η^(2)
   s.t.            η^(2) ≥ L(x^(1), y, µ^(1))
                   η^(2) ≥ L(x^(2), y, µ^(2))

The solution is η^(2) = 0.2525 at y^(3) = 1. This is equal to the upper bound, and the solution has therefore been found.

The integer cut y ≤ 0 could have been introduced in the first relaxed master problem to
eliminate y = 1 as this possibility had already been tried. This would not have changed the
solution. However, introducing the additional integer cut −y ≤ −1 in the second relaxed
master would have resulted in an infeasible problem and hence convergence.
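The two iterations above can be reproduced with a short script. This is a sketch tailored to problem (4.13), not a general GBD implementation: because each primal reduces to minimising x² subject to lower bounds on x, the primal solution and its multipliers are obtained in closed form instead of calling an NLP solver.

```python
import math

# Problem (4.13): min -2.7*y + x^2
# s.t. g1 = -ln(1+x) + y <= 0, g2 = -ln(x-0.57) - 1.1 + y <= 0, 0 <= x <= 2.

def solve_primal(y):
    """For fixed y, min x^2 s.t. x >= e^y - 1 and x >= 0.57 + e^(y-1.1).
    Returns x*, the objective, and the multipliers (mu1, mu2) of g1, g2."""
    x1 = math.exp(y) - 1.0          # point where g1 becomes active
    x2 = 0.57 + math.exp(y - 1.1)   # point where g2 becomes active
    x = max(x1, x2, 0.0)
    # Stationarity 2x = mu1/(1+x) + mu2/(x-0.57); only the active
    # constraint carries a non-zero multiplier.
    mu1 = 2 * x * (1 + x) if x == x1 else 0.0
    mu2 = 2 * x * (x - 0.57) if x == x2 else 0.0
    return x, -2.7 * y + x**2, (mu1, mu2)

def lagrange(xk, mu, y):
    g1 = -math.log(1 + xk) + y
    g2 = -math.log(xk - 0.57) - 1.1 + y
    return -2.7 * y + xk**2 + mu[0] * g1 + mu[1] * g2

cuts, y, ub = [], 1, math.inf
for it in range(10):
    xk, obj, mu = solve_primal(y)
    ub = min(ub, obj)
    cuts.append((xk, mu))
    # Relaxed master: min over y of eta = max_k L(x^k, y; mu^k).
    lb, y = min((max(lagrange(xk, mu, yy) for xk, mu in cuts), yy) for yy in (0, 1))
    if lb >= ub - 1e-6:
        break

print(it + 1, y, round(ub, 4))  # -> 2 1 0.2525
```

The loop terminates after the second iteration with y = 1 and an objective of 0.2525, matching the worked example above.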

Figure 4.8: Linear representation of a convex feasible region

4.2.4 The Outer-Approximation Algorithm

The Outer-Approximation Algorithm, developed by Grossmann and coworkers (from 1986),


can be applied to problems of the form:

min_{x,y}  f_x(x) + c^T y
s.t.       h_x(x) + d^T y = 0
           g_x(x) + e^T y ≤ 0                     (4.14)
           x ∈ R^n
           y ∈ {0, 1}^q

The general procedure in the outer-approximation algorithm is the same as in the GBD
algorithm. The main difference stems from the construction of the relaxed master problem.

4.2.4.1 The Master Problem

If we assume that the objective function and feasible region are convex, problem (4.14) can be
expressed through the linearisation (outer-approximation) of the constraints and objective
function at an infinite number of points, as shown in Fig. 4.8. A relaxation is necessary
to make the linear master problem tractable and only a reduced number of linearisations is
considered. These linearisations are taken at the solution of the primal problems, to give the
following relaxed master problem at the K th iteration:

min_{x, y, η^K}  c^T y + η^K
s.t.  η^K ≥ f_x(x^k) + ∇f_x(x^k)^T (x − x^k),                  k = 1, . . . , K
      T^k [h_x(x^k) + ∇h_x(x^k)^T (x − x^k) + d^T y] ≤ 0,      k = 1, . . . , K        (4.15)
      g_x(x^k) + ∇g_x(x^k)^T (x − x^k) + e^T y ≤ 0,            k = 1, . . . , K
      x ∈ R^n
      y ∈ {0, 1}^q

 

where T^k is the relaxation matrix, a diagonal matrix with entries

t^k_ii = −1 if λ^k_i < 0,   +1 if λ^k_i > 0,   0 if λ^k_i = 0,

for i = 1, . . . , m, and where λ^k_i denotes the Lagrange multiplier for the ith equality constraint at the solution of the kth primal problem. This matrix essentially transforms equality constraints
into inequality constraints without modifying the solution of the problem. In the case where
the primal problem is infeasible, the linearisation of the constraints at the solution of the
feasibility problem is added but the objective function is not linearised.
The relaxed master problem is an MILP which involves the x and y variables as well
as η K . Its solution is a lower bound on the solution of (4.14), provided that the convexity
conditions are met. As in the GBD algorithm, the lower bounds increase monotonically as
the iterations proceed.

4.2.4.2 Termination Criteria

The algorithm terminates when the lower bound is greater than or equal to the upper bound
on the solution. Note that when the convexity conditions are met, the two bounds are exactly
equal at the end of the run. When integer cuts are used, the infeasibility of the relaxed master
also indicates convergence.

4.2.4.3 An Example Problem

We now solve problem (4.13) using the outer-approximation algorithm.


• Iteration 1

1. Select y^(1) = 1.
2. The first primal problem P^(1) is the same as with the GBD and therefore yields the same upper bound of 0.2525 at x^(1) = 1.7183. We linearise the constraints and the objective function using the gradient information generated in the NLP solver:

   f^L(x) = f_x(x^(1)) + 2x^(1)(x − x^(1)) = 3.4366x − 2.9526
   g1^L(x, y) = g1(x^(1), y) − (1/(1 + x^(1)))(x − x^(1)) = −0.3679 − 0.3679x + y
   g2^L(x, y) = g2(x^(1), y) − (1/(x^(1) − 0.57))(x − x^(1)) = 0.2581 − 0.8709x + y

3. The relaxed master problem is then

   min_{x, y, η^(1)}  −2.7y + η^(1)
   s.t.  η^(1) ≥ 3.4366x − 2.9526
         −0.3679 − 0.3679x + y ≤ 0
         0.2581 − 0.8709x + y ≤ 0
         0 ≤ x ≤ 2
         y ∈ {0, 1}

   The solution to this problem has an objective function value of −1.9339 at x = 0.29637, y = 0. Note that this lower bound is greater than that of the GBD algorithm after the first iteration (−6.3892).

[Figure: superstructure in which raw material A can feed process 2 (producing stream B2) or process 3 (producing stream B3); together with purchased stream B1, these supply B to process 1, which produces C.]

Figure 4.9: Superstructure for the process synthesis example

• Iteration 2

1. Solve the primal problem P^(2) with y^(2) = 0. The solution is the same as with the GBD. Linearise the objective function and constraints around this point.
2. Solve the relaxed master problem:

   min_{x, y, η^(2)}  −2.7y + η^(2)
   s.t.  η^(2) ≥ 3.4366x − 2.9526
         η^(2) ≥ 1.8056x − 0.8150
         −0.3679 − 0.3679x + y ≤ 0
         −0.1689 − 0.5255x + y ≤ 0
         0.2581 − 0.8709x + y ≤ 0
         2.7130 − 2.3256x + y ≤ 0
         0 ≤ x ≤ 2
         y ∈ {0, 1}

   The solution of this problem gives a lower bound of 0.2525. The algorithm can terminate.
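Since the relaxed master here has a single continuous variable, it can be solved without an MILP code by enumerating y (a sketch using the rounded coefficients listed above; it exploits the fact that every constraint slope in x is negative and every objective-cut slope is positive, so for each y the optimum sits at the smallest feasible x):

```python
# Objective cuts: eta >= c*x + d; constraints: a + b*x + y <= 0 (all b < 0).
obj_cuts = [(3.4366, -2.9526), (1.8056, -0.8150)]
cons = [(-0.3679, -0.3679), (-0.1689, -0.5255), (0.2581, -0.8709), (2.7130, -2.3256)]

best = None
for y in (0, 1):
    # Smallest x satisfying every linear constraint and 0 <= x <= 2.
    x = max([0.0] + [(a + y) / -b for a, b in cons])
    if x > 2:
        continue  # this y has no feasible x in the relaxed master
    eta = max(c * x + d for c, d in obj_cuts)
    obj = -2.7 * y + eta
    if best is None or obj < best[0]:
        best = (obj, y, x)

print(best)  # about (0.252, 1, 1.718): the lower bound matches the
             # 0.2525 upper bound to the precision of the rounded coefficients
```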

4.2.5 Comparison of the GBD and OA Algorithms


The lower bounds generated by OA are usually tighter than those obtained with GBD. OA therefore tends to require a smaller number of iterations. However, the master problem for OA involves more constraints and variables and may be more difficult to solve.

4.2.5.1 A Process Synthesis Example

It is proposed to manufacture a chemical C with a process 1 that uses raw material B. B can either be purchased or manufactured with one of two processes, 2 or 3, both of which use chemical A as a raw material. The task is to find the selection of processes and production levels that maximises total profit. The superstructure for this problem is shown in Fig. 4.9.

Data

Conversions Let FA , FB and FC denote the flowrates of A, B and C in tons/hr.



Iteration      1        2        3        4        5
GBD         −27.33   −23.83   −11.85    −2.72    −1.92
OA/ER        −3.71     +∞       —        —        —

Table 4.2: Progress of the lower bound for the GBD and OA algorithms

• Process 1: FC = 0.9FB
• Process 2: FB = ln(1 + FA )
• Process 3: FB = 1.2 ln(1 + FA )

Maximum Capacity 5 tons/hr of B produced by 2 and 10 tons/hr produced by 3.

Prices A $1800/ton; B $7000/ton; C $13000/ton.

Maximum Demand 1 ton/hr.

                   Fixed cost (10³ $/hr)   Variable cost (10³ $/ton)
Investment cost:
   Process 1               3.5                       2.0
   Process 2               1.0                       1.0
   Process 3               1.5                       1.2

Formulation Let us define three binary variables y1 , y2 and y3 . These are equal to 1 if
processes 1, 2 and 3 exist respectively, and 0 otherwise. The profit is given by

f (F , y) = 13FC − 7FB1 − 1.8(FA2 + FA3 ) − (3.5y1 + 2FC + y2 + FB2 + 1.5y3 + 1.2FB3 ).

The mass balances are


FC = 0.9FB
FB2 = ln(1 + FA2 )
FB3 = 1.2 ln(1 + FA3 )
FB = FB1 + FB2 + FB3
The logical constraints are

FC ≤ y1
FB2 ≤ 5y2
FB3 ≤ 10y3
y2 + y3 ≤ 1
Note that since we want to maximise profit, we minimise the negative of the profit.
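As a consistency check on this formulation (a sketch; the specific flow values below are our own choice, obtained by meeting the 1 ton/hr demand of C exactly using process 3 only), the profit function can be evaluated directly:

```python
import math

def profit(y, FA2, FA3, FB1):
    """Profit in 10^3 $/hr for the process synthesis model; y = (y1, y2, y3)."""
    y1, y2, y3 = y
    FB2 = math.log(1 + FA2)          # process 2 conversion
    FB3 = 1.2 * math.log(1 + FA3)    # process 3 conversion
    FB = FB1 + FB2 + FB3
    FC = 0.9 * FB                    # process 1 conversion
    # Logical and capacity constraints must hold for the candidate point.
    assert FC <= 1 * y1 + 1e-9 and FB2 <= 5 * y2 + 1e-9
    assert FB3 <= 10 * y3 + 1e-9 and y2 + y3 <= 1
    return (13 * FC - 7 * FB1 - 1.8 * (FA2 + FA3)
            - (3.5 * y1 + 2 * FC + y2 + FB2 + 1.5 * y3 + 1.2 * FB3))

# Structure using processes 1 and 3 only: meet the 1 ton/hr demand of C
# exactly with B from process 3, i.e. FB3 = 1/0.9 and FA3 = exp(FB3/1.2) - 1.
FA3 = math.exp((1 / 0.9) / 1.2) - 1
p = profit((1, 0, 1), 0.0, FA3, 0.0)
print(round(p, 2))  # -> 1.92, i.e. the reported optimal profit of $1920/hr
```

Evaluating the alternative of purchasing all of B instead (y = (1, 0, 0), FB1 = 1/0.9) gives a negative profit, consistent with process 3 being selected at the optimum.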

Solution outline Choose a starting point of y^1 = (1, 1, 0)^T. The sequence of lower bounds for the two algorithms is shown in Table 4.2. The optimal solution has a profit of $1920/hr with binary variables y = (1, 0, 1)^T. The corresponding superstructure is shown in Fig. 4.10.

[Figure: optimal superstructure in which A feeds process 3 (producing B3 = B), which supplies process 1 to produce C.]

Figure 4.10: Optimal superstructure for the process synthesis problem


Chapter 5

Deterministic global optimisation

This chapter is co-authored with Nikolaos Kazazakis.

The behaviour of many physical systems may be expressed as the solution to an optimi-
sation problem. Out of those, it is often the case that the global solution of a problem is of
special significance. In fact, in order to correctly predict the behaviour of numerous systems
(e.g. phase equilibria, path finding, crystal prediction), it is necessary to locate the global
optimum of an optimisation problem. In other systems (e.g., financial modelling, product
design), the difference between a locally optimal solution and the global one may translate
into the loss of millions of pounds.
There are numerous alternative ways in which this problem may be approached. The first
is to create a linear model of the system one is attempting to describe. This linear model may
be reliably solved to global optimality, even for millions of variables, using a linear solver (e.g.
CPLEX, Gurobi), but the modelling accuracy may be very low. The second is to create a
nonlinear model which adequately describes the complexity, but is difficult to solve to global
optimality. To overcome this difficulty, numerous local optimisation runs may be attempted
in order to locate a reasonably good solution, or a global optimisation algorithm may be
employed. Some of these algorithms provide a theoretical guarantee of convergence to the
global solution under certain circumstances.
Neumaier [10] provided the following classification of global optimisation algorithms,
based on the degree of rigour with which they approach the problem:

1. An incomplete method uses clever intuitive heuristics for searching but has no safeguards if the search gets stuck in a local optimum. Examples include local algorithms such as interior-point methods, and the many guided search methods such as particle swarm optimisation (PSO) or genetic algorithms (GA).

2. An asymptotically complete method reaches a global optimum with certainty, or at


least with probability one, if allowed to run indefinitely long, but the user has no means
of knowing when a global solution has been found (e.g., a multistart algorithm with
uniform distribution of the starting points).

3. A complete method reaches a global optimum with certainty, assuming exact compu-
tations and indefinitely long run-time, and can guarantee after a finite time that an

approximate global solution has been found, within prescribed tolerances (e.g., the
αBB algorithm [6, 3], which will be discussed in this chapter).

4. A rigorous method reaches a global optimum with certainty and within given tolerances even in the presence of rounding errors, except in near-degenerate cases, where the tolerances may be exceeded. Because every part of a software implementation needs to be rigorous not only algorithmically, but also with respect to numerical calculations (including all included software libraries), software which implements this type of algorithm is very difficult to develop.

Thus, if it is necessary to achieve certainty that a global solution has been located, a complete or rigorous method must be employed. In this chapter, we focus on deterministic global optimisation (DGO) algorithms that belong to the class of complete methods. Such methods are based on a detailed theoretical analysis of the functions involved in the problem. Because global optimisation problems are very difficult to solve, numerous heuristics (rules-of-thumb) are typically combined with theoretical elements in order to accelerate the solution and solve the problem in reasonable time. Thus, this chapter introduces theoretical underestimation methodologies as well as some of the most commonly used heuristic methods in modern DGO. In the sections which follow, it is assumed that the optimisation problem is the minimisation problem P:

P :  min_{x∈X}  f(x)
     s.t.  g(x) ≤ 0
           h(x) = 0
           X = [x^L, x^U]                          (5.1)

where all functions f, g, h are general non-linear, twice-differentiable¹, finite-valued functions defined over a hyper-rectangular domain X ⊂ R^N, and N is the number of variables.

5.1 Spatial branch-and-bound methods


One of the broadest and most intuitive classes of DGO methods is that of spatial branch-
and-bound methods. The methods which fall under this category build on three fundamental
principles: (i) the search of the global optimum is performed by branching on the variables
in the solution space, (ii) bounds on the range of the objective function are derived within
those spatial branches, and (iii) certain sub-domains are discarded based on this information.
The combination of these concepts is referred to as the branch-and-bound methodology.
This procedure is illustrated in Figure 5.1 for a function in one variable, i.e., solving

P_unconstrained :  min_{x∈X}  f(x).               (5.2)

¹There exist DGO methods for other classes of problems (e.g., with zero-order or first-order differentiability, or with differential-algebraic equations), but the availability of second-order information is valuable because it may be used to create algorithms with second-order convergence properties.

The first step is to derive lower and upper bounds on the objective function that are valid over the whole range of the variable (First iteration). The two bounds are derived as follows: (i) an upper bound f̄* on the value of the objective function, typically acquired by local minimisation of the original problem, and (ii) a rigorous lower bound f̲* on the range of possible values that the objective function may take within that region. The lower bound may be calculated by constructing a convex relaxation f̆ of the original function f: f̆(x) ≤ f(x), ∀x ∈ X. We will discuss in some detail how this can be constructed later in the chapter. Because this new function is convex and smaller than or equal to f everywhere in X, it is possible to extract a lower bound on the value of f by solving the corresponding local optimisation problem²:

P̆_unconstrained :  min_{x∈X}  f̆(x)               (5.3)

At the first iteration, the upper bound we find is the best known upper bound on the global solution of the problem, f^UB, and we set f^UB = f̄*. Furthermore, we know that f̲* is a valid lower bound on f for all x ∈ X, and we can set the best overall lower bound f^LB accordingly: f^LB = f̲*. We can thus state that the value of the objective function at the global solution, f*, is such that f^LB ≤ f* ≤ f^UB.
Then, in the second iteration, one “branches on the variable”, i.e., divides the solution
space X ⊂ R in two subregions, with x ∈ X2 as denoted by “Region 2” in the figure and
with x ∈ X3 as denoted by “Region 3”. This is followed by bounding steps in each subregion,
where once again two types of bounds need to be calculated: (i) an upper bound, f¯i∗ , which
will be valid for the original space X and may again be obtained by local optimisation; (ii) a
lower bound on the current subregion, obtained by constructing relaxations of the function
for this specific region. For instance, for Region 2, the lower bound will be derived from a
convex relaxation f over X2 , i.e., f˘(x) ≤ f (x), ∀x ∈ X2
The corresponding problem over subregion i (x ∈ Xi ) is given by:

P̆unconstrained : min f˘(x) (5.4)


x∈Xi

Once a valid upper bound and lower bound are available for each branch, the next step is to determine whether this information allows any reduction of the solution space. In other words, we ask the following question: "given these bounds, is there any branch Xi, i = 2, 3, where it is impossible to find a global solution?".
To answer this, we first update the value of the best known upper bound, such that f^UB = min{f^UB, f̄_2*, f̄_3*}. In the example illustrated in Figure 5.1, the upper bound has remained the same as at Iteration 1. Considering the lower bounds in the figure, we see that the lower bound generated in Region 3 is greater than the best upper bound. Therefore, we can reliably infer that it is impossible to find the global solution in Region 3, and remove that region from our search. This process of removing a region from the search tree is referred to as fathoming that region. Furthermore, we can conclude that the lower bound on f in Region 2, f̲_2*, is a valid lower bound on the global solution for all x ∈ X, and we update f^LB = f̲_2*.
²Because the problem is convex, the local solution is equivalent to the global one, and can be reliably located by means of a local optimisation method.

[Figure: two panels for a one-dimensional function. First iteration: upper and lower bounds over the whole domain (Region 1), derived from a convex relaxation of the objective. Second iteration: the domain is split into Region 2 and Region 3, each with its own tighter relaxation; Region 3, whose lower bound exceeds the best upper bound, is fathomed, while Region 2 contains the global optimum.]

Figure 5.1: Illustration of the branch-and-bound methodology for a one-dimensional function

This process may be repeated iteratively until the separation distance f^UB − f^LB becomes smaller than a predefined tolerance ϵ, in which case the algorithm achieves ϵ-convergence. If the branch (node) where the global minimum is located is also detected to be convex, the algorithm can achieve full convergence, as shown in Figure 5.2.
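A minimal sketch of the procedure for an unconstrained one-dimensional problem (our own example function; each node uses a rigorous constant lower bound obtained from interval arithmetic, see Section 5.2.1, instead of a tighter convex relaxation):

```python
import heapq

def f(x):
    return x**4 - 3 * x**2 + x

def lower_bound(a, b):
    """Constant lower bound of f on [a, b] from its natural interval
    extension: lo(x^4) - 3*hi(x^2) + lo(x)."""
    sq_lo = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
    sq_hi = max(a * a, b * b)
    return sq_lo**2 - 3.0 * sq_hi + a

def branch_and_bound(a, b, eps=1e-4):
    x_best = (a + b) / 2
    ub = f(x_best)                             # incumbent upper bound
    heap = [(lower_bound(a, b), a, b)]         # best-first node selection
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > ub - eps:                      # fathom: node cannot improve
            continue
        m = (a + b) / 2                        # branch at the midpoint
        if f(m) < ub:
            ub, x_best = f(m), m
        for lo, hi in ((a, m), (m, b)):
            child = lower_bound(lo, hi)
            if child < ub - eps:               # keep only promising children
                heapq.heappush(heap, (child, lo, hi))
    return x_best, ub

x_star, f_star = branch_and_bound(-2.0, 2.0)
print(round(x_star, 4), round(f_star, 4))  # global minimum near x = -1.3008
```

The fathoming test guarantees that, on termination, the incumbent is within ϵ of the global minimum, which for this function lies near x = −1.3008 with f ≈ −3.5139.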

5.2 Convex relaxation of functions


As seen in Figure 5.1, it is essential to have tools to convexify functions when deriving bounds. Because of the complexity involved in solving DGO problems, convexification of functions is a very active field of research, offering a wealth of methods to generate lower bounds and to convexify feasible regions (Section 5.3.1).

5.2.1 Bounding with interval arithmetic


The simplest convex relaxation of a function is a constant lower bound. For example, a valid lower bound on f(x) = x² over x ∈ [−5, 5] is simply f̆(x) = 0.
Constant bounds that are rigorously valid can be derived from interval arithmetic, a concept that was introduced by Ramon Moore, not in the context of global optimisation, but as a means to control the errors associated with computer calculations. Its use has since been extended to numerous applications such as optimisation, the solution of systems of nonlinear equations and bounded-error parameter estimation. The central concept of interval computations is the generation of a guaranteed range for a function given ranges for its variables. The enclosure of f(a) for a ∈ [a̲, ā] is denoted by f([a̲, ā]).

[Figure: a branch-and-bound tree in which the nodes are numbered in the order visited at each iteration; fathomed nodes are pruned, and one leaf is the optimal node.]

Figure 5.2: Example of node exploration in a branch-and-bound tree.

A set of simple rules has been developed to perform operations on intervals. Given two intervals [a̲1, ā1] and [a̲2, ā2], the basic interval arithmetic operations are

• [a̲1, ā1] + [a̲2, ā2] = [a̲1 + a̲2, ā1 + ā2],

• [a̲1, ā1] − [a̲2, ā2] = [a̲1 − ā2, ā1 − a̲2],

• [a̲1, ā1] · [a̲2, ā2] = [min{a̲1a̲2, a̲1ā2, ā1a̲2, ā1ā2}, max{a̲1a̲2, a̲1ā2, ā1a̲2, ā1ā2}],

• [a̲1, ā1]/[a̲2, ā2] = [min{a̲1/a̲2, a̲1/ā2, ā1/a̲2, ā1/ā2}, max{a̲1/a̲2, a̲1/ā2, ā1/a̲2, ā1/ā2}], for a̲2 > 0.

Ranges for trigonometric and transcendental functions can be computed based on their known
behaviour (monotonicity properties or location of extrema). Composite functions constructed
from these simple operations can then be enclosed through recursive evaluations.

Examples
As an illustration, enclosures for two non-linear expressions are computed for (x1, x2) ∈ [−1, 2] × [−1, 1].

1. Enclosure for −cos x1 sin x2:

Each part of the expression is evaluated separately.
cos([−1, 2]) = [cos(2), 1]. Hence −cos([−1, 2]) = [−1, −cos(2)] = [−1, 0.416148].
Similarly, sin([−1, 1]) = [sin(−1), sin(1)] = [−0.841472, 0.841472].
Therefore, the enclosure for the expression is given by

[min{(−1)(−0.841472), (−1)(0.841472), (0.416148)(0.841472), (0.416148)(−0.841472)},
 max{(−1)(−0.841472), (−1)(0.841472), (0.416148)(0.841472), (0.416148)(−0.841472)}]
= [−0.841472, 0.841472].
5.2. CONVEX RELAXATION OF FUNCTIONS 76

2. Enclosure for −sin x1 cos x2 + 2x2/(x2² + 1)²:

Since sin([−1, 2]) = [sin(−1), 1] = [−0.841472, 1], −sin([−1, 2]) = [−1, 0.841472].
Moreover, cos([−1, 1]) = [cos(1), 1] = [0.540302, 1].
Hence, −sin([−1, 2]) cos([−1, 1]) = [−1, 0.841472].
2[−1, 1] = [−2, 2] and ([−1, 1]² + 1)² = ([0, 1] + 1)² = [1, 2]² = [1, 4].
Thus 2[−1, 1]/([−1, 1]² + 1)² = [−2, 2]/[1, 4] = [min{−2/1, −2/4}, max{2/1, 2/4}] = [−2, 2].
Finally, −sin([−1, 2]) cos([−1, 1]) + 2[−1, 1]/([−1, 1]² + 1)²
= [−1, 0.841472] + [−2, 2]
= [−3, 2.841472].
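The recursive evaluations in the two examples can be reproduced with a few lines of code (a sketch; the trigonometric enclosures locate the critical points π/2 + kπ inside the interval, following the monotonicity reasoning above):

```python
import math

def iadd(a, b): return (a[0] + b[0], a[1] + b[1])
def ineg(a):    return (-a[1], -a[0])
def imul(a, b):
    prods = [x * y for x in a for y in b]
    return (min(prods), max(prods))
def idiv(a, b):              # requires 0 not in [b_lo, b_hi]
    return imul(a, (1.0 / b[1], 1.0 / b[0]))
def isqr(a):                 # x^2: tighter than imul(a, a)
    lo = 0.0 if a[0] <= 0.0 <= a[1] else min(a[0]**2, a[1]**2)
    return (lo, max(a[0]**2, a[1]**2))

def isin(a):
    """Enclosure of sin over [lo, hi]: endpoint values plus any interior
    critical points pi/2 + k*pi, where sin attains -1 or +1."""
    lo, hi = a
    vals = [math.sin(lo), math.sin(hi)]
    k = math.ceil((lo - math.pi / 2) / math.pi)
    while math.pi / 2 + k * math.pi <= hi:
        vals.append(math.sin(math.pi / 2 + k * math.pi))
        k += 1
    return (min(vals), max(vals))

def icos(a):                 # cos(x) = sin(x + pi/2)
    return isin((a[0] + math.pi / 2, a[1] + math.pi / 2))

# Example 1: -cos(x1) sin(x2) on [-1, 2] x [-1, 1]
e1 = imul(ineg(icos((-1.0, 2.0))), isin((-1.0, 1.0)))
# Example 2: -sin(x1) cos(x2) + 2*x2/(x2^2 + 1)^2
x2 = (-1.0, 1.0)
frac = idiv(imul((2.0, 2.0), x2), isqr(iadd(isqr(x2), (1.0, 1.0))))
e2 = iadd(imul(ineg(isin((-1.0, 2.0))), icos(x2)), frac)
print(e1, e2)  # approximately [-0.8415, 0.8415] and [-3, 2.8415]
```

Note the use of a dedicated square operation: evaluating x2² as imul(x2, x2) would give [−1, 1] instead of [0, 1], illustrating the dependency problem of naive interval arithmetic.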

5.2.2 McCormick relaxations


One of the first methods used to convexify non-linear functional forms was proposed by McCormick [8]. McCormick was interested in solving problems involving factorable functions, of which the simplest example is a bilinear term (i.e., of type x1x2). This type of function is ubiquitous in numerous practical applications (e.g., pooling problems). He proposed that, given a bilinear term x1x2, illustrated in Figure 5.3, a convex relaxation of that term over (x1, x2) ∈ [x1^L, x1^U] × [x2^L, x2^U] could be derived by replacing x1x2 by a variable wB such that:

wB ≥ x1^L x2 + x2^L x1 − x1^L x2^L
wB ≥ x1^U x2 + x2^U x1 − x1^U x2^U                (5.5)

In other words, the value of x1x2 is bounded from below as follows:

wB = max{x1^L x2 + x2^L x1 − x1^L x2^L ; x1^U x2 + x2^U x1 − x1^U x2^U}        (5.6)

Al-Khayyal and Falk [2] later showed that this relaxation is the convex hull³ of the bilinear expression. An illustration of the containment set that these constraints generate is given in Figure 5.4.
Exercise Show that

x1x2 ≥ x1^L x2 + x2^L x1 − x1^L x2^L,
x1x2 ≥ x1^U x2 + x2^U x1 − x1^U x2^U.             (5.7)

In a similar fashion, the concave relaxation of a bilinear term may be expressed as:

wB ≤ x1^U x2 + x2^L x1 − x1^U x2^L
wB ≤ x1^L x2 + x2^U x1 − x1^L x2^U                (5.8)

Thus, a problem where all the non-linearities involve only bilinear terms may be relaxed by constructing a linear problem where every one of the bilinear terms (e.g., term j, involving variables xi and xi′) is replaced by a new variable wBj, together with a matching set of linear constraints. Each wBj has a bijective relationship with the pair of variables it replaces, i.e., with xi xi′,

³The tightest possible convex set which contains the original function.

[Figure: surface plot of x1x2 over [−1, 1]².]

Figure 5.3: Illustration of a bilinear term x1 x2.

[Figure: the McCormick under- and overestimators enclosing the bilinear surface over [−1, 1]².]

Figure 5.4: Illustration of the McCormick containment set for a bilinear term.

and this term will be substituted by the same wBj wherever it appears in the problem.

Example: Generate the convex relaxation of the following problem:

min_{x∈X}  x1x2 + x1x3
s.t.       x1x2 ≤ 0
           x1x3 ≤ 0                               (5.9)

This problem has two bilinear terms, x1x2 and x1x3. Let wB1 and wB2 be the variables which substitute each of them, respectively. Then, the convex relaxation of the original problem is:

min_{x∈X, wB∈W}  wB1 + wB2
s.t.  wB1 ≤ 0
      wB2 ≤ 0
      wB1 ≥ x1^L x2 + x2^L x1 − x1^L x2^L
      wB1 ≥ x1^U x2 + x2^U x1 − x1^U x2^U
      wB2 ≥ x1^L x3 + x3^L x1 − x1^L x3^L
      wB2 ≥ x1^U x3 + x3^U x1 − x1^U x3^U         (5.10)

Although the relaxed problem is bigger, in the sense that it has more variables and constraints, it is much easier to solve than the original one, because all constraints are linear.
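The validity and tightness of the McCormick envelopes (5.5) and (5.8) are easy to verify numerically (a sketch over our own sample box [−1, 1]²):

```python
def mccormick(x1, x2, b1, b2):
    """McCormick under/overestimators (5.5) and (5.8) of x1*x2 at a point."""
    L1, U1 = b1
    L2, U2 = b2
    under = max(L1 * x2 + L2 * x1 - L1 * L2, U1 * x2 + U2 * x1 - U1 * U2)
    over = min(U1 * x2 + L2 * x1 - U1 * L2, L1 * x2 + U2 * x1 - L1 * U2)
    return under, over

b1 = b2 = (-1.0, 1.0)
corners = {(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)}
for i in range(21):
    for j in range(21):
        x1, x2 = -1 + i / 10, -1 + j / 10
        under, over = mccormick(x1, x2, b1, b2)
        # the bilinear term always lies between the two estimators ...
        assert under - 1e-12 <= x1 * x2 <= over + 1e-12
        # ... and the envelope is tight at the corners of the box
        if (x1, x2) in corners:
            assert abs(under - x1 * x2) < 1e-12 and abs(over - x1 * x2) < 1e-12
print("McCormick envelope verified on a 21 x 21 grid")
```

At the box centre the gap between the two estimators is largest (here, from −1 to 1 at the origin), which is why branching tightens the relaxation.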

5.2.2.1 Other multilinear terms

Underestimating trilinear terms In a similar fashion, Maranas and Floudas [7] derived a tight relaxation for a trilinear term x1x2x3 by decomposing it into two bilinear terms, wB = x1x2 and wT = wB x3. The relaxation replaces x1x2x3 by a new variable wT, constrained by eight inequalities built from the variable bounds. The first and last of these are

wT ≥ x1 x2^L x3^L + x1^L x2 x3^L + x1^L x2^L x3 − 2 x1^L x2^L x3^L
wT ≥ x1 x2^U x3^U + x1^U x2 x3^U + x1^U x2^U x3 − 2 x1^U x2^U x3^U        (5.11)

and the remaining six involve mixed combinations of the lower and upper bounds on x1, x2 and x3; the full set is given in [7]. Here wT ∈ WT, where WT is the set of all variables which substitute trilinear terms.

Underestimating fractional terms Fractional terms x1/x2 may be underestimated by introducing a new variable wF and two new constraints [7] to the relaxed problem, which depend on the signs of the bounds on x1. For x2^L > 0, these constraints are:

wF ≥ x1^L/x2 + x1/x2^U − x1^L/x2^U,                  if x1^L ≥ 0
wF ≥ x1/x2^U − x1^L x2/(x2^L x2^U) + x1^L/x2^L,      if x1^L < 0
                                                      (5.12)
wF ≥ x1^U/x2 + x1/x2^L − x1^U/x2^L,                  if x1^U ≥ 0
wF ≥ x1/x2^L − x1^U x2/(x2^L x2^U) + x1^U/x2^U,      if x1^U < 0

where wF ∈ WF, and WF is the set of all variables which substitute fractional terms.

Underestimating fractional trilinear terms For the underestimation of fractional trilinear terms, eight constraints must be added to the relaxation [7]. A fractional trilinear term x1x2/x3 is replaced by a new variable wFT; for x1^L, x2^L ≥ 0 and x3^L > 0, the first and last of the eight constraints are

wFT ≥ x1 x2^L/x3^U + x1^L x2/x3^U + x1^L x2^L/x3 − 2 x1^L x2^L/x3^U
wFT ≥ x1 x2^U/x3^L + x1^U x2/x3^L + x1^U x2^U/x3 − 2 x1^U x2^U/x3^L        (5.13)

with the remaining six built from mixed combinations of the variable bounds. Here wFT ∈ WFT, where WFT is the set of all variables which substitute fractional trilinear terms. The full set of constraints, for this and the remaining sign cases, may be found in [1].

5.2.3 Univariate concave function


An important class of nonconvex functions is that of concave functions in one variable (univariate concave functions), such as f(x) = ln(x) or f(x) = −x². Univariate concave functions may be trivially underestimated by a linear function: the secant through the end points of the variable range. The convex envelope of the concave function fUT(x), x ∈ [x^L, x^U], is:

    f̆UT(x) = fUT(x^L) + [ (fUT(x^U) − fUT(x^L)) / (x^U − x^L) ] (x − x^L)        (5.14)
The construction of the tightest possible convex underestimator for a univariate concave
function does not require the introduction of additional variables or constraints. It can
simply be replaced by f˘U T (x).
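Equation (5.14) is straightforward to implement. A minimal sketch (function names are illustrative):

```python
import math

def secant_underestimator(f, xL, xU):
    """Return the linear function underestimating a univariate concave f
    on [xL, xU]: the secant through (xL, f(xL)) and (xU, f(xU))."""
    slope = (f(xU) - f(xL)) / (xU - xL)
    return lambda x: f(xL) + slope * (x - xL)

# Example: f(x) = ln(x) on [1, e]; the secant lies below ln(x) on the interval.
f = math.log
g = secant_underestimator(f, 1.0, math.e)
for i in range(101):
    x = 1.0 + i * (math.e - 1.0) / 100
    assert g(x) <= f(x) + 1e-12
```

The secant touches the function at both end points, which is why no additional variables or constraints are needed.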

5.2.4 Underestimation of general functions: the αBB underestimator


The αBB convex relaxation [6, 3] is a hybrid method which embodies some of the techniques described above, while providing three main advantages: (i) it does not significantly increase the size of the problem, (ii) it may be used to relax any twice continuously differentiable expression, and (iii) the separation gap is proven to reduce quadratically every time a node is branched on.
Given a twice-continuously differentiable function f (x), the following functional form is
used to construct a convex relaxation:


    L(x) = f(x) + Σ_{i=1}^{N} αi (xi^L − xi)(xi^U − xi)        (5.15)

in which the summation term is the convexifying quadratic q(x),

where N is the number of variables of f , and α is a vector of non-negative real scalars.


The reasoning behind this formulation is to superpose a convexifying quadratic of sufficient
magnitude (as determined by the αi values) along every basis vector, onto f . The components
of this α vector are dependent on the variable bounds in X.
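The functional form (5.15) can be sketched in a few lines. Here the α values (10 and 7) are those derived for this function in the worked example later in this section; the check below only verifies the underestimation property, which holds for any non-negative α:

```python
import numpy as np

def alpha_bb_underestimator(f, alpha, xL, xU):
    """Construct L(x) = f(x) + sum_i alpha_i (xL_i - x_i)(xU_i - x_i), Eq. (5.15)."""
    alpha, xL, xU = (np.asarray(v, dtype=float) for v in (alpha, xL, xU))
    def L(x):
        x = np.asarray(x, dtype=float)
        return f(x) + np.sum(alpha * (xL - x) * (xU - x))
    return L

# f(x, y) = x^3 - x*y^2 on [-2, 3] x [1, 4] (the example used later in the notes).
f = lambda v: v[0] ** 3 - v[0] * v[1] ** 2
L = alpha_bb_underestimator(f, alpha=[10.0, 7.0], xL=[-2.0, 1.0], xU=[3.0, 4.0])

# The quadratic is non-positive inside the box, so L <= f there,
# with equality at the corners of the box.
rng = np.random.default_rng(0)
for v in rng.uniform([-2.0, 1.0], [3.0, 4.0], size=(500, 2)):
    assert L(v) <= f(v) + 1e-9
assert abs(L([-2.0, 1.0]) - f([-2.0, 1.0])) < 1e-9
```

Convexity of L, by contrast, does depend on α being large enough, which is the subject of the rest of this section.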
To investigate the convexity of L, we examine the matrix of the second-order derivatives
of this twice-differentiable function (the Hessian matrix). First, we define the Hessian matrix
Hf (x) derived from f (x):
 
    Hf(x) = [ ∂²f/∂x1²       ∂²f/∂x1∂x2     …   ∂²f/∂x1∂xN
              ∂²f/∂x2∂x1     ∂²f/∂x2²       …   ∂²f/∂x2∂xN
              …              …              …   …
              ∂²f/∂xN∂x1     ∂²f/∂xN∂x2     …   ∂²f/∂xN²   ]

           = [ h11(x)   h12(x)   …   h1N(x)
               h21(x)   h22(x)   …   h2N(x)
               …        …        …   …
               hN1(x)   hN2(x)   …   hNN(x) ]
Then the Hessian matrix H(x) derived from L(x) is given by

    H(x) = Hf(x) + 2∆,

where ∆ = diag(α1, …, αN). In expanded form, this is:

    H(x) = [ h11(x) + 2α1   h12(x)          …   h1N(x)
             h21(x)         h22(x) + 2α2    …   h2N(x)
             …              …               …   …
             hN1(x)         hN2(x)          …   hNN(x) + 2αN ]

To ensure that H(x) remains positive semi-definite for all x ∈ [x^L, x^U], and hence that L(x) is convex over this domain, we must ensure that the minimum eigenvalue of H(x) over all x is non-negative. First consider a given point x and a vector α with identical elements, i.e., αi = α for i = 1, …, N. If we define λ^x_min(x) to be the minimum eigenvalue of Hf(x) (note the subscript f) for this given x, then the corresponding minimum eigenvalue of H(x) is λ^x_min(x) + 2α. Thus L is convex at x if and only if λ^x_min(x) + 2α ≥ 0. Now considering all x, we seek λ_min = min_{x ∈ [x^L, x^U]} λ^x_min(x) and need α such that λ_min + 2α ≥ 0, i.e.,

    α ≥ max{ 0, −½ λ_min }

Rigorous calculation of α. For one- or two-dimensional functions, it is easy to derive a


valid value for α as shown in the following example.

Example Let f(x, y) = x³ − xy² with (x, y) ∈ [−2, 3] × [1, 4]. Then L(x, y) = x³ − xy² + α(x + 2)(x − 3) + α(y − 1)(y − 4).
To find an α value, first derive the Hessian matrix from f:

    Hf(x, y) = [ 6x    −2y
                 −2y   −2x ]

Then compute the minimum eigenvalue by finding det(Hf(x, y) − λI) = 0:

    det [ 6x − λ   −2y
          −2y      −2x − λ ] = (6x − λ)(−2x − λ) − 4y² = 0.

Hence

    λ^x_min(x, y) = [ 4x − √(16x² − 4(−12x² − 4y²)) ] / 2

Using interval arithmetic, we can get a lower bound on λ^x_min(x, y) over all x and y:

    λ_min ≥ [ 4(−2) − √(16 × 3² − 4(−12 × 3² − 4 × 4²)) ] / 2 = −18.42

Therefore α = max{0, −½(−18.42)} = 9.21 gives a convex relaxation.


For higher-dimensional functions, it is easier to derive α values from the interval Hessian matrix [Hf(X)], which is obtained by carrying out an interval evaluation of each element of Hf(x) for x ∈ X [1]. In the example above, the interval Hessian matrix is given by

    [Hf] = [ 6[−2, 3]    −2[1, 4]
             −2[1, 4]    −2[−2, 3] ]  =  [ [−12, 18]   [−8, −2]
                                           [−8, −2]    [−6, 4]  ]

A single value α = max(0, −½ λ_min) can then be used, where λ_min is the minimum eigenvalue over the set of scalar Hessian matrices in [Hf], i.e., it is given by:

    λ_min = min_{Hf ∈ [Hf]} λ_min(Hf),

where λ_min(Hf) is the minimum eigenvalue of Hf.


For the general case, valid values of the components of α may be rigorously calculated from the interval Hessian matrix using interval arithmetic (see Section 5.2.1), as follows.

Theorem 5.1 (Scaled Gerschgorin) For any positive vector d and a symmetric interval
matrix [Hf ], define the vector α as:
  
    αi = max{ 0, −½ ( h_ii^L − Σ_{j≠i} |h|_ij d_j/d_i ) }        (5.16)

where h_ij^L and h_ij^U denote the lower and upper bounds of the interval h_ij, and |h|_ij = max{|h_ij^L|, |h_ij^U|}. Then, for all Hf ∈ [Hf], the matrix H = Hf + 2∆ with ∆ = diag(αi) is positive semi-definite. For simplicity, d = 1 can be chosen.

Applying this to the example, with d = 1, we find that:

    α1 = max(0, −½(−12 − 8)) = 10
    α2 = max(0, −½(−6 − 8)) = 7
This yields the following convex relaxation:

L(x, y) = x3 − xy 2 + 10(x + 2)(x − 3) + 7(y − 1)(y − 4).

Note that using a single value of α = max(α1 , α2 ) = 10 also gives a valid convex relaxation:

L(x, y) = x3 − xy 2 + 10(x + 2)(x − 3) + 10(y − 1)(y − 4).
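Theorem 5.1 is simple to implement. The sketch below uses d = 1 and the interval Hessian of the worked example (with h22 = −2x, whose interval over [−2, 3] is [−6, 4]):

```python
import numpy as np

def scaled_gerschgorin_alpha(H_lo, H_up, d=None):
    """Theorem 5.1: alpha_i = max(0, -0.5*(h_ii^L - sum_{j!=i} |h|_ij * d_j/d_i)),
    with |h|_ij = max(|h_ij^L|, |h_ij^U|)."""
    H_lo, H_up = np.asarray(H_lo, float), np.asarray(H_up, float)
    n = H_lo.shape[0]
    d = np.ones(n) if d is None else np.asarray(d, float)
    habs = np.maximum(np.abs(H_lo), np.abs(H_up))  # |h|_ij elementwise
    alpha = np.zeros(n)
    for i in range(n):
        off = sum(habs[i, j] * d[j] / d[i] for j in range(n) if j != i)
        alpha[i] = max(0.0, -0.5 * (H_lo[i, i] - off))
    return alpha

# Interval Hessian of f(x, y) = x^3 - x*y^2 over [-2, 3] x [1, 4]:
H_lo = [[-12.0, -8.0], [-8.0, -6.0]]   # lower bounds of each interval entry
H_up = [[18.0, -2.0], [-2.0, 4.0]]     # upper bounds of each interval entry
print(scaled_gerschgorin_alpha(H_lo, H_up))   # -> [10.  7.]
```

With d = 1, the computation reduces to a Gerschgorin disc argument on the worst-case matrix in the interval family, which is why it is cheap compared with interval eigenvalue calculations.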

5.3 The αBB algorithm


Let us now consider how to integrate the convex relaxations we have seen so far into a
branch-and-bound algorithm. We take the example of the αBB algorithm [1] but other
branch-and-bound algorithms work on the same principle.

5.3.1 Convexification of the feasible region


Convexification of a function is just one step towards formulating a convex relaxation of the
full non-convex problem (P ). In order to achieve this, two types of convexifications must
be performed: (i) convexification of the objective function, and (ii) convexification of the
constraints of the problem. The methods by which both these procedures are performed are
very similar, but they affect the problem in a different way. The distinction is illustrated in
Figure 5.5. The objective function has no effect on the feasible region of the problem; the feasible region is solely determined by the constraints.⁴
When convexifying the objective function, we do so because we want to calculate a rigorous
lower bound on the value of that function. When convexifying constraints however, the
motivation is not to calculate bounds on their values (although this information may be used
by heuristics). The convexification of a constraint aims to convexify the feasible region that
this constraint defines. This is necessary because convex analysis dictates that an optimisation
problem is convex if and only if it is defined over a convex set. Hence, if the feasible region
Figure 5.5: Convexification of an objective vs. convexification of the feasible region.

on which the relaxed problem is defined is not convex, our local solver may get stuck at a
local minimum, and produce bounds that are not valid.
Because constraints define a region, constraint types are important. For instance, x² ≤ 0 is a convex constraint, but x² ≥ 0 (equivalently −x² ≤ 0) is a concave one, and must be convexified. More generally, convexified constraints ğ(x, w) ≤ 0 always define a convex feasible region. This is not the case, however, for non-linear equality constraints, which in general define a non-convex region. Any equality constraint h which, after any substitutions, still has non-linear terms, must be re-written as two inequalities of opposite signs, i.e.,

    h⁺(x) = h(x) ≤ 0
    h⁻(x) = −h(x) ≤ 0        (5.17)

and these inequalities must be underestimated independently as:

    h̆⁺(x) ≤ 0
    h̆⁻(x) ≤ 0        (5.18)

5.3.2 The αBB lower bounding problem

While the αBB functional form is very general, specialised relaxations like the McCormick
constraints may contribute to making the algorithm converge much more quickly. Thus, some
types of non-convex terms are treated in a different way. In the αBB algorithm [1] formula-
tions for a variety of frequently encountered special cases of non-convex terms are considered:
(i) bilinear, (ii) trilinear, (iii) fractional, (iv) fractional trilinear, and (v) univariate concave
terms. Combining these techniques, a full convex lower bounding problem may be formulated

for the general problem P:

    P̆ :  min_{x ∈ X, w ∈ W}  f̆(x, w)
         s.t.  ğ(x, w) ≤ 0
               hL(x, w) = 0
               h̆⁺(x, w) ≤ 0        (5.19)
               h̆⁻(x, w) ≤ 0
               C(x, w) ≤ 0

where ˘ denotes a convexified function, hL refers to equality constraints which, after any substitutions, only have linear terms, and C(x, w) ≤ 0 represents all additional constraints which are added to the lower bounding problem in order to build relaxations of special non-convex terms.

⁴ Remember, the participating functions must be finite on X; hence the objective may not have asymptotes.

5.4 Heuristics in DGO


Even though algorithms such as αBB guarantee ϵ-convergence in finite time, this time may
be unacceptably long for many practical problems (e.g. 100 years). In order to reduce that
time to a reasonable magnitude, a number of heuristic strategies (rules-of-thumb) may be
employed, which can accelerate the convergence rate of a DGO algorithm greatly. The most
common among these heuristics are branching strategies and bounds tightening methods.

5.4.1 Branching strategies


Branching strategies are heuristic methods which guide the creation of a branch-and-bound
tree. Given a list of all nodes which might contain the global solution, two main questions
emerge: (i) how to choose which node to bisect⁵, and (ii) upon selection of that node, how
to choose which variable will be bisected in order to produce new nodes.

5.4.1.1 Lowest-lower bound (LLB) branching

LLB branching is one of the most prevalent node selection strategies in DGO. Given the
list of all nodes which might contain the global solution, known as the branching pool, the
node with the worst (lowest) lower bound is selected for branching. If the convex relaxation
guarantees better bounds when the domain becomes smaller, like αBB does, this strategy
provides two advantages: (i) the worst current estimate of the objective value is guaranteed to improve (or at least not become worse) at every iteration, and (ii) it is very easy to check for convergence: once the node with the worst bound is within the convergence tolerance of the best known local solution (the upper bound), it is guaranteed that there are no other nodes with worse bounds (and thus outside the convergence tolerance) in the branching pool.
⁵ Other, more advanced methods of subdividing nodes exist, but bisection is generally accepted as an overall balanced choice.
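LLB node selection maps naturally onto a min-heap keyed on each node's lower bound; a small illustrative sketch (node contents are placeholders):

```python
import heapq

# Branching pool: (lower bound, node) pairs in a min-heap.
pool = []
for lb, box in [(-7.2, "A"), (-9.5, "B"), (-8.1, "C")]:
    heapq.heappush(pool, (lb, box))

upper_bound = -9.0   # best known local solution
tolerance = 1e-3

# LLB: take the node with the worst (lowest) lower bound.
lb, node = heapq.heappop(pool)
assert node == "B"   # -9.5 is the lowest lower bound in the pool

# Convergence check only needs the worst bound: if it is within tolerance
# of the upper bound, so is every other node in the pool.
converged = upper_bound - lb <= tolerance
assert not converged  # gap is 0.5 here, so branching continues
```

The heap makes both the selection and the convergence test O(1) per iteration (plus O(log n) per insertion), which is one reason LLB is so widely used.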

5.4.1.2 Least reduced axis

The least reduced axis criterion dictates that, given a node, the variable whose bounds have been reduced the least since the solution process began is selected for branching. This criterion is the simplest, as it involves minimal calculations, and guarantees that the ranges of all variables keep diminishing uniformly. However, convergence using this criterion
may be very slow because it only takes the range of a variable into account: it is insensitive to the degree to which a variable affects the overall bounds. For instance, in the expression −x1² − 242x2², with x1 ∈ [−1, 1] and x2 ∈ [−0.1, 0.1], the range of x2 is ten times smaller than the range of x1; thus, according to the least reduced axis criterion, x1 will be selected for branching. However, branching on x2 is clearly the better choice, as the lower bound would improve much more.

5.4.1.3 Maximum separation distance

According to the strategy of maximum separation distance, a measure µterm of contribution to


non-convexity for each non-linear term is estimated depending on that term’s type (bilinear,
trilinear, general non-linear, etc.), and a branching variable is selected from the highest
ranking term using the least reduced axis criterion:

    t_maxMSD = arg max_j (µ_t^j)        (5.20)

where t_maxMSD is the term with the maximum separation distance, j indexes the terms in the problem, and µ_t^j is the maximum separation distance of term j, calculated depending on its type t. Once the term is chosen, one of its participating variables is chosen for branching using the least reduced axis criterion.
While this strategy is more costly than the least reduced axis because dedicated calcula-
tions for each term of the problem are necessary, in a great majority of cases the additional
computational cost is a worthwhile investment. This strategy is able to guide the creation of
the tree much more intelligently because it takes the degree to which a variable affects lower
bounds into account.

5.4.1.4 Most non-convex variable

The maximum separation distance strategy aims to achieve a balance between the time in-
vested in selecting the branching variable and the quality of the resulting lower bounds. How-
ever, because it takes the contribution of terms into account, rather than the contributions
of the variables themselves, variable selection is not always optimal. The most non-convex
variable branching strategy estimates a measure (weight) of the overall contribution of a
variable i to the non-convexity of a relaxation, as follows:


    µ^i = Σ_{t=1}^{N_T} Σ_{j=1}^{N_t} µ_t^{ij}        (5.21)

where t is a term type, N_T is the number of different term types in the problem, j is the jth term of type t, and N_t is the number of terms of type t. Here µ_t^{ij} is the measure of the contribution of term j, which contains variable i and belongs to the term group t. The sum of

those contributions, for all terms in each term type which contain variable i, yields an overall
estimate of the contribution of that variable.
This strategy provides more holistic information about the likelihood of achieving signifi-
cantly tighter bounds by branching on a particular variable, but at increased cost: expensive
interval eigenvalue calculations will need to be performed numerous times in order to extract
this information.

5.4.1.5 Strong Branching

Strong branching is a procedure where variables are tested for their potential to improve
problem bounds. Virtual nodes are generated (two for each candidate variable), and as many
lower bounding problems are solved. The variable which produced the best lower bound is
selected for branching. If all variables are tested, rather than the most promising subset of
them, the procedure is called full strong branching. Strong branching may result in tighter
bounds, but it is very costly because of the increased number of computationally expensive
lower bounding problems which need to be solved before each branching step. Therefore,
strong branching is typically used for the first few branching steps of the tree and then less
expensive methods are chosen to guide the branching.

5.4.1.6 Final notes on the selection of branching strategy

While good performance may be achieved by judicious choice of branching strategies, it is


important to note that these methods are heuristic: there is no theoretical proof that any of
them will outperform the others given a random application. Thus, if convergence times are
too long for a particular application, selection of a different combination of heuristic methods
may lead to faster convergence.

5.4.2 Bounds tightening


Bounds tightening (BT) is the process of reducing the interval [x_i^L, x_i^U] of a variable x_i, while maintaining interval containment. BT techniques do not change the feasible space, but they accelerate DGO algorithms because smaller boxes [x^L, x^U] result in tighter bounds.
BT methods vary in complexity; the most prominent BT techniques may be categorised as
feasibility-based bounds tightening (FBBT) and optimality-based bounds tightening (OBBT).
Feasibility-based bounds tightening, also known as constraint propagation, is typically inex-
pensive, and may be used in every node of a spatial branch-and-bound tree. FBBT methods
typically make use of interval arithmetic to propagate bounds between the constraints, while
OBBT requires construction of a convex or linear relaxation and solving for the edges of the
convex feasible region. Because OBBT requires the global solution of non-linear problems, it
is typically employed in the first few nodes of the branch-and-bound tree.

5.4.2.1 Feasibility-based bounds tightening (FBBT)

In feasibility-based bounds tightening, interval evaluations of the constraints of the problem


are performed in increasingly smaller domains. This iterative procedure aims to infer whether
some variable ranges are such that it would be impossible to achieve a better solution than

the current best upper bound and simultaneously satisfy all the constraints of the problem.
Based on this information, tighter bounds may be derived.
FBBT is relatively inexpensive to perform, and is thus commonly used to tighten the
bounds of every new node. However, it does not provide any guarantee that the bounds will
be tightened, and the degree of tightening may be negligible if the interval bounds are not
very tight.
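A minimal illustration of the idea behind FBBT, propagating a single linear constraint x + y ≤ c through interval arithmetic (a toy sketch, not a full FBBT implementation):

```python
# FBBT sketch: from x + y <= c and the current intervals, infer tighter bounds.
def tighten_sum_le(xb, yb, c):
    """Given x in xb = [xL, xU], y in yb = [yL, yU] and x + y <= c,
    infer x <= c - yL and y <= c - xL."""
    xL, xU = xb
    yL, yU = yb
    return (xL, min(xU, c - yL)), (yL, min(yU, c - xL))

# x in [0, 8], y in [4, 12], x + y <= 10:
xb, yb = tighten_sum_le((0.0, 8.0), (4.0, 12.0), 10.0)
assert xb == (0.0, 6.0)   # x <= 10 - 4
assert yb == (4.0, 10.0)  # y <= 10 - 0
```

In a real implementation, such inferences are applied to every constraint and iterated until no interval shrinks by more than a small tolerance.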

5.4.2.2 Optimality-based bounds tightening (OBBT)

Optimality-based bounds tightening is a procedure where numerous convex optimisation


problems are generated and solved. The objective of each problem is to maximise the lower
bound and to minimise the upper bound of each variable. This procedure effectively locates
the boundaries of the convexified feasible region along each basis vector. The problems are
formulated as follows.

Improvement of the lower bound on x_i

    x_{i,new}^L =  min_{x ∈ X, w ∈ W}  x_i
                   s.t.  ğ(x, w) ≤ 0
                         hL(x, w) = 0
                         h̆⁺(x, w) ≤ 0        (5.22)
                         h̆⁻(x, w) ≤ 0
                         C(x, w) ≤ 0

Improvement of the upper bound on x_i

    x_{i,new}^U =  max_{x ∈ X, w ∈ W}  x_i
                   s.t.  ğ(x, w) ≤ 0
                         hL(x, w) = 0
                         h̆⁺(x, w) ≤ 0        (5.23)
                         h̆⁻(x, w) ≤ 0
                         C(x, w) ≤ 0

This method is significantly more expensive than FBBT, but is much more likely to result
in significant domain reduction. A common strategy is to use FBBT on every new node, and
if the problem is not found infeasible, to proceed with OBBT to achieve further domain
reduction. Nevertheless, because 2N convex optimisation problems need to be solved (two
for each variable), this procedure is applied at the first few nodes of the branch-and-bound
tree.
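When the relaxation is linear, the OBBT problems (5.22)-(5.23) reduce to LPs. A toy sketch using scipy (the constraint set is illustrative):

```python
from scipy.optimize import linprog

# OBBT sketch: tighten the bounds on x over the convex (here, polyhedral)
# region {x + y <= 4, x - y <= 2, 0 <= x, y <= 10} by min/maximising x.
A_ub = [[1.0, 1.0], [1.0, -1.0]]
b_ub = [4.0, 2.0]
bounds = [(0.0, 10.0), (0.0, 10.0)]

lo = linprog([1.0, 0.0], A_ub=A_ub, b_ub=b_ub, bounds=bounds)   # min x
hi = linprog([-1.0, 0.0], A_ub=A_ub, b_ub=b_ub, bounds=bounds)  # max x
x_new = (lo.fun, -hi.fun)
# The original bound x <= 10 tightens to x <= 3 (adding the two rows gives 2x <= 6).
assert abs(x_new[0] - 0.0) < 1e-8 and abs(x_new[1] - 3.0) < 1e-8
```

For N variables, 2N such problems must be solved, which is exactly why OBBT is reserved for the top of the tree.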

5.5 Available DGO software


Numerous general purpose software packages which implement DGO methods are available,
both commercial and open-source. A non-exhaustive selection is given below:

1. ANTIGONE
ANTIGONE [9] is a general mixed integer framework which implements algorithms for
continuous/integer global optimisation of non-linear equations. It provides numerous
features, including reformulation of user input, efficient linearisation techniques, and
specialised handling of terms of different types.

2. BARON
A general purpose solver for optimization problems with nonlinear constraints and/or integer variables. BARON [12, 11] provides fast specialized solvers for many linearly constrained problems. It is based on branching and box reduction using convex and polyhedral relaxation and Lagrange multiplier techniques.

3. Couenne
Couenne [4] is an open source branch-and-bound package for solving MINLP problems.
Couenne implements linearization, bound reduction, and branching methods.

4. LINDO Global
Branch and bound code for global optimization with general factorable constraints,
including nondifferentiable expressions. LINDO (www.lindo.com) is based on linear
relaxations and mixed integer reformulations.

5. GlobSol
Branch and bound code for global optimization with general factorable constraints,
with rigorously guaranteed results (even round-off is accounted for correctly). Glob-
Sol [5] is based on branching and box reduction using interval analysis to verify that a
global minimizer cannot be lost.
Bibliography

[1] C. S. Adjiman et al. “A Global Optimization Method, αBB, for General Twice-Differentiable
Constrained NLPs – I. Theoretical Advances”. In: Computers and Chemical Engineer-
ing 22 (1998), pp. 1137–1158.
[2] F. A. Al-Khayyal and J. E. Falk. “Jointly constrained biconvex programming”. In:
Mathematics of Operations Research 8.2 (1983), pp. 273–286.
[3] I.P. Androulakis, C. D. Maranas, and C. A. Floudas. “αBB : a global optimization
method for general constrained nonconvex problems”. In: Journal of Global Optimiza-
tion 7 (1995), pp. 337–363.
[4] Pietro Belotti et al. “Mixed-integer nonlinear optimization”. In: Acta Numerica 22 (2013), pp. 1–131.
[5] R. Baker Kearfott. “GlobSol: History, Composition, and Advice on Use”. In: Global Op-
timization and Constraint Satisfaction: First International Workshop on Global Con-
straint Optimization and Constraint Satisfaction, COCOS 2002, Valbonne-Sophia An-
tipolis, France, October 2002. Revised Selected Papers. Ed. by Christian Bliek, Christophe
Jermann, and Arnold Neumaier. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003,
pp. 17–31. isbn: 978-3-540-39901-8. doi: 10 . 1007 / 978 - 3 - 540 - 39901 - 8 _ 2. url:
http://dx.doi.org/10.1007/978-3-540-39901-8_2.
[6] C. D. Maranas and C. A. Floudas. “A global optimization approach for Lennard-Jones
microclusters”. In: The Journal of Chemical Physics 97.10 (1992), pp. 7667–7677.
[7] C. D. Maranas and C.A. Floudas. “Finding all solutions of nonlinearly constrained
systems of equations”. In: Journal of Global Optimization 7.2 (1995), pp. 143–182.
[8] G. P. McCormick. “Computability of global solutions to factorable nonconvex programs:
Part I - Convex underestimating problems”. In: Mathematical Programming 10.1 (1976),
pp. 147–175.
[9] Ruth Misener and Christodoulos A. Floudas. “ANTIGONE: Algorithms for coNTinuous
/ Integer Global Optimization of Nonlinear Equations”. In: Journal of Global Optimiza-
tion 59.2 (2014), pp. 503–526. issn: 1573-2916. doi: 10.1007/s10898- 014- 0166- 2.
url: http://dx.doi.org/10.1007/s10898-014-0166-2.
[10] A. Neumaier. “Complete search in continuous global optimization and constraint satisfaction”. In: Acta Numerica 13 (2004), pp. 271–369.
[11] N. V. Sahinidis. BARON 14.3.1: Global Optimization of Mixed-Integer Nonlinear Pro-
grams, User’s Manual. 2014.
[12] M. Tawarmalani and N. V. Sahinidis. “A polyhedral branch-and-cut approach to global optimization”. In: Mathematical Programming 103.2 (2005), pp. 225–249.
Chapter 6

Process Operability

6.1 Introduction
In previous chapters, we have focused on economic criteria for design. Here we consider the
role of operability.

6.1.1 What is Operability?


Operability is a combination of factors which make the plant “easy to operate” in the face of
constantly changing circumstances. Ease of operation affects the viability of the plant through
quality control, downtime minimisation, and the avoidance of accidental releases. Thus, it
has repercussions on profitability, safety and environmental impact which are difficult to
quantify. Ease of operation can be broken down into a number of concepts including:

Flexibility The capability of the process to maintain feasible operation under changing
conditions.

Controllability The capability of the process for a good dynamic response and stability
in the presence of external disturbances.

Reliability The capability of the process to withstand mechanical and electrical failures.

Safety The capability of the process to maintain safe operation under all required oper-
ating conditions.

Start-up/shut-down The process should be started up and shut down in a safe and cost-effective way.

Issues of flexibility and controllability arise in normal process operation, while reliability,
safety and start-up/shut-down are relevant in abnormal situations.

6.1.2 Process Uncertainty


A process is subject to a large number of uncertainties which can be classified in two main
categories:

• External uncertainties, such as feed composition, temperature of the cooling water, etc.

Figure 6.1: Uncertain parameters in a process

• Internal uncertainties, such as reaction constants, heat transfer coefficients, etc.

Example Consider the process depicted in Fig. 6.1. The uncertain parameters include:

1. The inert mole fraction in the feed, x_C^f.

2. The temperature of the cooling water TW .

3. The compressor efficiencies η1 and η2 .

4. The reaction rate constant k.

5. The heat transfer coefficient U .

Time Scales Yet other uncertainties affect process operation in the long run. For instance,
changes in product demand or specification may affect the desired setpoints, so that the
process may have to move away from its nominal design. This happens on a much larger
time scale than the external and internal uncertainties mentioned above.

6.2 Process Flexibility


6.2.1 Problem Statement
When measuring process flexibility, we aim to quantify the response of the process to changing
conditions arising from uncertain parameters. Since we would like to determine whether
operation remains feasible, the inequality constraints are especially important.
In the process design problem, the following variables are relevant:

θ– the vector of uncertain parameters,


d– the vector of design variables (sizes, structure),
z– the vector of control variables (degrees of freedom that can be adjusted during operation),
x– the vector of state variables.

6.2.1.1 Process Model

The heat and mass balances and design equations take the general form

h(d, x, z, θ) = 0 (6.1)

and the process specifications (e.g. product purity ≥ 95%) and physical constraints (e.g.
outlet temperature ≤ 400K) are given by

g(d, x, z, θ) ≤ 0 (6.2)

Since the vectors h and x have the same dimension, Eq. (6.1) can (sometimes) be used to
obtain explicit expressions for x:
x = x(d, z, θ). (6.3)
Substituting Eq. (6.3) into Eq. (6.2), we get the reduced model

g(d, x(d, z, θ), z, θ) = f (d, z, θ) ≤ 0 (6.4)

We can analyse feasible operation either in terms of the full model or in terms of the reduced
model.

6.2.1.2 Problems in Flexibility

Flexibility Analysis Given a design d and a specified range of uncertain parameters θ,

• Is the design feasible for operation for all θ in the given range? → Flexibility Test

• How flexible is the design? → Flexibility Index

A key aspect in both problems is to anticipate that the controls z can be adjusted during
operation for every θ.

Design and Synthesis of Flexible Processes In this case, the design of the process is
no longer fixed.

• Determine an optimal design d that is feasible at finite points in the uncertain parameter
space → Multiperiod Design Problem

• Determine an optimal design d that is feasible for all uncertain parameters in the
specified range → Design Problem Under Uncertainty

6.2.2 Flexibility Analysis


6.2.2.1 Motivating Example

Consider the heat exchanger network shown in Fig 6.2. The design (network topology)
has already been fixed. The control variable z is the heat load on the cooler QC . The
state variables are the temperatures T2 , T4 , T6 and T7 . The uncertain parameters are the
temperatures T3 and T5 .
These parameters have nominal values T3N =388K and T5N =583K. The expected deviation
is ±10K and therefore the parameter ranges are 378 ≤ T3 ≤ 398 and 573 ≤ T5 ≤ 593.

Figure 6.2: Motivating example for flexibility analysis

Derive the process model and the reduced process model.

Process Model

• Heat balance equations

Exchanger 1 1.5(620 − T2 ) = 2(T4 − T3 ) (6.5)


Exchanger 2 T5 − T6 = 2(563 − T4 ) (6.6)
Exchanger 3 T6 − T7 = 3(393 − 313) (6.7)
Cooler QC = 1.5(T2 − 350) (6.8)

• Temperature specifications

Exchanger 1 T2 − T3 ≥ 0 (6.9)
Exchanger 2 T6 − T4 ≥ 0 (6.10)
Exchanger 3 T7 − 313 ≥ 0 (6.11)
Exchanger 3 T6 − 393 ≥ 0 (6.12)
Exchanger 3 T7 ≤ 323 (6.13)

The inequalities (6.9) to (6.12) are physical constraints based on a minimum approach tem-
perature of 0K. Eq. (6.13) is a specification.

Reduced Process Model This linear model involves four equalities and four state vari-
ables. We can therefore construct the reduced network model by eliminating the state vari-
ables. First, solve Eq. (6.5) to (6.8) for T2 , T4 , T6 , T7 . Substitute these in Eq. (6.9) to (6.13)

and re-arrange as f(z, θ) ≤ 0:

    f1 = T3 − 0.666QC − 350 ≤ 0
    f2 = −T3 − T5 + 0.5QC + 923.5 ≤ 0
    f3 = −2T3 − T5 + QC + 1274 ≤ 0        (6.14)
    f4 = −2T3 − T5 + QC + 1114 ≤ 0
    f5 = 2T3 + T5 − QC − 1284 ≤ 0
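Because the model is linear, the elimination can be checked numerically: solve the four heat balances for the states at one operating point and confirm that the reduced inequalities match the original ones (a sketch using the exact coefficient 1/1.5 where the text rounds to 0.666; note that the constant in f4 works out to 1114 from these balances):

```python
# Solve the heat balances (6.5)-(6.8) for the state variables.
def states(T3, T5, QC):
    T2 = 350.0 + QC / 1.5              # cooler balance, Eq. (6.8)
    T4 = T3 + 0.75 * (620.0 - T2)      # exchanger 1 balance, Eq. (6.5)
    T6 = T5 - 2.0 * (563.0 - T4)       # exchanger 2 balance, Eq. (6.6)
    T7 = T6 - 3.0 * (393.0 - 313.0)    # exchanger 3 balance, Eq. (6.7)
    return T2, T4, T6, T7

T3, T5, QC = 388.0, 583.0, 75.0
T2, T4, T6, T7 = states(T3, T5, QC)

# Original inequalities (6.9)-(6.13), rewritten as g <= 0:
g = [T3 - T2, T4 - T6, 313.0 - T7, 393.0 - T6, T7 - 323.0]
# Reduced inequalities f <= 0; the constants follow from the elimination:
f = [T3 - QC / 1.5 - 350.0,
     -T3 - T5 + 0.5 * QC + 923.5,
     -2.0 * T3 - T5 + QC + 1274.0,
     -2.0 * T3 - T5 + QC + 1114.0,
     2.0 * T3 + T5 - QC - 1284.0]
assert all(abs(gi - fi) < 1e-9 for gi, fi in zip(g, f))
```

Any other operating point would do equally well, since both sides are affine in (T3, T5, QC).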

Flexibility Test Are the inequalities in the process model satisfied for the range of uncer-
tain parameters 378 ≤ T3 ≤ 398 and 573 ≤ T5 ≤ 593, given that QC will be adjusted for each
value of T3 and T5 ?
First we need to decide how we can establish whether the inequalities can be made feasible
at fixed values of T3 and T5 by adjusting QC . One approach is to select QC to minimise the
largest constraint value, max_j {fj(QC, T3, T5)}. Thus we need to calculate

    ψ(T3, T5) = min_{QC} max_j {fj(QC, T3, T5)}        (6.15)
This can be formulated as a standard optimisation problem as follows.




 min u

 QC ,u



 s.t. u ≥ f1 (QC , T3 , T5 )

ψ(T3 , T5 ) = u ≥ f2 (QC , T3 , T5 ) (6.16)

 u ≥ f3 (QC , T3 , T5 )



 u ≥ f4 (QC , T3 , T5 )



u ≥ f5 (QC , T3 , T5 )

This problem is the flexibility test. If ψ(T3 , T5 ) ≤ 0, a QC can be chosen to satisfy the
reduced process model (and hence the process model). If ψ(T3 , T5 ) > 0, there is no QC which
leads to feasible operation.
The flexibility test requires the analysis of ψ(T3 , T5 ) for all (T3 , T5 ) within the specified
range, i.e. the region shown in Fig. 6.3. We must ascertain whether ψ is non-positive for all
the points in the region. Since the inequalities in this problem are linear in QC , T3 and T5 ,
the maximum value of ψ can be found simply by evaluating ψ at the vertices of this region.
This corresponds to solving the following LP problem


 min u




QC ,u
 s.t. u ≥ T3k − 0.666QC − 350




 u ≥ −T3k − T5k + 0.5QC + 923.5
k
ψ = u ≥ −2T3k − T5k + QC + 1274 (6.17)



 u ≥ −2T3 − T5 + QC + 1144
k k





 u ≥ 2T3k + T5k − QC − 1284

 Q ≥0 C

where k is the vertex number with

    k   T3^k       T5^k
    1   388 + 10   583 + 10
    2   388 − 10   583 + 10
    3   388 − 10   583 − 10
    4   388 + 10   583 − 10

Figure 6.3: The region of interest for the flexibility test

The results are shown in the table below. Since ψ^k < 0 for all k, the network can be operated without any constraint violations within the specified range of parameters and has therefore passed the flexibility test.

    k   ψ^k     QC (kW)
    1   −5      110
    2   −5      70
    3   −3.33   48.33
    4   −3.33   88.33
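The four vertex LPs (6.17) can be reproduced with any LP solver; a sketch using scipy.optimize.linprog (variable ordering and tolerances are arbitrary choices):

```python
from scipy.optimize import linprog

# Vertex LP (6.17): minimise u over (QC, u) subject to u >= f_j, QC >= 0.
def psi(T3, T5):
    # Each f_j = cQ*QC + c3*T3 + c5*T5 + c0; rewrite f_j - u <= 0.
    rows = [(-1.0 / 1.5, 1.0, 0.0, -350.0),
            (0.5, -1.0, -1.0, 923.5),
            (1.0, -2.0, -1.0, 1274.0),
            (1.0, -2.0, -1.0, 1114.0),
            (-1.0, 2.0, 1.0, -1284.0)]
    A_ub = [[cQ, -1.0] for cQ, _, _, _ in rows]
    b_ub = [-(c3 * T3 + c5 * T5 + c0) for _, c3, c5, c0 in rows]
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, None), (None, None)])  # variables (QC, u)
    return res.fun

vertices = [(398.0, 593.0), (378.0, 593.0), (378.0, 573.0), (398.0, 573.0)]
values = [psi(T3, T5) for T3, T5 in vertices]
assert all(v < 0 for v in values)          # the design passes the flexibility test
assert abs(values[0] + 5.0) < 1e-6         # psi^1 = -5
assert abs(values[2] + 10.0 / 3.0) < 1e-6  # psi^3 = -3.33
```

Only five constraint rows and two variables are involved, so these LPs are trivial; the point of the exercise is that the vertex enumeration is valid because the model is linear.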

Flexibility index The goal is to determine how large the temperature deviations for T3 ,
T5 can be while still maintaining feasible operation. Thus, we are looking for the maximum
value of δ with
388 − 10δ ≤ T3 ≤ 388 + 10δ,
(6.18)
583 − 10δ ≤ T5 ≤ 583 + 10δ,
as illustrated in Fig. 6.4. The maximum value of δ which maintains feasible operation is
called the flexibility index, F. Given that the model is linear, we only need to determine
the maximum allowable value of δ in each of the vertex directions (1,2,3,4). F is then the
minimum δ over all vertex values. Thus, we must solve the following LPs:

    δ^k =  max_{δ, QC}  δ
           s.t.  fj(QC, T3, T5) ≤ 0,  j = 1, …, 5        (6.19)
                 T3 = 388 + κ3 · 10δ
                 T5 = 583 + κ5 · 10δ

Figure 6.4: The flexibility index problem

where the signs κ3 and κ5 depend on k as follows.

    k    1  2  3  4
    κ3   +  −  −  +
    κ5   +  +  −  −
The results for the motivating example are shown in the table below. The flexibility index is
F = 1.5 and hence the network can tolerate temperature deviations of ±15K. Vertex 3 defines
the critical point at which the process becomes infeasible: T3C = 388 − 15 = 373K and
T5C = 583 − 15 = 568K.

k δk Active constraints
1 +∞ —
2 +∞ —
3 1.5 f1 , f2
4 2.0 f2 , f5
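The same machinery applies to the vertex-direction LPs (6.19). The sketch below (again assuming the constraint coefficients of the motivating example) solves the two bounded directions, k = 3 and k = 4, giving δ3 ≈ 1.52 (reported as 1.5 above; the small gap comes from the rounded 0.666 coefficient) and δ4 = 2.0.

```python
# Sketch of the flexibility index LPs (6.19) for the bounded vertex
# directions; the f_j coefficients are an assumption of this illustration.
from scipy.optimize import linprog

# f_j = a3*T3 + a5*T5 + aq*QC + c0
F_COEF = [(1, 0, -0.666, -350.0),
          (-1, -1, 0.5, 923.5),
          (-2, -1, 1.0, 1274.0),
          (-2, -1, 1.0, 1144.0),
          (2, 1, -1.0, -1284.0)]

def delta_k(e3, e5):
    """max delta  s.t.  f_j(QC, 388 + 10*e3*delta, 583 + 10*e5*delta) <= 0."""
    A_ub, b_ub = [], []
    for a3, a5, aq, c0 in F_COEF:
        # substitute T3, T5 and collect the [QC, delta] coefficients
        A_ub.append([aq, 10.0 * (a3 * e3 + a5 * e5)])
        b_ub.append(-(a3 * 388.0 + a5 * 583.0 + c0))
    res = linprog(c=[0.0, -1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None)])   # QC >= 0, delta >= 0
    return res.x[1]

d3, d4 = delta_k(-1, -1), delta_k(+1, -1)   # vertex directions 3 and 4
print(round(d3, 2), round(d4, 2))
```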

Geometric interpretation To plot the boundary of the feasible region, we can use
the information on the active constraints. At vertex 3, f1 and f2 are active and hence

f1 = T3 − 0.666QC − 350 = 0
(6.20)
f2 = −T3 − T5 + 0.5QC + 923.5 = 0

Eliminating QC, we get ϕ3 = −0.333T3 − 1.333T5 + 881.026 = 0. Similarly, at vertex 4, we have
f2 = −T3 − T5 + 0.5QC + 923.5 = 0
(6.21)
f5 = 2T3 + T5 − QC − 1284 = 0
and therefore ϕ4 = −T5 + 563 = 0. Since the ϕ’s were obtained through linear combinations
of fj ’s, fj ≤ 0 ⇒ ϕ ≤ 0. The two lines ϕ3 and ϕ4 are used in Fig. 6.5 to show the region
of feasible operation. If we assume that QC has a set value and cannot be adjusted for
different values of T3 and T5 (i.e. there is no control), we get a much smaller region of
feasible operation, as shown on Fig. 6.6 for QC = 75kW. All positive deviations violate f5 .

Figure 6.5: The region of feasible operation for the motivating example.

Figure 6.6: Region of feasible operation for the motivating example when QC = 75kW. Note
that f3 is too relaxed to be shown on the picture.

[Figure 6.7: Region of feasible operation R in the (θ1, θ2) plane, with boundary defined by ψ(θ) = 0.]

6.2.2.2 Mathematical Formulations for Flexibility Analysis

The process has a given design and is described by the reduced model

fj (z, θ) ≤ 0, j ∈ J (6.22)

that defines feasible operation. Nominal values of the uncertain parameters, θN , as well as
positive and negative expected deviations, ∆θ+ and ∆θ− , are given.

Flexibility test problem Halemane and Grossmann [2] studied whether the design is
feasible to operate within the range T = {θ|θL ≤ θ ≤ θU }, where θL = θN − ∆θ− and
θU = θN + ∆θ+ .
To account for the adjustments of control z, define for a given θ

ψ(θ) = min_z max_{j∈J} fj(z, θ).          (6.23)

If ψ(θ) ≤ 0, operation is feasible at θ. It is infeasible otherwise. ψ(θ) is the solution of the following optimisation problem:

ψ(θ) =  min_{u,z}  u
        s.t.  fj(z, θ) ≤ u,   ∀j ∈ J          (6.24)

This is an LP if the reduced model is linear in z and an NLP otherwise.

The region of feasible operation is given by

R = {θ|ψ(θ) ≤ 0}. (6.25)

To test whether feasible operation can be achieved for all θ ∈ T , it suffices to consider the
largest value of ψ(θ) over T (see Fig 6.7). We are therefore looking for

[Figure 6.8: Critical point in the feasible case (θC is the point with smallest feasibility) and in the infeasible case (θC is the point with largest violation).]

χ = max_{θ∈T} ψ(θ)          (6.26)

or equivalently,

χ = max_{θ∈T} min_z max_{j∈J} fj(z, θ)          (6.27)

If χ ≤ 0, the design is feasible for T and θC is the point with the smallest feasibility (largest
ψ). If χ > 0, the design is infeasible for T and θC is the point with the largest violation
(largest ψ), as illustrated on Fig. 6.8.

Flexibility index problem (Swaney and Grossmann [5]). Let the parameter range vary
so that
T (δ) = {θ|θN − δ∆θ− ≤ θ ≤ θN + δ∆θ+ }, δ ≥ 0. (6.28)

[Figure 6.9: The flexibility index problem: the scaled deviation box θN − δ∆θ− ≤ θ ≤ θN + δ∆θ+ inscribed in the feasible region, touching its boundary at the critical point θC.]

What is the largest T (δ) that the design can tolerate? This is given by the flexibility index F such that

F =  max  δ
     s.t.  T (δ) ⊆ R          (6.29)

Thus, a constraint of the flexibility index problem is that the flexibility test must hold for the largest δ, i.e.

F =  max  δ
     s.t.  χ = max_{θ∈T(δ)} min_z max_{j∈J} fj(z, θ) ≤ 0          (6.30)
           T (δ) = {θ | θN − δ∆θ− ≤ θ ≤ θN + δ∆θ+},  δ ≥ 0

This problem is shown in Fig. 6.9.

Theoretical properties of the flexibility index problem

1. The critical parameter θC lies at a vertex when one of the following conditions is met:

• fj (z, θ) is linear in z and θ, ∀j ∈ J.


• fj (z, θ) is convex in z and θ, ∀j ∈ J.
• fj (z, θ) is one-dimensional quasi-convex in z and jointly convex in θ, ∀j ∈ J.

In the presence of nonconvexities, θC is not necessarily a vertex (see Fig. 6.10).

2. At the critical point, there are usually n + 1 active constraints, where n = dim{z} (see
Fig. 6.11).


Figure 6.10: The critical point may not be a vertex for nonconvex feasible regions

Figure 6.11: The number of active constraints at the critical point depends on the size of the
control vector

6.2.2.3 Algorithms for Flexibility Analysis – Vertex Enumeration Schemes

Flexibility Test

Step 1 For each vertex θ^k, k ∈ V, solve problem (6.24) to find ψ^k.

Step 2 Set χ to the maximum of all ψ k ’s. If χ ≤ 0, the process passes the flexibility test.
It fails otherwise.

Flexibility Index Let ∆θk denote the kth vertex direction.

Step 1 For each vertex direction ∆θ^k, k ∈ V, solve

       δ^k =  max_{δ,z}  δ
              s.t.  fj(z, θ) ≤ 0,   j ∈ J          (6.31)
                    θ = θN + δ∆θ^k

Step 2 Set F = min_{k∈V} δ^k.

What does this procedure become when there are no controls?

Comments on vertex enumeration algorithms

• These algorithms are only suitable for a small number of uncertain parameters as there
  is an exponential growth in the number of vertices (dim{θ} = p ⇒ dim{V} = 2^p).

• With the flexibility index calculations, several pieces of information can be obtained:

– The actual parameter range that can be handled:

θN − F ∆θ− ≤ θ ≤ θN + F ∆θ+
– The sensitivity of the flexibility index to design changes:

  ∂F/∂di = − Σ_{j∈J} µj ∂fj/∂di

  where di is the ith design variable and µj is the Lagrange multiplier of the jth process constraint.

6.2.2.4 Algorithms for Flexibility Analysis – Active Set Strategy

In order to avoid the 2^p vertex searches and to predict non-vertex critical points, we convert
the max min max problem into a single optimisation problem (Grossmann and Floudas [1]).

The Flexibility Test problem can be posed as

χ =  max_{θ∈T}  ψ(θ)
     s.t.  ψ(θ) = min_z max_{j∈J} {fj(z, θ)}          (6.32)

which is a two-level optimisation problem. Since in general the inner minimisation problem has n + 1 active constraints at the solution and n + 1 variables (z and u; θ is fixed in the inner problem), it has no degrees of freedom. If we knew which inequalities are active, we could treat the problem as a system of equations. However, the set of active constraints depends on the value of θ (Fig. 6.12).

[Figure 6.12: Illustration of the dependence of the active constraint set on θ: as θ varies, ψ(d, θ) is defined first by active constraints f1, f2 and then by f2, f3.]

In addition, ψ(θ) is usually non-differentiable. Since ψ(θ) is the solution of an optimisation problem, we can represent it through its Kuhn-Tucker conditions. We use the following form of the inner problem:

ψ(θ) =  min_{u,z}  u
        s.t.  fj(z, θ) ≤ u,   j ∈ J          (6.33)

The Kuhn-Tucker conditions are

1 − Σ_{j∈J} µj = 0
Σ_{j∈J} µj ∂fj/∂zi = 0,   i = 1, . . . , n
µj ≥ 0,   j ∈ J                                        (6.34)
fj(z, θ) − u ≤ 0,   j ∈ J
µj (fj(z, θ) − u) = 0,   j ∈ J

The main difficulty is due to the complementarity conditions, which imply a choice regarding
the nature of each inequality constraint: if a constraint is active, the corresponding µ is
greater than 0; otherwise, it is equal to 0. We model this decision with binary variables. For
a constraint fj (z, θ) − u ≤ 0, yj = 1 if the constraint is active and yj = 0 otherwise. We also
define a slack variable sj for constraint j where sj ≥ 0 and sj = u − fj (z, θ). Then, assuming
that n + 1 constraints must be active µj [fj (z, θ) − u] = 0 can be replaced by

sj − U(1 − yj) ≤ 0,   j ∈ J
µj − yj ≤ 0,   j ∈ J
Σ_{j∈J} yj = n + 1                                     (6.35)
sj ≥ 0,   j ∈ J

where U is a large positive number. If yj = 1, sj = 0 and 0 ≤ µj ≤ 1: the constraint is active.


If yj = 0, µj = 0 and 0 ≤ sj ≤ U : the constraint is inactive.

Substituting this representation of the complementarity conditions into the Kuhn-Tucker conditions, and replacing the inner optimisation problem in (6.32) with the Kuhn-Tucker conditions, we get

χ =  max_{θ,u,z,µ,s,y}  u
     s.t.  Σ_{j∈J} µj = 1
           Σ_{j∈J} µj ∂fj/∂z = 0
           sj + fj(z, θ) − u = 0,   j ∈ J
           sj − U(1 − yj) ≤ 0,   j ∈ J
           µj − yj ≤ 0,   j ∈ J          (6.36)
           Σ_{j∈J} yj = n + 1
           µj ≥ 0,  sj ≥ 0,   j ∈ J
           θL ≤ θ ≤ θU
           u ∈ R
           yj ∈ {0, 1},   j ∈ J

This problem is an MINLP. If the fj(z, θ) are linear in z and θ for all j ∈ J, the problem is an
MILP. This formulation can be extended to handle the full process model rather than the
reduced process model. This is useful when the state variables cannot easily be eliminated
through manipulation of the equalities. Correlated uncertain parameters expressed as r(θ) =
0 can also be included in the formulation.

The flexibility index problem can be posed in a similar way:

F =  max_{θ,δ,z,µ,s,y}  δ
     s.t.  Σ_{j∈J} µj = 1
           Σ_{j∈J} µj ∂fj/∂z = 0
           sj + fj(z, θ) = 0,   j ∈ J
           sj − U(1 − yj) ≤ 0,   j ∈ J
           µj − yj ≤ 0,   j ∈ J          (6.37)
           Σ_{j∈J} yj = n + 1
           θN − δ∆θ− ≤ θ ≤ θN + δ∆θ+
           µj ≥ 0,  sj ≥ 0,   j ∈ J
           yj ∈ {0, 1},   j ∈ J
Note that u has been set to zero in this formulation. Why?



Application of active set strategy to motivating example The flexibility test problem
is given by
max u
s.t. µ1 + µ2 + µ3 + µ4 + µ5 = 1
−0.666µ1 + 0.5µ2 + µ3 + µ4 − µ5 = 0
s1 + f1 − u = 0
s2 + f2 − u = 0
s3 + f3 − u = 0
s4 + f4 − u = 0
s5 + f5 − u = 0
1000y1 + s1 ≤ 1000
1000y2 + s2 ≤ 1000
1000y3 + s3 ≤ 1000
1000y4 + s4 ≤ 1000
1000y5 + s5 ≤ 1000
−y1 + µ1 ≤ 0
−y2 + µ2 ≤ 0
−y3 + µ3 ≤ 0
−y4 + µ4 ≤ 0
−y5 + µ5 ≤ 0
y1 + y2 + y3 + y4 + y5 = 2
378 ≤ T3 ≤ 398
573 ≤ T5 ≤ 593
µj ≥ 0, sj ≥ 0, yj ∈ {0, 1}, j = 1, . . . , 5

The solution of the MILP is u = −3.33 (as with the vertex enumeration algorithm). At
the solution, y1 = y3 = y4 = 0 and y2 = y5 = 1: f2 and f5 are the limiting constraints.
µ2 = 0.667 and µ5 = 0.333. T3C = 378K and T5C = 573K.
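With a single control variable the MILP solution can be cross-checked by brute force: n + 1 = 2 constraints are active, the stationarity conditions require their ∂fj/∂QC to have opposite signs, and for each sign-valid pair the multiplier-weighted combination of the two constraints eliminates QC, leaving a linear function of (T3, T5) that attains its maximum at a corner of the parameter box. A sketch, assuming the constraint coefficients of the motivating example:

```python
# Enumerate candidate active pairs for the motivating example and pick the
# worst one; coefficients of f_j = a3*T3 + a5*T5 + aq*QC + c0 are assumed
# from the reduced model above.
import itertools
import numpy as np

F = np.array([[1, 0, -0.666, -350.0],
              [-1, -1, 0.5, 923.5],
              [-2, -1, 1.0, 1274.0],
              [-2, -1, 1.0, 1144.0],
              [2, 1, -1.0, -1284.0]])
T3_BOX, T5_BOX = (378.0, 398.0), (573.0, 593.0)

best_u, best_pair = -np.inf, None
for j1, j2 in itertools.combinations(range(5), 2):
    a1, a2 = F[j1, 2], F[j2, 2]              # d f / d QC for each constraint
    if a1 * a2 >= 0:                         # stationarity needs opposite signs
        continue
    mu1 = a2 / (a2 - a1)                     # from mu1*a1 + (1 - mu1)*a2 = 0
    w = mu1 * F[j1] + (1.0 - mu1) * F[j2]    # the QC coefficient w[2] cancels
    # w is linear in (T3, T5): maximise over the four box corners
    u = max(w[0]*t3 + w[1]*t5 + w[3]
            for t3 in T3_BOX for t5 in T5_BOX)
    if u > best_u:
        best_u, best_pair = u, (j1 + 1, j2 + 1)

print(best_pair, round(best_u, 2))  # -> (2, 5) -3.33
```

The worst pair is (f2, f5) with u = −3.33, in agreement with the MILP solution above.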

Solution strategy in the nonlinear case One can solve the problem as an MINLP or
use an active set strategy.

1. Assuming that the constraint functions fj are monotonic in z, we can identify
   candidate active sets a priori from the following conditions:

   Σ_{j∈J} µj ∂fj/∂z = 0
   µj − yj = 0,   j ∈ J
   Σ_{j∈J} yj = n + 1.

   Since the sign of the partial derivative of fj with respect to z is constant, the set of
   nonzero µj (and therefore of active constraints) must include constraints with both
   positive and negative derivatives.

2. By selecting n + 1 constraints for different assignments satisfying the above equations,



Figure 6.13: Morari example

   we solve for each active set a

   u^a =  max_{θ,u,z}  u
          s.t.  fj(z, θ) = u,   j ∈ JA^a          (6.38)

   where JA^a defines the ath active set.

3. Set χ = max_a {u^a}.

Consider the example from Saboo and Morari [4] shown in Fig. 6.13. The uncertain
parameter is FH1 with 1 ≤ FH1 ≤ 1.8 kW/°C. The control variable is QC. The constraints are

f1 = t1 − t2 ≤ 0      Feasibility heat exchanger 2
f2 = 120 − t2 ≤ 0     Feasibility heat exchanger 3
f3 = 40 − t3 ≤ 0      Feasibility heat exchanger 3          (6.39)
f4 = t3 − 50 ≤ 0      Specification

After eliminating t1, t2 and t3, set Σ_{j∈J} µj ∂fj/∂z = 0:

(1/FH1 − 0.5) µ1 + (1/FH1) µ2 + (1/FH1) µ3 − (1/FH1) µ4 = 0
    (+ve)           (+ve)         (+ve)        (−ve)

Since there is one control variable, the active set is expected to contain two constraints. Given
that the coefficient of µ4 is the only negative coefficient, there are three possible combinations
all involving f4: (f1, f4), (f2, f4) and (f3, f4). In general, the number of candidate sets
is smaller than the maximum number of assignments, m!/((n + 1)!(m − n − 1)!) (m is the
number of inequalities), because of the non-negativity constraints on µ and s.
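The pruning effect of the sign condition is easy to check for this example: of the six possible pairs, only the three containing f4 mix a positive and a negative derivative. A minimal illustration:

```python
# Count the sign-valid candidate active pairs for the Morari example;
# the coefficient signs in the stationarity condition above are +, +, +, -.
from itertools import combinations
from math import comb

signs = {1: +1, 2: +1, 3: +1, 4: -1}
pairs = [p for p in combinations(signs, 2)
         if signs[p[0]] != signs[p[1]]]      # mixed signs needed for mu >= 0
print(pairs, "of", comb(4, 2), "possible pairs")
```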
Consider the flexibility test for 1 ≤ FH1 ≤ 1.8 kW/°C.

Figure 6.14: Geometrical interpretation of the Morari example

• Analyse the vertices:

  FH1 = 1   ⇒ ψ(1) = −5
  FH1 = 1.8 ⇒ ψ(1.8) = −5

  Feasible operation is possible at both vertices.

• Apply the mixed-integer formulation.

Active constraints      u
1, 4                    5.108
2, 4                    −31.67
3, 4                    −5

Hence χ = 5.108: infeasible operation occurs at FH1^C = 1.372 kW/°C, an interior point.

The geometrical interpretation of this problem is shown in Fig. 6.14.

6.2.2.5 The effect of design on flexibility

Consider an example taken from Swaney and Grossmann [5] of two networks (Fig. 6.15).
There are 12 uncertain parameters: 4 flowrates, 4 inlet temperatures and 4 fouling resistances. Although the two designs meet the same specifications at the nominal values of the
parameters, the flexibility index of the first network is 0.06 and that of the second network
is 0.816.

6.2.3 Design and Synthesis of Flexible Processes


Two major cases:

1. Design for a finite number of conditions – multiperiod optimisation.

2. Design for a specified range of uncertain parameters:



[Figure 6.15: Two designs with different flexibility indices: (a) F = 0.06, (b) F = 0.816.]



Figure 6.16: Overdesign example problem

(a) Iterative multiperiod optimisation


(b) Design with explicit flexibility constraints

Is overdesign always required for flexibility? Consider the example in Fig. 6.16. The
flowrate-heat capacities of the streams are 15kW/K for H, 30kW/K for C1 and 10kW/K
for C2. The heat transfer coefficients are U1 = U2 = 800 W/m²K. The minimum approach
temperature is 10K. Assume we overdesign the heat exchange areas A1 and A2 by 20%.
If U1 deviates by +20% and U2 by −20%, however, the operation of the network becomes
infeasible. In fact, there are two alternative ways to overdesign the heat exchange areas to
ensure operational feasibility under these circumstances:
1. Overdesign A1 by 20% and A2 by 108%.

2. Underdesign A1 by 16% and overdesign A2 by 23%.

6.2.3.1 Multiperiod optimisation

Given N periods of operation each of length Wi, i = 1, . . . , N, with values of the parameters θi, select the design vector d which makes the process feasible over all N periods and which optimises the cost function

C = c(d) + Σ_{i=1}^{N} Wi Φi(d, zi, θi)          (6.40)

where c(d) is the investment cost and the summation term is the operating cost.

Mathematical formulation

min_{d,z1,...,zN}  c(d) + Σ_{i=1}^{N} Wi Φi(d, zi, θi)
s.t.  f(d, zi, θi) ≤ 0,   i = 1, . . . , N          (6.41)
      g(d) ≤ 0


Figure 6.17: Block angular structure of the multiperiod optimisation problem

• The problem size increases substantially with the number of periods N .

• The problem structure (block angular) can be exploited in efficient decomposition procedures (Fig. 6.17).

• The formulation can be applied to design problems concerned with equipment sizing or
with synthesis.

• Applications of this formulation include the multiperiod MILP transhipment model for HENs, multiperiod distillation sequence synthesis, multiperiod batch plant design under uncertainty, and the multiperiod approach to retrofit design.
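As a concrete toy instance of formulation (6.41), the sketch below sizes a single design variable d over two periods with scipy; the cost c(d) = 2d, operating cost Φi = zi², and constraint θi − d − zi ≤ 0 are invented purely for illustration.

```python
# Toy multiperiod design problem of the form (6.41); model data are illustrative.
from scipy.optimize import minimize

W = [1.0, 1.0]            # period weights W_i
theta = [1.0, 3.0]        # per-period parameter values theta_i

def cost(x):              # x = [d, z1, z2]
    d, z = x[0], x[1:]
    # investment cost c(d) = 2d plus weighted operating costs Phi_i = z_i^2
    return 2.0*d + sum(Wi*zi**2 for Wi, zi in zip(W, z))

# feasibility in every period: theta_i - d - z_i <= 0
cons = [{"type": "ineq", "fun": lambda x, i=i: x[0] + x[1 + i] - theta[i]}
        for i in range(2)]
res = minimize(cost, x0=[0.0, 0.0, 0.0], method="SLSQP",
               bounds=[(0, None), (None, None), (None, None)], constraints=cons)
d_opt = res.x[0]
print(round(d_opt, 3))    # -> 2.0  (z = [0, 1], total cost 5)
```

Note how the single design d must cover both periods while the controls zi adjust per period, exactly the block-angular structure mentioned above.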

6.2.3.2 Design for a Specified Range of Uncertain Parameters

Iterative Multiperiod Optimisation

Step 1 Select n0 points θi in range T = {θ|θL ≤ θ ≤ θU }, for instance the nominal point
and some expected critical points. Assign a weight Wi to each point θi . Set iteration
counter k = 0.

Step 2 Solve the corresponding multiperiod optimisation problem.

Step 3 Perform a flexibility analysis on the resulting design (flexibility test or flexibility
index).

Step 4 Check for convergence:

• If the design is feasible, stop.


• Otherwise, include the critical point θC found in the analysis in the set of points
θi , set k = k + 1 and return to step 2.
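The iteration can be traced on a minimal made-up instance: one design variable d with cost d, one uncertain parameter θ ∈ [0, 1], no controls, and the single constraint f = θ − d ≤ 0 (all model details are illustrative). Starting from the nominal point θ = 0.5, the first pass adds the critical point θC = 1 and the second pass converges:

```python
# Toy run of the iterative multiperiod scheme (no controls, cost = d,
# constraint theta - d <= 0, theta in [0, 1]); the model is illustrative.
theta_U = 1.0
points = [0.5]                  # Step 1: start from the nominal point
for k in range(10):
    d = max(points)             # Step 2: min d s.t. theta_i - d <= 0 for all points
    chi = theta_U - d           # Step 3: flexibility test, max_theta (theta - d)
    if chi <= 0:                # Step 4: feasible over the whole range -> converged
        break
    points.append(theta_U)      # otherwise add the critical point theta^C and repeat

print(d, k)  # -> 1.0 1
```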

Design with Explicit Flexibility Constraints (Pistikopoulos and Grossmann [3]). The
basic idea is to perform a prior analysis on the active set and to generate explicit constraints
for flexibility. Consider the following case:
1. fj (d, z, θ), j ∈ J linear in d, z and θ.

2. The objective function involves design variables only (e.g. retrofit): c = αT y + β T d.


The mathematical formulation is

min_{d,y}  c = αᵀy + βᵀd
s.t.  max_{θ∈T} ψ(d, θ) ≤ 0
      T = {θ | θL ≤ θ ≤ θU}          (6.42)
      di ≥ 0,   i = 1, . . . , r
      di − U yi ≤ 0,   i = 1, . . . , r
      yi ∈ {0, 1},   i = 1, . . . , r

where

ψ(d, θ) =  min_{u,z}  u
           s.t.  fj(d, z, θ) ≤ u,   j ∈ J

The linearity of fj can be exploited to simplify the max_{θ∈T} ψ(d, θ) ≤ 0 constraint.
Key properties of this problem include
Property 1 The active sets and multipliers µj^a, j ∈ JA^a, a = 1, . . . , NAS (NAS: number of active sets) are invariant to d and θ. This follows from the stationarity conditions (Σ_{j∈JA^a} µj = 1 and Σ_{j∈JA^a} µj ∂fj/∂z = 0), which form a system of n + 1 equations in n + 1 unknowns (µj). This implies that

max_{θ∈T} ψ(d, θ) ≤ 0  ⇔  max_{θ∈T} ψ^a(d, θ) ≤ 0,   a = 1, . . . , NAS.          (6.43)

Property 2 At the optimal solution, the Lagrangian is equal to ψ^a(d, θ), i.e. ψ^a(d, θ) = Σ_{j∈JA^a} µj^a fj(d, z, θ).

Property 3 The critical points θ^{Ca} for each active set a can be determined a priori and are independent of d (because ψ^a(d, θ) is linear in θ). If ∂ψ^a/∂θi > 0, then θi^{Ca} = θi^U. If ∂ψ^a/∂θi < 0, then θi^{Ca} = θi^L.
From these properties, the design problem reduces to an MILP problem with explicit flexibility constraints.

min_{d,y}  c = αᵀy + βᵀd
s.t.  Σ_{j∈JA^a} µj^a fj(d, z, θ^{Ca}) ≤ 0,   a = 1, . . . , NAS          (6.44)
      di − U yi ≤ 0,   i = 1, . . . , r
      d ≥ 0
      y ∈ {0, 1}^r

The design procedure can be outlined as follows.
The design procedure can be outlined as follows.

Step 1 Identify the NAS active sets JA^a and obtain µj^a, j ∈ JA^a, by solving the linear equations

       Σ_{j∈JA^a} µj = 1
       Σ_{j∈JA^a} µj ∂fj/∂z = 0

Step 2 Set ψ^a(d, θ) = Σ_{j∈JA^a} µj^a fj(d, z, θ) and determine θ^{Ca} by analysing the sign of ∂ψ^a/∂θ.

Step 3 Setup an MILP problem of form (6.44) and solve.

Example Find d for the following problem:

f1 = −z + θ ≤ 0
f2 = z − 2θ + 2 − d ≤ 0
1≤θ≤2
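A worked solution of this example, following Steps 1–3 (the code is a sketch; since ∂f1/∂z = −1 and ∂f2/∂z = +1, the only sign-valid active set is {f1, f2}):

```python
# Worked solution of the example via the three-step procedure.
import numpy as np

# Step 1: with one control z, n + 1 = 2 constraints are active; solve
#   mu1 + mu2 = 1,  -mu1 + mu2 = 0   ->   mu = (0.5, 0.5)
mu = np.linalg.solve([[1.0, 1.0], [-1.0, 1.0]], [1.0, 0.0])

# Step 2: psi(d, theta) = 0.5*f1 + 0.5*f2 = 0.5*(2 - theta - d); z cancels.
# d psi / d theta = -0.5 < 0, so the critical point is theta^C = theta^L = 1.
theta_C = 1.0

# Step 3: flexibility constraint psi(d, theta_C) <= 0 gives d >= 2 - theta_C,
# so the smallest feasible design is
d_min = 2.0 - theta_C
print(mu, d_min)  # -> [0.5 0.5] 1.0
```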

The procedure can be extended to nonlinear constraints: the design formulation becomes
an MINLP of the form

min_{d,y}  c = αᵀy + βᵀd
s.t.  Σ_{j∈JA^a} µj^{a,l} fj(d, z, θ^{a,l}) ≤ 0,   a = 1, . . . , NAS,  l = 1, . . . , L
      di ≥ 0,   i = 1, . . . , r          (6.45)
      di − U yi ≤ 0,   i = 1, . . . , r
      yi ∈ {0, 1},   i = 1, . . . , r

where L is the number of points at which each active set is evaluated. The algorithm is

Step 1 Set L = 1, guess d.

Step 2 Identify the NAS active sets and for each set a solve

       χ^a(d) =  max_{u,z,θ}  u
                 s.t.  fj(d, z, θ) = u,   j ∈ JA^a
                       θL ≤ θ ≤ θU

       This yields θ^{a,l} and µ^{a,l}. If χ^a(d) ≤ 0 for all a, stop.

Step 3 Set up and solve the MINLP for d. Set L = L + 1 and return to step 2.

This approach can be extended to address the following issues.

1. Obtain a parametric curve for cost vs. flexibility index by formulating the design
problem in terms of F (see Fig 6.18).

2. Assuming probability distribution functions for the uncertain parameters θ, one can
maximise the expected revenue and determine the optimal flexibility (see Fig 6.19).

Figure 6.18: A typical cost vs. flexibility curve

Figure 6.19: Typical expected revenue, profit and cost vs. flexibility curves
Bibliography

[1] I.E. Grossmann and C.A. Floudas. "Active constraint strategy for flexibility analysis in chemical processes". In: Computers & Chemical Engineering 11.6 (1987), pp. 675–693. doi: 10.1016/0098-1354(87)87011-4.

[2] K.P. Halemane and I.E. Grossmann. "Optimal process design under uncertainty". In: AIChE Journal 29.3 (1983), pp. 425–433. doi: 10.1002/aic.690290312.

[3] E.N. Pistikopoulos and I.E. Grossmann. "Optimal retrofit design for improving process flexibility in linear systems". In: Computers & Chemical Engineering 12.7 (1988), pp. 719–731. doi: 10.1016/0098-1354(88)80010-3.

[4] A.K. Saboo and M. Morari. "Design of resilient processing plants – IV". In: Chemical Engineering Science 39.3 (1984), pp. 579–592. doi: 10.1016/0009-2509(84)80054-8.

[5] R.E. Swaney and I.E. Grossmann. "An index for operational flexibility in chemical process design. Part I: Formulation and theory". In: AIChE Journal 31.4 (1985), pp. 621–630. doi: 10.1002/aic.690310412.
