
Introduction to Optimisation

Lecture notes

Philine Schiewe
assistant professor for operations research
Aalto University, School of Science

based on lecture notes by


Fabricio Oliveira and Anita Schöbel
Contents

1 Introduction
  1.1 What is optimisation?
  1.2 Types of optimisation problems / mathematical programming
  1.3 Modelling real-world problems using optimisation
  1.4 Modelling problems as LP and graphical solution approach
    1.4.1 Graphical sensitivity analysis

2 Linear programming
  2.1 Representations of linear programs
  2.2 Geometric and algebraic basics
  2.3 Simplex method
  2.4 Gauss-Jordan elimination and simplex tableaus
  2.5 Artificial variables and feasible initial solutions
    2.5.1 The M-method
    2.5.2 Two-phase method
  2.6 Special cases
  2.7 Duality in linear programming
    2.7.1 Primal-dual relationship
    2.7.2 Dual simplex
  2.8 Sensitivity analysis
    2.8.1 Changes in the independent term (b)
    2.8.2 Changes in the objective function coefficients (c)

3 Integer programming
  3.1 Integer programming problems
  3.2 Modelling with integer variables
    3.2.1 Fixed cost
    3.2.2 Disjunctions and implications
  3.3 Solving general integer programs
    3.3.1 Branch-and-bound method

4 Nonlinear optimisation
  4.1 Revision of calculus
  4.2 Nonlinear optimisation models
  4.3 Optimality conditions - Unconstrained problems
  4.4 Convexity of functions
  4.5 One-dimensional optimisation methods - Line search
    4.5.1 Bisection method
    4.5.2 Newton's method
  4.6 Multidimensional functions
  4.7 Optimality conditions
  4.8 Multidimensional optimisation methods
    4.8.1 Steepest descent / gradient descent method
    4.8.2 Newton's method
  4.9 Optimality conditions for constrained problems
    4.9.1 Karush-Kuhn-Tucker (KKT) conditions
  4.10 Solution approaches for constrained problems
    4.10.1 Newton's method for constrained problems
    4.10.2 Barrier method
    4.10.3 Primal-dual interior point method
Chapter 1

Introduction

1.1 What is optimisation?


In our daily life, we encounter many things we might want to do better: we are looking for the fastest way to work, the most energy-efficient way to light our home, or the cheapest price for our favourite coffee. We can think about these problems in the context of the algorithm engineering cycle, see Figure 1.1.

Figure 1.1: Algorithm engineering cycle (problem → modelling → model → optimisation → solution → interpretation → problem).

Starting from our problem, we first have to derive a mathematical model to represent it. We have to identify

• variables, representing decisions. These can be business decisions, parameter definitions, settings, geometries, etc.

• the domain, representing constraints. These can be dictated by logic, design, engineering, etc.

• a function, representing our objective. Here, we measure the (decision) quality.

This lecture focuses on the next step, optimisation. The goal is to find a feasible solution, i.e.,
a variable assignment that does not violate any constraint, that maximises (or minimises) the
objective function we are considering. This can be achieved by

• analysing properties of functions/extreme points or


• applying numerical methods as in Figure 1.2.

During the course of this lecture, we will get to know different classes of optimization problems
and consider appropriate solution approaches.


Figure 1.2: Example of a numerical optimisation method.

When we have computed an optimal (or near-optimal) solution, we might or might not be done.
We might notice that the problem we just solved was not the problem we actually wanted to
consider: the fastest way to work might be by car but that will not work if you don’t own a car.
Thus, a careful interpretation of the solution is necessary and can lead to changes in the model –
starting the algorithm engineering cycle all over again.

1.2 Types of optimisation problems / mathematical programming
Optimisation has important applications in fields such as mathematical programming and operations research (OR), economics and statistics, as well as machine learning and artificial intelligence. Note that mathematical program is used synonymously with optimisation model in the context of this lecture and is not to be confused with computer programming.
We now state an optimisation problem in its general form and define important classes of optimisation problems.

Definition 1.1 (Optimisation problem in general form). Let x ∈ Rn be a vector of (decision) variables xj, j = 1, . . . , n, and f : Rn → R ∪ {±∞} an objective function. Let X ⊆ Rn be the ground set (representing physical constraints) and gi, hj : Rn → R constraint functions for i = 1, . . . , m, j = 1, . . . , l. Then a general optimisation problem P is given as

(P): min f(x)
s.t. gi(x) ≤ 0, i = 1, . . . , m
     hj(x) = 0, j = 1, . . . , l
     x ∈ X.

Note that we call gi (x) ≤ 0 for i = 1, . . . , m inequality constraints and hj (x) = 0 for j = 1, . . . , l
equality constraints.

Remark. Depending on the problem at hand, we might be interested in maximising or minimising an objective function. By substituting

max f(x)   with   − min −f(x),

it suffices to consider only minimisation problems for our theoretical analysis.

We are interested in feasible and especially optimal solutions. While we write “min” in P, a minimum (or optimal solution) does not necessarily exist.
Definition 1.2 (Feasible and optimal solutions). Let P be an optimisation problem as in Definition 1.1.

• x ∈ X is called a feasible solution if
  gi(x) ≤ 0 for all i = 1, . . . , m and
  hj(x) = 0 for all j = 1, . . . , l.

• The feasible region F of P is the set of all feasible solutions, i.e.,

F = {x : gi (x) ≤ 0, i = 1, . . . , m and hj (x) = 0, j = 1, . . . , l} .

• We call P infeasible if no feasible solution exists, i.e., if F = ∅.


• A feasible solution x∗ ∈ F is called (globally) optimal if there exists no other feasible solution
with a strictly better objective value, i.e., if

f (x∗ ) ≤ f (y) for all y ∈ F.

Our goal will be to solve variations of the general problem P. The simpler the assumptions that define a type of problem, the more efficient the methods for solving it. We focus on the following four types of optimisation problems.

• Linear programming (LP): linear f(x) := c⊤x with c ∈ Rn; constraint functions gi(x) and hi(x) are affine (ai⊤x − bi with ai ∈ Rn, bi ∈ R); X = {x ∈ Rn : xj ≥ 0, j = 1, . . . , n}.

• Nonlinear programming (NLP): some (or all) of the functions f, gi or hi are nonlinear.

• (Mixed-)integer programming ((M)IP): LP where (some of the) variables are binary (or integer), i.e., X = Rk × {0, 1}n−k (or X = Rk × Zn−k).

• Mixed-integer nonlinear programming (MINLP): some (or all) of the functions f, gi or hi are nonlinear and (some of the) variables are binary (or integer), i.e., X = Rk × {0, 1}n−k (or X = Rk × Zn−k).

1.3 Modelling real-world problems using optimisation


Let us start with a simple example to illustrate the optimisation modelling framework.
Example 1.3. Consider the following: A carpenter makes tables and chairs using wood and her
labour. The carpenter has a limited availability of labour and wood, see Table 1.1. What is the
optimal weekly production of tables and chairs?
To model the problem mathematically, we need to follow three key steps:

1. Determine what needs to be decided (decision variables)

   x1 - number of tables produced per week
   x2 - number of chairs produced per week

                    Table   Chair   Available (per week)
Selling price ($)   800     600     -
Workload (h)        3       5       40
Wood (u)            7       4       60

Table 1.1: Resources and selling price for tables and chairs.

2. How solutions are assessed (objective function)

   Maximise revenue: max z = 800x1 + 600x2

3. The requirements that must be satisfied (constraints)

   3x1 + 5x2 ≤ 40 (available labour)
   7x1 + 4x2 ≤ 60 (available wood)
   x1, x2 ≥ 0

The complete model then is:

max z = 800x1 + 600x2 (revenue)
s.t. 3x1 + 5x2 ≤ 40 (available labour)
     7x1 + 4x2 ≤ 60 (available wood)
     x1, x2 ≥ 0
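Such a model can be handed to any LP solver. The following is a minimal sketch (not part of the original notes) using scipy.optimize.linprog; since linprog minimises by default, we pass the negated objective.

```python
from scipy.optimize import linprog

# max z = 800 x1 + 600 x2  ->  min -800 x1 - 600 x2
c = [-800, -600]
A_ub = [[3, 5],   # labour: 3 x1 + 5 x2 <= 40
        [7, 4]]   # wood:   7 x1 + 4 x2 <= 60
b_ub = [40, 60]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # approx. (6.09, 4.35) with z* ~ 7478.3
```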

Remark. Note that models are simplified representations of reality. Simplifying assumptions in this example include:

• fractional numbers of tables/chairs are allowed;
• no uncertainty;
• no production cost and no wastage of resources;
• perfect demand (all production is sold); ...

The most suitable optimisation method for solving an optimisation model depends on the model’s
mathematical properties.

• Is the model linear?
• Are there integer variables?
• Are the nonlinear terms convex?
• Are gradients available?

In this course, we will learn how to specify a suitable method for a model given its properties. For
now, we will concentrate on (continuous) linear (optimisation) models (LPs).
Linear models have particular properties that can be exploited to devise an efficient optimisation
method.

Example 1.3 (continued). Consider again the model for calculating the optimal number of tables
and chairs to produce. We can interpret the constraints for available labour and available wood
as a half-space in R2 . The feasible region is the intersection of these two half-spaces and R2+ =
{(x1 , x2 ) : x1 , x2 ≥ 0}, see Figure 1.3a.

Figure 1.3: Carpentry example. (a) Feasible region. (b) Finding an optimal solution: the level set z = 800x1 + 600x2 is pushed to z* ≈ 7478.3 at x* ≈ (6.08, 4.34).

To find an optimal solution in the feasible region, we consider level sets of the objective function. For a given objective value z, the level set {(x1, x2) ∈ R2 : 800x1 + 600x2 = z} describes all solutions with objective value z. Note that in our case, the level sets are lines in R2 with the same slope. Figure 1.3b shows how we can graphically find the largest value z such that the corresponding level set intersects the feasible region. This intersection is the set of optimal solutions.

A more complex real-world problem to be modelled is the production planning problem, a classic problem in operations research (OR).

Example 1.4 (Real-world model: production planning problems (OR)). In the production planning problem, we need to plan the production and distribution of goods. The transportation costs are proportional to the distance travelled, factories have a capacity limit and clients have known demands.

Figure 1.4: Problem data: factories (Seattle, San Diego) and clients (New York, Chicago, Miami).



                      Clients
Factory       NY      Chicago   Miami   Capacity
Seattle       2.5     1.7       1.8     350
San Diego     3.5     1.9       1.4     600
Demands       325     300       275     -

Table 1.2: Problem data: unit transportation costs, demands and capacities.

Let i ∈ I = {Seattle, San Diego} be the index set representing the factories. Similarly, let j ∈
J = {New York, Chicago, Miami} be the index set representing the clients. Figure 1.4 shows the
factories and clients while Table 1.2 gives the corresponding costs, demands and capacities.
To model the problem mathematically, we follow the same three key steps as above:

1. Determine what needs to be decided (decision variables)


Let xij be the amount produced in factory i and sent to client j.
2. How solutions are assessed (objective function)

   Minimise total distribution cost:

   min z = 2.5x11 + 1.7x12 + 1.8x13 + 3.5x21 + 1.9x22 + 1.4x23,

   which can be more compactly expressed as

   min z = ∑i∈I ∑j∈J cij xij,

3. The requirements that must be satisfied (constraints)

   x11 + x12 + x13 ≤ 350 (capacity limit Seattle)
   x21 + x22 + x23 ≤ 600 (capacity limit San Diego)
   x11 + x21 ≥ 325 (demand in New York)
   x12 + x22 ≥ 300 (demand in Chicago)
   x13 + x23 ≥ 275 (demand in Miami)

   These constraints can be expressed in the more compact form

   ∑j∈J xij ≤ Ci, ∀i ∈ I
   ∑i∈I xij ≥ Dj, ∀j ∈ J,

   where Ci is the production capacity of factory i and Dj is the demand of client j.

The complete model:


min z = 2.5x11 + 1.7x12 + 1.8x13 + 3.5x21 + 1.9x22 + 1.4x23
s.t. x11 + x12 + x13 ≤ 350, x21 + x22 + x23 ≤ 600
x11 + x21 ≥ 325, x12 + x22 ≥ 300, x13 + x23 ≥ 275
x11 , . . . , x23 ≥ 0.

Or, more compactly, in the so-called algebraic (symbolic) form

min z = ∑i∈I ∑j∈J cij xij
s.t. ∑j∈J xij ≤ Ci, ∀i ∈ I
     ∑i∈I xij ≥ Dj, ∀j ∈ J
     xij ≥ 0, ∀i ∈ I, ∀j ∈ J.
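The algebraic form maps almost one-to-one onto code. Below is a minimal sketch (an illustration, not from the original notes) that builds the cost vector and constraint rows from the index sets I and J with the data of Table 1.2 and solves the model with scipy.optimize.linprog.

```python
from scipy.optimize import linprog

I = ["Seattle", "San Diego"]
J = ["New York", "Chicago", "Miami"]
cost = {("Seattle", "New York"): 2.5, ("Seattle", "Chicago"): 1.7,
        ("Seattle", "Miami"): 1.8, ("San Diego", "New York"): 3.5,
        ("San Diego", "Chicago"): 1.9, ("San Diego", "Miami"): 1.4}
C = {"Seattle": 350, "San Diego": 600}                # capacities C_i
D = {"New York": 325, "Chicago": 300, "Miami": 275}   # demands D_j

pairs = [(i, j) for i in I for j in J]
idx = {p: k for k, p in enumerate(pairs)}             # x_ij -> vector position
c = [cost[p] for p in pairs]

A_ub, b_ub = [], []
for i in I:                                           # sum_j x_ij <= C_i
    row = [0.0] * len(pairs)
    for j in J:
        row[idx[i, j]] = 1.0
    A_ub.append(row); b_ub.append(C[i])
for j in J:                                           # sum_i x_ij >= D_j  ->  -sum_i x_ij <= -D_j
    row = [0.0] * len(pairs)
    for i in I:
        row[idx[i, j]] = -1.0
    A_ub.append(row); b_ub.append(-D[j])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(pairs))
print(res.fun, dict(zip(pairs, res.x)))
```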

The next example is a classification problem used in machine learning (ML).

Example 1.5 (Real-world model: classification problem (ML)). Suppose we are given a data set D ⊂ Rn that can be separated into two disjoint sets in Rn: I− = {x1, . . . , xN} and I+ = {xN+1, . . . , xM}.
Each element xi ∈ D is an observation of a given set of features; belonging to either I− or I+ defines a classification, see Figure 1.5a.

Figure 1.5: Classification problem. (a) Input: positive and negative observations. (b) Optimal solution: a linear classifier separating the two sets.

Our task is to obtain a function f : Rn → R from a given family of functions such that

f (xi ) < 0, xi ∈ I − and f (xi ) > 0, xi ∈ I + .

Here, f is selected as a linear classifier, i.e., f (xi ) = a⊤ xi − b, in which we try to set optimal a
and b considering the classification error.
The best possible classifier is that which minimises misclassification.
Let us define the following error measures:

e−(xi ∈ I−; a, b) = 0 if a⊤xi − b ≤ 0, and a⊤xi − b if a⊤xi − b > 0.

e+(xi ∈ I+; a, b) = 0 if a⊤xi − b ≥ 0, and b − a⊤xi if a⊤xi − b < 0.

Thus, we obtain the following nonlinear model

(LC′): min ∑xi∈I− e−(xi; a, b) + ∑xi∈I+ e+(xi; a, b)
s.t. a ∈ Rn
     b ∈ R.

However, we can translate this to a linear model by introducing slack variables.

To model e−, we introduce slack variables (ui), i = 1, . . . , N, and consider for xi ∈ I− the following constraints:

a⊤xi − b − ui ≤ 0 (1.1)
ui ≥ 0 (1.2)

If xi ∈ I− is correctly classified, we get

a⊤xi − b ≤ 0 ⇒ a⊤xi − b − 0 ≤ 0 ⇒ e−(xi; a, b) = 0 =: ui ≥ 0.

If xi ∈ I− is not correctly classified, we get

a⊤xi − b > 0 ⇒ a⊤xi − b − (a⊤xi − b) ≤ 0 ⇒ e−(xi; a, b) = a⊤xi − b =: ui ≥ 0.

Note that choosing a smaller value for ui would lead to a contradiction. As ui is minimised, constraints (1.1) and (1.2) model e− correctly.
Analogously, we introduce slack variables (vi), i = N + 1, . . . , M, to model e+ and for each xi ∈ I+ we add the following constraints:

a⊤xi − b + vi ≥ 0 (1.3)
vi ≥ 0 (1.4)

If xi ∈ I+ is correctly classified, we get

a⊤xi − b ≥ 0 ⇒ a⊤xi − b + 0 ≥ 0 ⇒ e+(xi; a, b) = 0 =: vi ≥ 0.

If xi ∈ I+ is not correctly classified, we get

a⊤xi − b < 0 ⇒ a⊤xi − b + (b − a⊤xi) ≥ 0 ⇒ e+(xi; a, b) = b − a⊤xi =: vi ≥ 0.

Note that choosing a smaller value for vi would lead to a contradiction. As vi is minimised,
constraints (1.3) and (1.4) model e+ correctly.

We therefore get the following complete linear model.

(LC): min ∑i=1,...,N ui + ∑i=N+1,...,M vi
s.t. a⊤xi − b − ui ≤ 0, i = 1, . . . , N
     a⊤xi − b + vi ≥ 0, i = N + 1, . . . , M
     a ∈ Rn
     b ∈ R
     ui ≥ 0, i = 1, . . . , N
     vi ≥ 0, i = N + 1, . . . , M.

An optimal solution is given in Figure 1.5b.
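For completeness, here is a sketch of how (LC) can be assembled and solved with scipy.optimize.linprog (the random data, the variable ordering and the margin of 1 are assumptions for this illustration, not part of the notes; the margin is a common normalisation that rules out the trivial classifier a = 0, b = 0, which the non-strict inequalities in (LC) would otherwise admit).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
neg = rng.normal([-2.0, 2.0], 1.0, size=(20, 2))   # observations in I-
pos = rng.normal([4.0, 8.0], 1.0, size=(20, 2))    # observations in I+
N, M, n = len(neg), len(neg) + len(pos), 2

# decision vector stacks (a, b, u, v)
nvar = n + 1 + M
c = np.r_[np.zeros(n + 1), np.ones(M)]             # min sum u_i + sum v_i

A_ub, b_ub = [], []
for i, x in enumerate(neg):                        # a^T x - b - u_i <= -1
    row = np.zeros(nvar)
    row[:n], row[n], row[n + 1 + i] = x, -1.0, -1.0
    A_ub.append(row); b_ub.append(-1.0)
for i, x in enumerate(pos):                        # -(a^T x - b) - v_i <= -1
    row = np.zeros(nvar)
    row[:n], row[n], row[n + 1 + N + i] = -x, 1.0, -1.0
    A_ub.append(row); b_ub.append(-1.0)

bounds = [(None, None)] * (n + 1) + [(0, None)] * M  # a, b free; u, v >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
a, b = res.x[:n], res.x[n]
print("f(x) = a^T x - b with a =", a, ", b =", b)
```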

1.4 Modelling problems as LP and graphical solution approach

In this section, we consider how we can get from a problem statement to a corresponding mathematical model.
One of the most classic linear optimisation problems is the diet problem, which is often also referred
to as the mixture problem. It is perhaps one of the first optimisation problems to be implemented
in practice.
Some typical applications for the diet problem are:

• feed composition;

• metal alloy production;

• fuel specification;

• drug manufacturing.

Example 1.6 (Diet problem). A farm uses at least 800 lb of a special feed daily. The special feed is a mixture of corn and soybean meal with the following compositions:

Feedstuff        Protein   Fibre   Cost ($/lb)
Corn             0.09      0.02    0.30
Soybean meal     0.60      0.06    0.90

Table 1.3: Protein, fibre and cost per lb of feedstuff.

The dietary requirements of the special feed are at least 30% protein and at most 5% fibre.

Our goal is to determine the optimal feed mix composition.


To determine a mathematical model we follow the three steps:

1. Determine what needs to be decided (decision variables)


x1 - amount (lb) of corn in the daily mix
x2 - amount (lb) of soybean meal in the daily mix

2. How solutions are assessed (objective function)

Minimise the costs


min z = 0.30x1 + 0.90x2

3. The requirements that must be satisfied (constraints)

x1 + x2 ≥ 800 (min. feed amount)


0.09x1 + 0.6x2 ≥ 0.3(x1 + x2 ) (min. protein)
0.02x1 + 0.06x2 ≤ 0.05(x1 + x2 ) (max. fibre)
x1 , x2 ≥ 0

The complete model is:

min z = 0.30x1 + 0.90x2


s.t. x1 + x2 ≥ 800
0.09x1 + 0.6x2 ≥ 0.3(x1 + x2 )
0.02x1 + 0.06x2 ≤ 0.05(x1 + x2 )
x1 , x2 ≥ 0

It is convenient to reformulate problems to a format with variables on the left-hand side (LHS) and constants on the right-hand side (RHS):

∑j=1,...,n aij xj ≤ bi,  i = 1, . . . , m.

The reformulated model is:

min z = 0.30x1 + 0.90x2 (1.5)


s.t. x1 + x2 ≥ 800 (1.6)
0.21x1 − 0.30x2 ≤ 0 (1.7)
0.03x1 − 0.01x2 ≥ 0 (1.8)
x1 , x2 ≥ 0 (1.9)

Let us solve it graphically in the decision-variable space (x1 , x2 ).



Remark. In practice, optimisation problems have many more variables. Two-variable problems are, however, graphically representable, which is useful to infer geometric properties of linear programs (LPs).

Example 1.6 (Diet problem, continued). To compute an optimal solution to the diet problem, we first graph the feasible set. Note that due to the domain of the variables (1.9), we only need to consider R2+ = {(x1, x2) : x1 ≥ 0, x2 ≥ 0}.
Each of the constraints (1.6), (1.7) and (1.8) defines a halfspace in R2, see Figure 1.6. The feasible region is the intersection of these halfspaces and R2+.

Figure 1.6: Feasible region for the diet problem. (a) Constraint (1.6). (b) Adding constraint (1.7). (c) Adding constraint (1.8). (d) Feasible region.

To determine an optimal solution we consider the level sets {(x1, x2) ∈ R2 : 0.3x1 + 0.9x2 = z} for varying values of z, see Figure 1.7a. Notice that in R2, level sets are parallel lines. By reducing z as far as possible while the corresponding level set still intersects the feasible region, we find an optimal solution, see Figure 1.7b.

Figure 1.7: Finding an optimal solution for the diet problem. (a) Level sets. (b) Optimal solution x* = (470.6, 329.4) with z* = 437.64.
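The graphical result can be cross-checked numerically; a minimal sketch (not from the notes) using scipy.optimize.linprog:

```python
from scipy.optimize import linprog

c = [0.30, 0.90]
# all constraints rewritten as <=:
# x1 + x2 >= 800         ->  -x1 - x2 <= -800
# 0.21 x1 - 0.30 x2 <= 0
# 0.03 x1 - 0.01 x2 >= 0 ->  -0.03 x1 + 0.01 x2 <= 0
A_ub = [[-1.0, -1.0], [0.21, -0.30], [-0.03, 0.01]]
b_ub = [-800.0, 0.0, 0.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)  # approx. (470.6, 329.4) with z* ~ 437.6
```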

Remark. The diet problem example illustrates some important concepts of the geometry of linear programs. For any feasible solution, i.e., any point in the feasible region, we differentiate between active and inactive constraints.

• A constraint is active if the corresponding resources (requirements) are fully depleted (minimally satisfied), i.e., if the inequality constraint is satisfied with equality. In Example 1.6, the constraints (1.6) and (1.7) are active for the optimal solution x∗, see Figure 1.7b.

• A constraint is inactive if the corresponding resources (requirements) are not fully depleted (over-satisfied), i.e., if the inequality constraint is not satisfied with equality. In Example 1.6, the constraint (1.8) is inactive for the optimal solution x∗, see Figure 1.7b.

The intersection of two (linearly independent) active constraints forms a vertex of the feasible region in R2. We will see later that it suffices to consider vertices as candidates for optimal solutions.

In the next example, we consider a production planning problem.

Example 1.7 (Production planning). A paint factory produces exterior and interior paint from
raw materials M1 and M2, see Table 1.4. The maximum demand for interior paint is 2 tons/day.
Moreover, the amount of interior paint produced cannot exceed that of exterior paint by more than
1 ton/day.
Our goal is to determine optimal paint production.

                        exterior paint   interior paint   daily availability (ton)
material M1 (ton/ton)         6                4                    24
material M2 (ton/ton)         1                2                     6
profit ($1000/ton)            5                4                     -

Table 1.4: Paint shop problem data.

To model the problem mathematically, we follow three steps:

1. Determine what needs to be decided (decision variables)


x1 - amount (ton) of exterior paint
x2 - amount (ton) of interior paint
2. How solutions are assessed (objective function)

max z = 5x1 + 4x2

3. The requirements that must be satisfied (constraints)

6x1 + 4x2 ≤ 24 (M1 avail.)
x1 + 2x2 ≤ 6 (M2 avail.)
x2 ≤ x1 + 1 (excess of int. over ext. paint)
x2 ≤ 2 (dem. int. paint)
x1, x2 ≥ 0

The complete (reformulated) model is:

max z = 5x1 + 4x2 (1.10)


s.t. 6x1 + 4x2 ≤ 24 (1.11)
x1 + 2x2 ≤ 6 (1.12)
x2 − x1 ≤ 1 (1.13)
x2 ≤ 2 (1.14)
x1 , x2 ≥ 0 (1.15)

The model can also be represented compactly in matrix form as

max z = c⊤x
s.t. Ax ≤ b
     x ≥ 0

where c = (5, 4)⊤, x = (x1, x2)⊤,

A = (  6  4
       1  2
      −1  1
       0  1 )

and b = (24, 6, 1, 2)⊤.
The feasible region as well as the level sets {(x1 , x2 ) : 5x1 + 4x2 = z} for the objective function are
given in Figure 1.8a and Figure 1.8b, respectively.

Figure 1.8: Graphical representation of the production planning problem. (a) Feasible region. (b) Level sets. (c) Gradient and optimal solution x* = (3, 1.5) with z* = 21.

To determine the direction in which the objective function increases, we can consider the gradient ∇z = (∂z/∂x1, ∂z/∂x2)⊤ = (5, 4)⊤, see Figure 1.8c. Moving the level set in the direction of ∇z as long as the intersection with the feasible region is non-empty gives us an optimal solution.

Remark. In case of minimisation instead of maximisation, we move towards −∇z.

1.4.1 Graphical sensitivity analysis


An optimal solution to an optimization problem depends on the parameters of the problem, i.e.,
the coefficients in both the objective function and the constraints. However, these coefficients
might not be exact and we might be interested in the effect of changing these parameters. Thus,
we are conducting a sensitivity analysis.
We can use the graphical representation of an optimisation problem to calculate the marginal value of a resource. Note, however, that such an analysis considers one coefficient at a time and only holds as long as the set of active (and inactive) constraints remains unchanged.
Using sensitivity analysis we can enquire about the following:

1. For which changes in the coefficients of the objective function (c) does the optimal vertex remain optimal?

2. For which changes in the right-hand side (b) does the set of active constraints remain optimal?

Remark. Note that, when changing the right-hand side b, the optimal solution x, i.e., the coordinates of the optimal solution, will change, but not the intersection of active constraints that forms it.
We conduct a sensitivity analysis for the production planning problem in Example 1.7.

Example 1.7 (Production planning, continued). We first consider the case where the coefficients of the objective function c can change. For which changes does the optimal vertex and the corresponding solution x∗ remain optimal?
Let z′ = c1x1 + c2x2 be the perturbed objective function. Then x∗ remains optimal if the slope of z′ lies between the slopes of constraints (1.11) and (1.12), see Figure 1.9a. This is the same as requiring

1/2 ≤ c1/c2 ≤ 6/4.

Next we consider changes in the right-hand side b. For which changes in bi, i ∈ {1, 2, 3}, does the set of active constraints remain optimal?
We first consider changing b1, i.e., the right-hand side of (1.11), restricting the availability of M1, see Figure 1.9b. The set of active constraints remains the same for changes in b1 between 20 and 36 (pre-calculated). Then, the marginal value y1 for b1 ∈ [20, 36] is

y1 = ∆z/∆b1 = (z(D) − z(G))/(b1(D) − b1(G)) = 750 ($/ton).

Next, we consider changing b2, i.e., the right-hand side of (1.12), restricting the availability of M2, see Figure 1.9c. The set of active constraints remains the same for changes in b2 between 4 and 20/3 (pre-calculated). Then, the marginal value y2 for b2 ∈ [4, 20/3] is

y2 = ∆z/∆b2 = (z(H) − z(F))/(b2(H) − b2(F)) = 500 ($/ton).

Note that the remaining constraints are inactive. When the corresponding right-hand side is changed a little, there is thus no influence on the optimal solution. However, it can happen that the right-hand side is changed so much that the originally optimal solution is no longer feasible, and thus also no longer optimal.
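The marginal values computed above are precisely the dual values (shadow prices) that LP solvers report. A minimal sketch of how to obtain them, assuming scipy's HiGHS backend, which exposes them as res.ineqlin.marginals:

```python
from scipy.optimize import linprog

# paint factory model; linprog minimises, so we negate the objective
c = [-5, -4]
A_ub = [[6, 4], [1, 2], [-1, 1], [0, 1]]
b_ub = [24, 6, 1, 2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
              method="highs")
# marginals of the minimised problem are non-positive; negate them to
# recover the shadow prices of the max problem (in $1000/ton)
print(-res.ineqlin.marginals)  # approx. [0.75, 0.5, 0.0, 0.0]
# 0.75 and 0.5 ($1000/ton) correspond to y1 = 750 $/ton and y2 = 500 $/ton
```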

Figure 1.9: Sensitivity analysis for the production planning problem. (a) Changing the objective function c. (b) Changing right-hand side coefficient b1. (c) Changing right-hand side coefficient b2.


Chapter 2

Linear programming

In this chapter, we consider linear optimisation problems and the simplex method for solving them. In order to apply the simplex method, we have to reformulate LPs in standard form.

2.1 Representations of linear programs


Definition 2.1. From Definition 1.1, we get a linear optimisation problem in the following general form:

(P): min c⊤x
s.t. aj⊤x ≤ bj, j = 1, . . . , m1
     aj⊤x ≥ bj, j = m1 + 1, . . . , m2
     aj⊤x = bj, j = m2 + 1, . . . , m
     xi ≥ 0, i ∈ I1
     xi ≤ 0, i ∈ I2
     xi ∈ R, i ∈ I3

with c ∈ Rn, aj ∈ Rn, bj ∈ R for j = 1, . . . , m, I1 ∪ I2 ∪ I3 = {1, . . . , n} and I1, I2, I3 pairwise disjoint.


Note that you can rewrite this in matrix form as

(P ) : min c⊤ x
s.t. A1 x ≤ b1
A2 x ≥ b2
A3 x = b3
xi ≥ 0, i ∈ I1
xi ≤ 0, i ∈ I2
xi ∈ R, i ∈ I3

with

A1 ∈ Rm1 ×n , b1 ∈ Rm1 , A2 ∈ R(m2 −m1 )×n , b2 ∈ Rm2 −m1 , A3 ∈ R(m−m2 )×n , b3 ∈∈ Rm−m2 .

To ease the notation for the following chapter, we additionally define a standard form for linear
programs as well as a ≤-form.


Definition 2.2. We call a linear program a linear program in standard form if

(P): max c⊤x
s.t. Ax = b
     x ≥ 0

with A ∈ Rm×n, b ∈ Rm+ (i.e., b ≥ 0).

Definition 2.3. We call a linear program a linear program in ≤-form if

(P ) : min c⊤ x
s.t. Ax ≤ b
x ∈ Rn

with A ∈ Rm×n , b ∈ Rm .

Remark. Note that these terms are not used consistently in the literature, i.e., there exist multiple,
slightly varying, standard forms.

Note that a linear program in general form can be transformed into standard form or ≤-form. Thus, it suffices to consider LPs either in standard or ≤-form, depending on which is better suited.

Lemma 2.4. Let (P ) be a linear program in general form.

1. We can reformulate (P ) to (P ′ ) in ≤-form.

2. We can reformulate (P ) to (P ′′ ) in standard form.

Proof. We construct the corresponding linear program.

1. The ≤-inequality constraints in (P) can be transferred to (P′) directly. For each ≥-inequality constraint

   aj⊤x ≥ bj

   in (P) we add a ≤-inequality constraint

   −aj⊤x ≤ −bj

   to (P′). For each equality constraint

   aj⊤x = bj

   in (P) we add two inequality constraints

   aj⊤x ≤ bj
   −aj⊤x ≤ −bj

   to (P′). For each variable xi ≥ 0 we add the constraint −xi ≤ 0 and for each variable xi ≤ 0 we add the constraint xi ≤ 0.

2. The equality constraints in (P) can be transferred to (P′′) directly. To obtain equalities from inequalities, we include slack/surplus variables:

   • aj⊤x ≤ bj becomes aj⊤x + sj = bj with sj = bj − aj⊤x and sj ≥ 0 (slack).
   • aj⊤x ≥ bj becomes aj⊤x − sj = bj with sj = aj⊤x − bj and sj ≥ 0 (surplus).

   Variables are processed as follows. Nonpositive variables xi ≤ 0 are replaced with −yi, where yi ≥ 0. Unrestricted variables xi ∈ R are replaced with yi+ − yi−, where yi+, yi− ≥ 0.
   All equality constraints ai⊤x = bi with negative bi are replaced by −ai⊤x = −bi.
   To convert the minimisation to a maximisation, min z = c⊤x is replaced with max −z = −c⊤x. Notice the sign change for z.

Remark. Surplus variables are often also called slack variables.


Example 2.5. Consider the following optimisation problem:

(P): min z = 1x1 + 4x2
s.t. 5x1 + 1x2 ≥ 4
     1x1 + 2x2 = 2
     x1 ≤ 0
     x2 ∈ R.

In ≤-form we get

(P′): min z = 1x1 + 4x2
s.t. −5x1 − 1x2 ≤ −4
     1x1 + 2x2 ≤ 2
     −1x1 − 2x2 ≤ −2
     1x1 ≤ 0
     x1, x2 ∈ R.

To transform the problem to standard form, we first replace the variables: x1 is replaced by −y1 and x2 is replaced by y2+ − y2−. Additionally, we introduce a surplus variable s1 to handle the ≥-constraint and negate the objective function. Thus we get

(P′′): max −z = 1y1 − 4y2+ + 4y2−
s.t. −5y1 + 1y2+ − 1y2− − 1s1 = 4
     −1y1 + 2y2+ − 2y2− = 2
     y1, y2+, y2−, s1 ≥ 0.
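The conversion in Lemma 2.4(2) is mechanical enough to automate. The following sketch (written for these notes as an illustration; the function name and data layout are our own) reproduces the standard form of Example 2.5.

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub, A_eq, b_eq, signs):
    """Convert  min c^T x, A_ub x <= b_ub, A_eq x = b_eq, with sign
    restrictions per variable ('>=0', '<=0', 'free'), into
    max c'^T y,  A' y = b',  y >= 0  (Lemma 2.4, part 2)."""
    # 1. variable substitutions: x_i <= 0 -> -y_i, free x_i -> y_i+ - y_i-
    cols, c_new = [], []
    for j, s in enumerate(signs):
        if s == '<=0':
            cols.append((j, -1.0)); c_new.append(-c[j])
        elif s == 'free':
            cols.append((j, 1.0));  c_new.append(c[j])
            cols.append((j, -1.0)); c_new.append(-c[j])
        else:  # '>=0'
            cols.append((j, 1.0));  c_new.append(c[j])
    sub = lambda M: np.column_stack([s * np.asarray(M, float)[:, j]
                                     for j, s in cols])
    A1, A2 = sub(A_ub), sub(A_eq)
    # 2. slack variables turn the inequalities into equalities
    m1 = A1.shape[0]
    A = np.block([[A1, np.eye(m1)],
                  [A2, np.zeros((A2.shape[0], m1))]])
    b = np.asarray(list(b_ub) + list(b_eq), float)
    cc = np.asarray(c_new + [0.0] * m1)
    # 3. flip rows with a negative right-hand side
    A[b < 0] *= -1; b[b < 0] *= -1
    # 4. min c^T x becomes max (-c)^T y
    return -cc, A, b

# Example 2.5: 5x1 + x2 >= 4 is first written as -5x1 - x2 <= -4
cmax, A, b = to_standard_form(c=[1, 4], A_ub=[[-5, -1]], b_ub=[-4],
                              A_eq=[[1, 2]], b_eq=[2],
                              signs=['<=0', 'free'])
print(cmax)  # [ 1. -4.  4. -0.] : max y1 - 4 y2+ + 4 y2-
print(A, b)  # rows -5y1 + y2+ - y2- - s1 = 4 and -y1 + 2y2+ - 2y2- = 2
```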

2.2 Geometric and algebraic basics


In this section we consider a few basic geometric and algebraic concepts used for finding optimal
solutions of linear programs.
Definition 2.6. Let a ∈ Rn, b ∈ R.

• Ha,b = {x ∈ Rn : a⊤x = b} is called the hyperplane corresponding to a and b.

• H≤a,b = {x ∈ Rn : a⊤x ≤ b} and H≥a,b = {x ∈ Rn : a⊤x ≥ b} are the half-spaces generated by the hyperplane Ha,b.

Definition 2.7. A polyhedron Q is the intersection of finitely many half-spaces, i.e.,

Q = ⋂j=1,...,m {x ∈ Rn : aj⊤x ≤ bj}

with aj ∈ Rn, bj ∈ R.

We use the convexity of polyhedra in the following.


Definition 2.8. A set M ⊂ Rn is convex if for all x, y ∈ M and all λ ∈ [0, 1],

λx + (1 − λ)y ∈ M.

Lemma 2.9.
1. Any half-space is convex.
2. The intersection of finitely many convex sets is convex.
3. Polyhedra are convex.

As we can reformulate any linear program into ≤-form, we can easily see that the feasible set F
of a linear program is a polyhedron and therefore convex.
Next, we consider the connection between extreme points and vertices of polyhedra.
Definition 2.10. Consider a polyhedron Q = ⋂j=1,...,m {x ∈ Rn : aj⊤x ≤ bj} with aj ∈ Rn, bj ∈ R.

• A point z ∈ Q is called an extreme point of Q if for x, y ∈ Q and λ ∈ (0, 1) with z = λx + (1 − λ)y it follows that z = x = y. (z cannot be written as a convex combination of other points in Q.)

• For a point z ∈ Q, a constraint aj⊤x ≤ bj is called active if the inequality is satisfied with equality at z, i.e., if aj⊤z = bj.

• A point z ∈ Q is called a vertex if n linearly independent constraints are active at z.

With this notation, we can formulate the fundamental theorem of linear optimisation.
Theorem 2.11 (Fundamental theorem of linear optimisation). Consider a linear program (P) max{c⊤x : Ax ≤ b, x ∈ Rn} with feasible set F = Q. Note that Q is a polyhedron. Suppose Q has at least one extreme point. If (P) has an optimal solution, then (P) has an optimal solution that is an extreme point of Q.

Additionally, one can show the equivalence of vertices and extreme points of polyhedra.
Theorem 2.12 (Equivalence of extreme points and vertices). Let Q be a non-empty polyhedron.
Then z is an extreme point of Q if and only if z is a vertex of Q.

Using these two theorems, we have the fundamental ideas for the simplex method.

• It suffices to consider vertices (extreme points) of the feasible region when looking for optimal
solutions.
• There are only finitely many vertices, as we have a total of m constraints, of which n have to be active. Thus, there are at most (m choose n) vertices to consider.

Instead of iterating over all possible vertices, the simplex method exploits a neighbourhood structure to speed up the process, as detailed in the next section.

2.3 Simplex method


To formulate the simplex method, we consider linear programs in standard form. This allows for
solving systems of linear equations in order to find optimal solutions.
We additionally make the following assumption for the remaining chapter.
Assumption 2.13. We consider linear programs (P) in standard form, i.e.,

(P): max c⊤x
s.t. Ax = b
     x ≥ 0

with A ∈ Rm×n, b ∈ Rm+, where n ≥ m and rank A = m.

Remark. Assuming that rank A = m, i.e., that A has full rank, is not really a restriction. Suppose rank A = k < m. Then there are two possibilities:

• rank(A|b) = rank A = k: In this case, there are redundant constraints in Ax = b, i.e., constraints that are a linear combination of other constraints. As we can remove these constraints iteratively, we end up with a matrix of full rank eventually.

• rank(A|b) > rank A = k: In this case, the system Ax = b is infeasible and we do not need to consider it further.
Definition 2.14. Consider a linear program in standard form (P) max{c⊤x : Ax = b, x ≥ 0}.

• An index set B = {B(1), . . . , B(m)} ⊂ {1, . . . , n} is called a basis if the corresponding columns of A are linearly independent. We call the matrix corresponding to these columns AB.

• The variables xB(1), . . . , xB(m) are called basic variables. We abbreviate this as xB = (xB(1), . . . , xB(m)).

• The remaining indices N = {1, . . . , n} \ {B(1), . . . , B(m)} are called nonbasic indices. We call the matrix corresponding to these columns AN.

• The variables xk, k ∈ N, are called nonbasic variables. We abbreviate this as xN = (xk)k∈N.

• Similarly, we write cB for the cost vector of the basic variables and cN for the cost vector of the nonbasic variables.

Example 2.15. Consider the linear program

max 1x1 − 1x2 + 2x3
s.t. 2x1 + 1x2 = 4
     3x1 + 2x3 = 2
     x1, x2, x3 ≥ 0.

The corresponding matrix is

A = ( 2  1  0
      3  0  2 ).

One possible basis is B = {2, 3}. Then N = {1},

AB = ( 1  0 ),   AN = ( 2 ),
     ( 0  2 )         ( 3 )

xB = (x2, x3)⊤, xN = (x1), cB = (−1, 2)⊤ and cN = (1).
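A short numerical illustration of these definitions (a numpy sketch, not part of the notes): we compute the basic solution for B = {2, 3} and the reduced cost of the nonbasic variable x1, anticipating Definition 2.19 below.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [3.0, 0.0, 2.0]])
b = np.array([4.0, 2.0])
c = np.array([1.0, -1.0, 2.0])

B, N = [1, 2], [0]                 # 0-based column indices of basis {2, 3}
A_B, A_N = A[:, B], A[:, N]

x_B = np.linalg.solve(A_B, b)      # basic variables: x2 = 4, x3 = 1
print("x_B =", x_B, "feasible:", bool(np.all(x_B >= 0)))

# reduced costs of the nonbasic variables: c_k - c_B^T A_B^{-1} A_k
red = c[N] - c[B] @ np.linalg.solve(A_B, A_N)
print("reduced costs =", red)      # here 0: making x1 basic does not help
```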

Remark. As the columns of AB are linearly independent for a basis B, they form a basis of Rm ,
i.e., the corresponding matrix is non-singular. Thus, fixing the nonbasic variables xN also fixes the
basic variables xB .

Definition 2.16. Let B be a basis of A. A solution x is called a basic solution if Ax = b and xi = 0 for all i ∈ N. A basic solution is called feasible if xi ≥ 0 for all i ∈ B.

Remark. Note that a linear program (P) in standard form with m constraints and n variables can be represented as a linear program (P′) in ≤-form with 2m + n constraints and n variables. A basic solution satisfies n linearly independent constraints in (P′) with equality: of the 2m constraints corresponding to the equality constraints in (P), m linearly independent ones, plus the n − m linearly independent constraints corresponding to the nonbasic variables. Thus, a basic solution of (P) corresponds to a vertex in (P′).

Therefore, we can reformulate the fundamental theorem of linear programming.

Theorem 2.17 (Fundamental theorem of linear programming – Reformulation). If (P) max{c⊤x : Ax = b, x ≥ 0} has an optimal solution, then (P) has an optimal basic solution (xB, xN) = (AB⁻¹b, 0) for a basis B of A.

For a graphic interpretation, we consider the following example.

Example 2.18. Consider the following linear program (P ).

(P ) max z = 4x1 + 3x2


s.t. 2x1 + 1x2 ≤ 4 (2.1)
1x1 + 2x2 ≤ 4 (2.2)
x1 , x2 ≥ 0.

Reformulating the problem in standard form yields (P ′ ).



(P ′ ) max z = 4x1 + 3x2


s.t. 2x1 + 1x2 + 1s1 =4
1x1 + 2x2 + 1s2 = 4
x1 , x2 , s1 , s2 ≥ 0.

Note that any two columns of the corresponding matrix are linearly independent, such that any two column indices form a basis. By setting the corresponding nonbasic variables to zero, we get the six basic solutions (A) to (F) as given in Table 2.1 and Figure 2.1. All points are defined by the intersection of two linearly independent, active constraints. Note that the points (A), (B), (D) and (E) are feasible basic solutions, i.e., all variables are greater than or equal to zero. They form the vertices and thus the extreme points of the feasible region of (P).

Point x1 x2 s1 s2 z
(A) 0 0 4 4 0
(B) 2 0 0 2 8
(C) 4 0 -4 0 16
(D) 4/3 4/3 0 0 28/3
(E) 0 2 2 0 6
(F) 0 4 0 -4 16

Table 2.1: Basic solutions.

Figure 2.1: Basic solutions, feasible set and optimal solution.

When all basic solutions are known as in Example 2.18, we can determine an optimal solution by
iterating over all feasible basic solutions and picking the best one. However, we do not want to

precompute all basic solutions. Thus, the simplex method goes from a basis to an adjacent basis by
swapping a nonbasic variable and a basic variable until no further improvement can be observed in
the objective function or more precisely the reduced costs. Basis transformations such as swapping
a basic and a non-basic variable do not influence the feasible region of the linear program but we
have to make sure that the reduced costs are transformed in the same manner.
Definition 2.19. Let B be a basis of A. For a nonbasic variable xk, k ∈ N, the reduced costs are given as

c̄k = ck − cB⊤AB⁻¹Ak.

The reduced costs of nonbasic variables can be used to determine whether a basic solution can be improved upon.

Theorem 2.20. If x∗ is a feasible basic solution for basis B and the reduced costs satisfy c̄k ≤ 0 for all nonbasic variables x∗k, k ∈ N, then x∗ is an optimal solution of (P) max{c⊤x : Ax = b, x ≥ 0}.

Proof. Consider a feasible solution x = (xB, xN). From Ax = (AB AN)(xB, xN) = ABxB + ANxN = b we get, as B is a basis,

xB = AB⁻¹b − AB⁻¹ANxN.

We can rewrite the costs c⊤x as

c⊤x = cB⊤xB + cN⊤xN
    = cB⊤(AB⁻¹b − AB⁻¹ANxN) + cN⊤xN
    = cB⊤AB⁻¹b + (cN⊤ − cB⊤AB⁻¹AN)xN,

where the bracketed term contains the reduced costs. As B is a basis, fixing xN fixes the complete solution x = (xB, xN) = (AB⁻¹b − AB⁻¹ANxN, xN). Thus, instead of solving (P), we can solve

(P′) max cB⊤AB⁻¹b + (cN⊤ − cB⊤AB⁻¹AN)xN
     s.t. xN ≥ 0.

Here, xN ≥ 0 guarantees feasibility of x = (xB, xN) as x∗ is a feasible basic solution and thus xB = AB⁻¹b ≥ 0. By assumption, we have c̄k ≤ 0, k ∈ N, i.e., cN⊤ − cB⊤AB⁻¹AN ≤ 0. Thus, xN = 0 is an optimal solution of (P′) and therefore x∗ = (x∗B, x∗N) = (AB⁻¹b, 0) is optimal for (P).
Remark. Note that in the simplex method we can obtain the reduced costs of the nonbasic variables without computing the inverse of AB explicitly.

On the other hand, this shows how we can improve a given basis solution that does not have
nonpositive reduced costs: By choosing a nonbasic variable xk , e.g., the one with highest reduced
costs, and increasing its value to xk = δ, we (potentially) get a better solution. Note that when
changing xk , we have to recompute xB to ensure feasibility.
Lemma 2.21. Let ãi,k = (AB⁻¹Ak)i and b̃i = (AB⁻¹b)i. Let x̃ = (x̃B, x̃N) be the solution defined by increasing xk to δ, i.e., by setting x̃k = δ ≥ 0 for some k ∈ N and x̃k′ = 0 for k′ ∈ N \ {k}. Then x̃ is feasible for

δ = min i=1,...,m { b̃i/ãi,k : ãi,k > 0 }.

Proof. Note that by definition x̃N ≥ 0. Also, by definition Ax̃ = b as we set x̃B = AB⁻¹b − AB⁻¹ANx̃N. Thus, we have to guarantee x̃B ≥ 0:

x̃B = AB⁻¹b − AB⁻¹ANx̃N = AB⁻¹b − AB⁻¹Akδ ≥ 0
⇐⇒ b̃i − ãi,kδ ≥ 0, i = 1, . . . , m.

If ãi,k ≤ 0 this is satisfied for any δ ≥ 0. For ãi,k > 0 we get δ ≤ b̃i/ãi,k, and thus the minimum as claimed.

We can even show that this choice of x̃ does not decrease the objective value and ensures that x̃
is a feasible basic solution. We omit the technical proof here.
Theorem 2.22. Let B be a basis, c̄k > 0 for nonbasic variable xk, and

δ = b̃r/ãr,k = min i=1,...,m { b̃i/ãi,k : ãi,k > 0 }

with the notation from above. Then B′ = (B \ {B(r)}) ∪ {k} is a basis with objective value at least as good as that of B, and x̃ is the corresponding feasible basic solution.

Thus, we now have the following information that we can use in the simplex method.

• Determine which nonbasic variable will enter the basis. Here, we choose the one which has
the highest reduced costs and thus can potentially improve the objective value the most.
• Determine which basic variable will leave the basis such that the objective value is not reduced
and the columns corresponding to the basis are linearly independent.
• Update the reduced costs and determine whether more simplex steps are necessary.

Two questions remain at this point:

1. How can we find a basis to start with?


2. How can we implement each Simplex step efficiently, without computing inverse matrices?

If the problem we are considering was transformed from ≤-form with nonnegative right-hand side b to standard form, we have an easy basis to start with: all original variables are chosen as nonbasic variables and set to zero, while the newly introduced slack variables, whose columns form a standard basis of Rm, are used as basic variables. Other cases are discussed in Section 2.5.
The second question is discussed in the following section.

2.4 Gauss-Jordan elimination and simplex tableaus


Efficient implementations of the simplex method use Gauss-Jordan elimination to solve the equa-
tion system for a given basis instead of computing the inverse of AB .

1. The system's coefficients are laid out as a matrix, including the objective function, forming a so-called tableau.

2. An identity (sub)matrix is formed for the selected basis, which is equivalent to solving the system for this basis.

3. The coefficients of the basic variables are made zero in the objective function row, such that the row corresponds to the reduced costs.

4. Each new system solution is obtained by performing elementary row operations (Gauss-Jordan elimination):

   • row permutation;
   • multiplying a row by a non-zero scalar;
   • adding a scalar multiple of one row to another.

Note that these operations do not change the feasible region of the corresponding linear program.

Consider (P) max{c⊤x : Ax = b, x ≥ 0} with A = (Ā, I), i.e., the slack columns form an identity matrix and have zero objective coefficients. A starting tableau, i.e., a valid tableau for the initial basis, has the following form:

−c1 . . . −cn−m | 0 . . . 0 | 0
       Ā        |    I     | b

Remark. Note that the first row of the tableau represents the equation

z − c⊤x = 0.

Therefore, the tableau would need one more column at the beginning, representing a variable z corresponding to the objective value. However, none of the transformations used in the simplex method has any effect on this additional column, so we omit it in our notation.

The tableau for a basis B then has the following form:

−c̄⊤ = −c⊤ + cB⊤AB⁻¹A | cB⊤AB⁻¹b
AB⁻¹A = (AB⁻¹Ā | AB⁻¹) | AB⁻¹b

Thus, we can read the following information, as used in Theorem 2.22, directly from the tableau:

• the negative reduced costs −c̄,

• the objective value of the basic solution (xB, xN): cB⊤AB⁻¹b,

• the transformed matrix Ã = (AB⁻¹Ā | AB⁻¹),

• the values of the basic variables b̃ = AB⁻¹b.

Thus, we can perform pivot operations for changing from basis B to basis B′:

• Determine the entering variable xk (pivot column PC): the negative coefficient −c̄k with the largest absolute value in the first row.

• Determine the leaving variable (pivot row PR): arg min i=1,...,m { b̃i/ãi,k : ãi,k > 0 }.

• Use row operations on the pivot row PR until the pivot column corresponds to the unit vector ei with a 0 in the first row, i.e., a zero negative reduced cost.

To test whether the new basis B ′ is optimal, we only need to consider the values of the negative
reduced costs in the first row. As long as there is a negative value, the steps above are iterated.
When all values are non-negative, an optimal solution is found.
Example 2.23. Consider again the linear program of Example 2.18. The initial tableau is:

x1 x2 x3 x4 Sol.
−c̄ -4 -3 0 0 0
x3 2 1 1 0 4
x4 1 2 0 1 4

Note that the first column contains information on the current basis and the first row marks the
variables with s1 = x3 , s2 = x4 .
The first pivot column belongs to x1 as −c̄1 is the negative coefficient with largest absolute value.

x1 x2 x3 x4 Sol.
−c̄ -4 -3 0 0 0
x3 2 1 1 0 4
x4 1 2 0 1 4
The leaving variable is x3, the first basic variable, as it attains arg min i=1,...,m { b̃i/ãi,k : ãi,k > 0 }.
After performing suitable row operations, we obtain:

x1 x2 x3 x4 Sol. Operations
−c̄ 0 -1 2 0 8 + (4) × P R
x1 1 1/2 1/2 0 2 ×(1/2) : P R
x4 0 3/2 -1/2 1 2 + (−1) × P R

where the row operations are performed to turn the pivot column PC into a unit column of the basis.
As there is still a negative entry in −c̄, the method proceeds...

x1 x2 x3 x4 Sol. b̃i /ãik


−c̄ 0 -1 2 0 8 -
x1 1 1/2 1/2 0 2 4
x4 0 3/2 -1/2 1 2 4/3

... reaching optimality at x∗ = (4/3, 4/3), z∗ = 28/3.

x1 x2 x3 x4 Sol. Operations
−c̄ 0 0 5/3 2/3 28/3 + (1) × P R
x1 1 0 2/3 -1/3 4/3 + (−1/2) × P R
x2 0 1 -1/3 2/3 4/3 ×2/3 : P R

Graphically, the process is shown below.

The initial basis results in the feasible basic solution (A). To (A), both the basic solutions (B) and (E) are adjacent.

x1 x2 x3 x4 Sol.
−c̄ -4 -3 0 0 0
x3 2 1 1 0 4
x4 1 2 0 1 4

By choosing to make x1 a basic variable, we get to the feasible basic solution (B).

x1 x2 x3 x4 Sol.
−c̄ 0 -1 2 0 8
x1 1 1/2 1/2 0 2
x4 0 3/2 -1/2 1 2

By choosing to make x2 a basic variable, we get to the feasible basic solution (D). As all entries in row −c̄ are now non-negative, (D) is an optimal solution with x∗ = (4/3, 4/3) and z∗ = 28/3.

x1 x2 x3 x4 Sol.
−c̄ 0 0 5/3 2/3 28/3
x1 1 0 2/3 -1/3 4/3
x2 0 1 -1/3 2/3 4/3
To end this section, we summarize the simplex method in algorithmic form.

Algorithm 1 Simplex method

1: initialise. Convert the problem to standard form, if needed. Form an initial basis B.
2: while there is a negative element −c̄j < 0 in row −c̄ for some j ∈ {1, . . . , n} do
3:   Select the entering variable: k = arg min j=1,...,n {−c̄j}
4:   Select the leaving variable (pivot row): r = arg min i=1,...,m { b̃i/ãi,k : ãi,k > 0 }
5:   Perform row operations such that ãr,k = 1 and ãi,k = 0 for all i ≠ r (including row −c̄)
6:   B = (B \ {B(r)}) ∪ {k}
7: end while
8: return B, xi = b̃i for i ∈ B, xj = 0 for j ∈ {1, . . . , n} \ B.
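To make Algorithm 1 concrete, the following is a compact Python sketch of the tableau simplex (an illustration written for these notes, not production code): it assumes the starting basis columns already form an identity submatrix, as after the slack construction of Lemma 2.4, and it omits anti-cycling safeguards such as Bland's rule.

```python
import numpy as np

def simplex(c, A, b, basis):
    """Tableau simplex for  max c^T x  s.t.  A x = b, x >= 0."""
    m, n = A.shape
    T = np.zeros((m + 1, n + 1))      # first row (-c | 0), then (A | b)
    T[0, :n] = -np.asarray(c, float)
    T[1:, :n] = A
    T[1:, n] = b
    basis = list(basis)
    for i, j in enumerate(basis):     # zero reduced costs of basic variables
        T[0] -= T[0, j] * T[i + 1]
    while True:
        k = int(np.argmin(T[0, :n]))  # entering variable (most negative)
        if T[0, k] >= 0:              # no negative entry left: optimal
            x = np.zeros(n)
            x[basis] = T[1:, n]
            return x, T[0, n]
        col = T[1:, k]
        if np.all(col <= 0):          # ratio test fails
            raise ValueError("problem is unbounded")
        ratios = np.where(col > 0, T[1:, n] / np.where(col > 0, col, 1.0),
                          np.inf)
        r = int(np.argmin(ratios))    # leaving row (ratio test)
        T[r + 1] /= T[r + 1, k]       # pivot: make a~_{r,k} = 1
        for i in range(m + 1):
            if i != r + 1:
                T[i] -= T[i, k] * T[r + 1]
        basis[r] = k

# Example 2.18/2.23: max 4x1 + 3x2, 2x1 + x2 <= 4, x1 + 2x2 <= 4
A = np.array([[2.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
x, z = simplex(c=[4.0, 3.0, 0.0, 0.0], A=A, b=[4.0, 4.0], basis=[2, 3])
print(x, z)  # x = (4/3, 4/3, 0, 0), z = 28/3
```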

Remark. For minimisation problems, we consider positive elements in row −c̄ and choose the largest of these values.
Modern implementations of the simplex method rely on efficient computational linear algebra (factorisation) and a minimal representation of the problem (see the revised simplex method).
In theory, the simplex method is an algorithm with exponential worst-case runtime: up to (m choose n) vertices might need to be visited.
Algorithm 1 does not necessarily terminate: it is possible that the same vertices are visited over and over again in a loop. We get to this case later.

2.5 Artificial variables and feasible initial solutions


For a linear program in ≤-form with nonnegative right-hand side b, the origin (0, . . . , 0)⊤ ∈ Rn
in the decision-variable space is a feasible solution. Transforming the LP into standard form thus
gives us the origin as trivial basic feasible solution. However, for LPs in general form, the origin is
not necessarily feasible.

Example 2.24. The origin is not feasible for the following LP.

min z = 4x1 + x2
s.t. 3x1 + x2 = 3
4x1 + 3x2 ≥ 6
x1 + 2x2 ≤ 4
x1 , x2 ≥ 0.

Figure 2.2: Feasible region.

To circumvent this issue, we rely on artificial variables, which have the role of accumulating infeasibility. Each ≥- or =-constraint is augmented with an artificial variable. Then, we minimise their value, i.e., the total infeasibility.

Example 2.24 (continued). To get to a linear program in standard form, we first add a slack and a surplus variable.

min z = 4x1 + x2
s.t. 3x1 + x2 = 3
4x1 + 3x2 − x3 = 6
x1 + 2x2 + x4 = 4
x1 , x2 , x3 , x4 ≥ 0

As the origin is not a feasible basic solution, we additionally add two artificial variables r1 and r2.

min z = 4x1 + x2
s.t. 3x1 + x2 + r1 = 3
4x1 + 3x2 − x3 + r2 = 6
x1 + 2x2 + x4 = 4
x1 , x2 , x3 , x4 , r1 , r2 ≥ 0.

Now it is easy to construct a feasible basic solution: x1 = x2 = x3 = 0 (nonbasic variables),


r1 = 3, r2 = 6, x4 = 4 (basic variables).

If a solution with zero infeasibility (i.e., artificial variables are nonbasic) is found, a basic feasible
solution is available. If the minimal (optimal) accumulated infeasibility is not zero (has basic
artificial variables), no basic feasible solution exists.

2.5.1 The M-method


The first method to find a feasible basic solution is to include sufficiently large penalties for the
artificial variables in the objective function. For minimisation problems, we use large positive
penalties, for maximisation, large negative penalties.
Example 2.24 (continued). As we are considering a minimisation problem, we add large positive
penalties, M > 0. For the remainder of the example, we set M = 100.

min z = 4x1 + x2 + M r1 + M r2
s.t. 3x1 + x2 + r1 = 3
4x1 + 3x2 − x3 + r2 = 6
x1 + 2x2 + x4 = 4
x1 , x2 , x3 , x4 , r1 , r2 ≥ 0.

Note that we cannot simply build a feasible initial tableau from this formulation, as now there are basic variables for which the negative reduced costs are not zero.

x1 x2 x3 r1 r2 x4 Sol.
−c̄ -4 -1 0 -100 -100 0 0
r1 3 1 0 1 0 0 3
r2 4 3 -1 0 1 0 6
x4 1 2 0 0 0 1 4

Thus, we first have to do some row operations, to get the following correct initial tableau.

x1 x2 x3 r1 r2 x4 Sol. Operation
−c̄ 696 399 -100 0 0 0 900 + 100 × Rr1 + 100 × Rr2
r1 3 1 0 1 0 0 3 (Rr1 )
r2 4 3 -1 0 1 0 6 (Rr2 )
x4 1 2 0 0 0 1 4

Now, we can proceed with the method as before. Note that we are solving a minimisation problem.

x1 x2 x3 r1 r2 x4 Sol.
−c̄ 696 399 -100 0 0 0 900
r1 3 1 0 1 0 0 3
r2 4 3 -1 0 1 0 6
x4 1 2 0 0 0 1 4

x1 x2 x3 r1 r2 x4 Sol.
−c̄ 0 167 -100 -232 0 0 204
x1 1 1/3 0 1/3 0 0 1
r2 0 5/3 -1 -4/3 1 0 2
x4 0 5/3 0 -1/3 0 1 3

Eventually we obtain: x∗ = (2/5, 9/5), z ∗ = 17/5.

Figure 2.3: Optimal solution x∗ = (2/5, 9/5) with z∗ = 17/5.

2.5.2 Two-phase method

Alternatively, we can use the two-phase method to find a feasible basic solution. The advantage of
the two-phase method is that it does not need parametrisation. It is more often used in modern
solvers and uses an artificial objective function measuring infeasibility.

Example 2.24 (continued). Consider again the example from before. Instead of augmenting the objective function, we replace it by a minimisation problem that only minimises the artificial variables and thus needs no parametrisation.

min z = r1 + r2
s.t. 3x1 + x2 + r1 = 3
4x1 + 3x2 − x3 + r2 = 6
x1 + 2x2 + x4 = 4
x1 , x2 , x3 , x4 , r1 , r2 ≥ 0.

Note that we cannot simply build a feasible initial tableau from this formulation, as now there are basic variables for which the negative reduced costs are not zero.

x1 x2 x3 r1 r2 x4 Sol.
−c̄ 0 0 0 -1 -1 0 0
r1 3 1 0 1 0 0 3
r2 4 3 -1 0 1 0 6
x4 1 2 0 0 0 1 4

Thus, we first have to do some row operations, to get the following correct initial tableau.

x1 x2 x3 r1 r2 x4 Sol. Operation
−c̄ 7 4 -1 0 0 0 9 + Rr1 + Rr2
r1 3 1 0 1 0 0 3 (Rr1 )
r2 4 3 -1 0 1 0 6 (Rr2 )
x4 1 2 0 0 0 1 4

A few iterations of the simplex method take us from this tableau to the following optimal tableau, in which the total infeasibility is zero, see Figure 2.4.

x1 x2 x3 r1 r2 x4 Sol.
−c̄ 0 0 0 -1 -1 0 0
x1 1 0 1/5 3/5 -1/5 0 3/5
x2 0 1 -3/5 -4/5 3/5 0 6/5
x4 0 0 1 1 -1 1 1

As a basic feasible solution is available, the second phase proceeds. The second phase consists of
applying the simplex method from the basic feasible solution obtained from the first-phase. We
can remove all artificial variables and reintroduce the objective function, rewriting it accordingly.


By reintroducing the objective function, we again construct a tableau that does not yet correspond to a correct initial tableau.

x1 x2 x3 x4 Sol.
−c̄ -4 -1 0 0 0
x1 1 0 1/5 0 3/5
x2 0 1 -3/5 0 6/5
x4 0 0 1 1 1

Thus, we have to apply some row operations to get the following correct initial tableau.

x1 x2 x3 x4 Sol.
−c̄ 0 0 1/5 0 18/5
x1 1 0 1/5 0 3/5
x2 0 1 -3/5 0 6/5
x4 0 0 1 1 1

Applying the simplex method from this tableau reaches the same optimal solution as before, x∗ = (2/5, 9/5) with z∗ = 17/5.

Figure 2.4: Feasible region and the optimal solution of Phase 1, x = (3/5, 6/5) with z = 18/5.

2.6 Special cases


When solving linear programs, we have to consider four types of special cases: degenerate basic
solutions, non-unique optimal solutions, infeasibility and unboundedness.
First we consider degeneracy. This occurs when more than n constraints are active at a vertex, i.e., more constraints than needed to define it. In this case, we have a feasible basic solution where some basic variables also have value zero. We can identify these cases by a tie in the ratio test.

Example 2.25. Consider the following linear program

max z = 3x1 + 9x2


s.t. x1 + 4x2 ≤ 8
x1 + 2x2 ≤ 4
x1 , x2 ≥ 0

as represented in Figure 2.5.


Figure 2.5: Feasible region and optimal solution x∗ = (0, 2) with z∗ = 18.

By introducing slack variables x3 and x4 , we get the following tableau.

x1 x2 x3 x4 Sol.
−c̄ -3 -9 0 0 0
x3 1 4 1 0 8
x4 1 2 0 1 4

The ratio test for pivot column x2 leads to a tie, such that both x3 and x4 could be chosen as
leaving basic variables. Here, we arbitrarily choose x3 .

−c̄ -3/4 0 9/4 0 18


x2 1/4 1 1/4 0 2
x4 1/2 0 -1/2 1 0

This results in a basic variable x4 = 0. By continuing the process, the solution does not change,
i.e., the point (x1 , x2 ) = (0, 2) is “visited twice”.

−c̄ 0 0 3/2 3/2 18


x2 0 1 1/2 -1/2 2
x1 1 0 -1 2 0

Remark. Degeneracy can cause cycling of the simplex method, i.e., the algorithm does not terminate. Simple rules (see Bland's rule, for example) can prevent it at the cost of performance. Modern codes interject conditional basis perturbation and shifting to prevent cycles.
Degeneracy can be a symptom of redundancy in the model specification.

Figure 2.6: Feasible region and level sets.

Next, we consider the case where multiple solutions are optimal. This happens when the objective function is parallel to an active constraint. While this means that there are infinitely many optimal solutions, the method only visits the extreme points of the optimal set.
Example 2.26. Consider the following linear program

max z = 2x1 + 4x2


s.t. x1 + 2x2 ≤ 5
x1 + x2 ≤ 4
x1 , x2 ≥ 0.

as represented in Figure 2.6.


The solution is given by:

x1 x2 x3 x4 Sol.
−c̄ -2 -4 0 0 0
x3 1 2 1 0 5
x4 1 1 0 1 4
−c̄ 0 0 2 0 10
x2 1/2 1 1/2 0 5/2
x4 1/2 0 -1/2 1 3/2
−c̄ 0 0 2 0 10
x2 0 1 1 -1 1
x1 1 0 -1 2 3

In the second tableau, the nonbasic variable x1 has reduced cost zero and thus can be made basic without changing the objective value. For λ ∈ [0, 1], any (x1, x2) = λ(0, 5/2) + (1 − λ)(3, 1) is optimal, see also Figure 2.7.

Next, we consider unboundedness. In this case, the improvement of a solution is not bounded.

Figure 2.7: Feasible region, level sets and the two optimal extreme points x∗ = (0, 5/2) and x∗∗ = (3, 1), both with z∗ = 10.

Unboundedness is often caused by a model specification issue. It occurs when the feasible region contains an extreme ray along which the objective value improves.

Example 2.27. Consider the following linear program

max z = 2x1 + x2
s.t. x1 − x2 ≤ 10
2x1 ≤ 40
x1 , x2 ≥ 0

as illustrated in Figure 2.8.

Figure 2.8: Feasible region.

The corresponding initial tableau is the following.



x1 x2 x3 x4 Sol.
−c̄ -2 -1 0 0 0
x3 1 -1 1 0 10
x4 2 0 0 1 40

Consider the column of the nonbasic variable x2. Here, the negative reduced cost is negative, i.e., making x2 basic would improve the objective value. However, this column does not induce any restriction on how much x2 can be increased: the ratio test “fails” as there is no positive entry in the column. Thus, the objective value can be arbitrarily high and the problem is unbounded, see also Figure 2.9.

Figure 2.9: Feasible region and level sets.

Lastly, we consider infeasibility. Here, the feasible region is empty, which is usually also due to poorly specified models. We can identify infeasible models using the two-phase method or the M-method by adding artificial variables.
Note that infeasibility does not occur for models in ≤-form when b ∈ Rm+.

Example 2.28. Consider the following linear program

max z = 3x1 + 2x2


s.t. 2x1 + x2 ≤ 2
3x1 + 4x2 ≥ 12
x1 , x2 ≥ 0.

as illustrated in Figure 2.10.


By adding slack and surplus variables x3 , x4 as well as an artificial variable r1 for the ≥-constraint,
we get the following program

Figure 2.10: Constraints and empty feasible region.

max z = 3x1 + 2x2
s.t. 2x1 + 1x2 + x3 = 2
     3x1 + 4x2 − x4 + r1 = 12
     x1, x2, x3, x4, r1 ≥ 0.

Solving the problem with the artificial objective min r1, we get the following tableaus.

x1 x2 x3 x4 r1 Sol.
−c̄ 3 4 0 -1 0 12
x3 2 1 1 0 0 2
r1 3 4 0 -1 1 12
−c̄ -5 0 -4 -1 0 4
x2 2 1 1 0 0 2
r1 -5 0 -4 -1 1 4

In the optimal solution, the artificial variable r1 is basic and strictly positive. Thus, the objective
value is positive as well, indicating that no feasible solution exists.

2.7 Duality in linear programming


Duality is a theoretical framework that allows us to study properties of a primal problem using a
dual counterpart. While there are several types of duality, we focus on Lagrangian duality. Note
that various advanced techniques in mathematical programming rely to some extent on duality.
Some efficient numerical methods exploit primal-dual relationships, e.g. the dual simplex and the
primal-dual interior point method.
