Supplementary Notes to
Networks and Integer Programming
Jesper Larsen Jens Clausen
Kongens Lyngby 2009
Technical University of Denmark
Department of Management Engineering
DK-2800 Kongens Lyngby, Denmark
www.man.dtu.dk
ISSN 0909-3192
Preface
This is a draft version. This volume is a supplement to Wolsey's "Integer Programming" for the course Networks and Integer Programming (42113) at the Department of Management Engineering at the Technical University of Denmark.

Whereas Wolsey does an excellent job of describing the intricacies and challenges of general integer programming and the techniques necessary to master the field, he basically ignores the field of network optimization. We do not claim to cover this field in total, but we address the most important problems in network optimization in relation to Wolsey's text.

We have included two supplementary readings in the form of a chapter on duality and one on branch and bound. Finally, an appendix contains a small introduction to the interactive part of CPLEX.
Kgs. Lyngby, February 2009
Jesper Larsen Jens Clausen
Contents

Preface

1 Introduction
  1.1 Graphs and Networks
  1.2 Algorithms

2 Teaching Duality in Linear Programming – The Multiplier Approach
  2.1 Introduction
  2.2 A blending example
  2.3 General formulation of dual LP problems
  2.4 Discussion: Pros and Cons of the Approach

I Network Optimization

3 The Minimum Spanning Tree Problem
  3.1 Optimality conditions
  3.2 Kruskal's Algorithm
  3.3 Prim's algorithm
  3.4 Supplementary Notes
  3.5 Exercises

4 The Shortest Path Problem
  4.1 The Bellman-Ford Algorithm
  4.2 Shortest Path in Acyclic Graphs
  4.3 Dijkstra's Algorithm
  4.4 Relation to Duality Theory
  4.5 Applications of the Shortest Path Problem
  4.6 Supplementary Notes
  4.7 Exercises

5 Project Planning
  5.1 The Project Network
  5.2 The Critical Path
  5.3 Finding earliest start and finish times
  5.4 Finding latest start and finish times
  5.5 Considering Time-Cost trade-offs
  5.6 Supplementary Notes
  5.7 Exercises

6 The Max Flow Problem
  6.1 The Augmenting Path Method
  6.2 The Preflow-Push Method
  6.3 Applications of the max flow problem
  6.4 Supplementary Notes
  6.5 Exercises

7 The Minimum Cost Flow Problem
  7.1 Optimality conditions
  7.2 Network Simplex for the Transshipment Problem
  7.3 Network Simplex Algorithm for the Minimum Cost Flow Problem
  7.4 Supplementary Notes
  7.5 Exercises

II General Integer Programming

8 Dynamic Programming
  8.1 The Floyd-Warshall Algorithm

9 Branch and Bound
  9.1 Branch and Bound – terminology and general description
  9.2 Personal Experiences with GPP and QAP
  9.3 Ideas and Pitfalls for Branch and Bound users
  9.4 Supplementary Notes
  9.5 Exercises

A A quick guide to GAMS – IMP

B Extra Assignments
Chapter 1
Introduction
1.1 Graphs and Networks
Network models are built from two major building blocks: edges (sometimes
called arcs) and vertices (sometimes called nodes). Edges are lines connecting
two vertices. A graph is a structure that is composed of vertices and edges.
A directed graph (sometimes called a digraph) is a graph in which the edges
have been assigned an orientation – often shown by arrowheads on the lines.
Finally, a network is a (directed) graph in which the edges have an associated
flow. Table 1.1 gives a number of examples of the use of graphs.
Vertices            Edges             Flow
cities              roads             vehicles
switching centers   telephone lines   telephone calls
pipe junctions      pipes             water
rail junctions      railway lines     trains

Table 1.1: Simple examples of networks
A graph G = (V(G), E(G)) consists of a finite set of vertices V(G) and a set of
edges E(G) – for short denoted G = (V, E). Only where it is necessary will we
use the "extended" form to describe a graph. E is a subset of {(v, w) : v, w ∈ V}.
The number of vertices resp. edges in G is denoted n resp. m: |V(G)| = n and
|E(G)| = m.

Figure 1.1 shows a picture of a graph consisting of 9 vertices and 16 edges.
Instead of a graphical representation it can be stated as G = (V, E), where
V = {1, 2, 3, 4, 5, 6, 7, 8, 9} and E = {(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (2, 7),
(2, 8), (3, 4), (3, 6), (4, 5), (4, 8), (5, 7), (5, 9), (6, 7), (6, 8), (8, 9)}.
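As a side note, this vertex/edge description maps directly onto a small data structure. The following sketch (our own illustration, not part of the notes) stores the graph of Figure 1.1 and builds the neighbor sets from the edge list:

```python
# Hypothetical sketch: the graph G = (V, E) of Figure 1.1 as Python data.
V = {1, 2, 3, 4, 5, 6, 7, 8, 9}
E = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (2, 7), (2, 8), (3, 4),
     (3, 6), (4, 5), (4, 8), (5, 7), (5, 9), (6, 7), (6, 8), (8, 9)]

def adjacency(V, E):
    """Neighbor sets for an undirected graph: each edge (v, w)
    makes v and w neighbors of one another."""
    adj = {v: set() for v in V}
    for v, w in E:
        adj[v].add(w)
        adj[w].add(v)
    return adj

adj = adjacency(V, E)
# Here n = |V| = 9 and m = |E| = 16, as stated in the text.
```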
Figure 1.1: Graphical illustration of a graph
An edge (i, j) ∈ E is incident with the vertices i and j; i and j are called
neighbors (or adjacent). In a complete (or fully connected) graph all
possible pairs of vertices are connected with an edge, i.e. E = {(i, j) : i, j ∈ V}.
We define V+(i) and V−(i) to be the set of edges out of i resp. into i, that is,
V+(i) = {(i, j) ∈ E : j ∈ V} and V−(i) = {(j, i) ∈ E : j ∈ V}.
A subgraph H of G has V(H) ⊂ V(G) and E(H) ⊂ E(G), where E(H) ⊂
{(i, j) : i, j ∈ V(H)}. When A ⊂ E, G \ A is given by V(G \ A) = V(G)
and E(G \ A) = E \ A. When B ⊂ V, G \ B (or G[V \ B]) is given by
V(G \ B) = V(G) \ B and E(G \ B) = E \ {(i, j) : i ∈ B ∨ j ∈ B}. G \ B
is also called the subgraph induced by V \ B. The subgraph H of G is
spanning if V(H) = V(G). An example of a subgraph of the graph in
Figure 1.1 is given by V(H) = {1, 2, 3, 4}; the edges of the subgraph are then
the edges of G where both endpoints are in V(H), i.e. E(H) = {(1, 2), (1, 3),
(1, 4), (3, 4)}.
A path P in G is a sequence v0, e1, v1, e2, ..., ek, vk, in which each vi is a vertex,
each ei an edge, and where ei = (vi−1, vi), i = 1, ..., k. P is called a path from v0
to vk or a (v0, vk)-path. The path is said to be closed if v0 = vk, edge-simple
if all ei are different, and simple if all vi are different. A chain is a simple
path. If we look at our graph in Figure 1.1, then 1, 4, 5, 9, 8 denotes a path; as
all vertices on the path are distinct, it is a simple path (or chain). Furthermore,
the path 1, 3, 6, 2, 7, 6, 2, 1 is a closed path.
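These definitions can be checked mechanically. The small predicates below (names are our own, not from the notes) classify a vertex sequence in the graph of Figure 1.1:

```python
# Hypothetical helpers: classify a vertex sequence in an undirected graph.
E = {(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (2, 7), (2, 8), (3, 4),
     (3, 6), (4, 5), (4, 8), (5, 7), (5, 9), (6, 7), (6, 8), (8, 9)}

def is_path(seq, E):
    """Every consecutive pair of vertices must be an edge (in either
    direction, since the graph is undirected)."""
    return all((v, w) in E or (w, v) in E for v, w in zip(seq, seq[1:]))

def is_closed(seq, E):
    """A closed path starts and ends in the same vertex."""
    return is_path(seq, E) and seq[0] == seq[-1]

def is_chain(seq, E):
    """A chain (simple path): all vertices distinct."""
    return is_path(seq, E) and len(set(seq)) == len(seq)

# 1, 4, 5, 9, 8 is a chain; 1, 3, 6, 2, 7, 6, 2, 1 is a closed path.
```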
A circuit or cycle C is a path which is closed and where v0, ..., vk−1 are all
different. A dicircuit is a circuit in which all edges are forward. An example
of a circuit in Figure 1.1 is 1, 3, 6, 2, 1 or 2, 6, 8, 2.

A graph G is connected if a path from i to j exists for all pairs (i, j) of vertices.
If G is connected, v ∈ V, and G \ v is not connected, v is called a cut-vertex.
A circuit-free graph G is called a forest; if G is also connected, G is a tree.
Let us return to the concept of orientations on the edges. Formally, a directed
graph or digraph G = (V(G), E(G)) consists of a finite set of vertices V(G) and
a set of edges E(G) – often denoted G = (V, E). E is a subset of {(i, j) : i, j ∈ V}.
Each edge has a start-vertex (tail – t(i, j) = i) and an end-vertex (head –
h(i, j) = j). The number of vertices resp. edges in G is denoted n resp. m:
|V(G)| = n and |E(G)| = m.

An edge (i, j) ∈ E is incident with the vertices i and j, and j is called a
neighbor of i; i and j are called adjacent.

In a complete (or fully connected) graph all edges are "present", i.e. E =
{(i, j) : i, j ∈ V}. Any digraph has an underlying graph, which is found
by ignoring the directions of the edges. When graph terminology is used for
digraphs, it relates to the underlying graph.
A path P in the digraph G is a sequence v0, e1, v1, e2, ..., ek, vk, in which each
vi is a vertex, each ei an edge, and where ei is either (vi−1, vi) or (vi, vi−1),
i = 1, ..., k. If ei is equal to (vi−1, vi), ei is called forward; otherwise ei is
backward. P is called a directed path if all edges are forward.
A graph G can be strongly connected, which means that any vertex is reachable
from any other vertex. When talking about directed graphs, this means that
the directions of the edges must be obeyed.
1.2 Algorithms
1.2.1 Concepts
An algorithm is basically a recipe that in a deterministic way describes how to
accomplish a given task.
show some pseudo code, explain and give a couple of small examples.
1.2.2 Depthﬁrst search
1.2.3 Breadthﬁrst search
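The two subsections above are placeholders in this draft. As a sketch of our own (not the authors' pseudo code), iterative versions of both searches on an adjacency-list dictionary could look as follows; the tie-breaking by sorted labels is an assumption made here so that the visiting order is deterministic:

```python
from collections import deque

def dfs(adj, s):
    """Depth-first search: explore along a branch as far as possible
    before backtracking (LIFO stack)."""
    seen, stack, order = set(), [s], []
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            order.append(v)
            # Push larger labels first so smaller labels are visited first.
            stack.extend(sorted(adj[v], reverse=True))
    return order

def bfs(adj, s):
    """Breadth-first search: visit all neighbors of a vertex before
    moving further away (FIFO queue)."""
    seen, queue, order = {s}, deque([s]), []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in sorted(adj[v]):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order
```

Both run in O(n + m) time; the only difference is the order in which discovered vertices are taken out again (stack versus queue).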
Chapter 2
Teaching Duality in Linear Programming – The Multiplier Approach
Duality in LP is often introduced through the relation between LP problems
modelling diﬀerent aspects of a planning problem. Though providing a good
motivation for the study of duality this approach does not support the general
understanding of the interplay between the primal and the dual problem with
respect to the variables and constraints.
This paper describes the multiplier approach to teaching duality: replace the
primal LP problem P by a relaxed problem, obtained by including in the objective
function the violation of each primal constraint multiplied by an associated
multiplier. The relaxed problem is trivial to solve, but its solution provides
only a bound for the solution of the primal problem. The new problem is hence
to choose the multipliers so that this bound is optimized. This is the dual
problem of P.

LP duality is described similarly in the work by A. M. Geoffrion on Lagrangean
Relaxation for Integer Programming. However, we suggest here that the approach
is used not only in the technical parts of a method for integer programming,
but as a general tool in teaching LP.
2.1 Introduction
Duality is one of the most fundamental concepts in connection with linear
programming and provides the basis for a better understanding of LP models and
their results, and for algorithm construction in linear and in integer programming.
Duality in LP is in most textbooks, e.g. [2, 7, 9], introduced using
examples building upon the relationship between the primal and the dual problem
seen from an economic perspective. The primal problem may e.g. be a
blending problem:
Determine the contents of each of a set of available ingredients (e.g. fruit or
grain) in a ﬁnal blend. Each ingredient contains varying amounts of vitamins
and minerals, and the ﬁnal blend has to satisfy certain requirements regarding
the total contents of minerals and vitamins. The costs of units of the ingredients
are given, and the goal is to minimize the unit cost of the blend.
The dual problem here turns out to be a problem of prices: What is the maximum
price one is willing to pay for "artificial" vitamins and minerals to substitute
the natural ones in the blend?
After such an introduction the general formulations and results are presented
including the statement and proof of the duality theorem:
Theorem 2.1 (Duality Theorem) If feasible solutions to both the primal and
the dual problem in a pair of dual LP problems exist, then there is an optimum
solution to both systems and the optimal values are equal.
Accompanying theorems on unboundedness and infeasibility, and the
Complementary Slackness Theorem, are also presented in most textbooks.
While giving a good motivation for studying dual problems, this approach has
an obvious shortcoming when it comes to explaining duality in general, i.e.
in situations where no natural interpretation of the dual problem in terms of
primal parameters exists.
General descriptions of duality are often handled by means of symmetrical dual
forms as introduced by von Neumann. Duality is introduced by stating that two
LP problems are dual problems by definition. The classical duality theorems are
then introduced and proved. The dual of a given LP problem can then be found
by transforming it to a problem of one of the two symmetrical types and
deriving its dual through the definition. Though perfectly clear from a formal
point of view, this approach does not provide any understanding of the interplay
between signs of variables in one problem and types of constraints in the dual
problem. In [1, Appendix II], a presentation of general duality trying to provide
this understanding is given, but the presentation is rather complicated.
In the following another approach used by the author when reviewing duality in
courses on combinatorial optimization is suggested. The motivating idea is that
of problem relaxation: If a problem is diﬃcult to solve, then ﬁnd a family of easy
problems each resembling the original one in the sense that the solution provides
information in terms of bounds on the solution of our original problem. Now
ﬁnd that problem among the easy ones, which provides the strongest bounds.
In the LP case we relax the primal problem into one with only nonnegativity
constraints by including in the objective function the violation of each primal
constraint multiplied by an associated multiplier. For each choice of multipliers
respecting sign conditions derived from the primal constraints, the optimal value
of the relaxed problem is a lower bound (in case of primal minimization) on the
optimal primal value. The dual problem now turns out to be the problem of
maximizing this lower bound.
The advantages of this approach are 1) that the dual problem of any given
LP problem can be derived in a natural way without problem transformations
and definitions, 2) that the primal/dual relationship between variables and
constraints and the signs/types of these becomes very clear for all pairs of
primal/dual problems, and 3) that Lagrangean Relaxation in integer programming
now becomes a natural extension as described in [3, 4]. A similar approach is
sketched in [8, 5].
The reader is assumed to have a good knowledge of basic linear programming.
Hence, concepts such as the Simplex tableau, basic variables, and reduced costs
will be used without introduction. Standard transformations between different
forms of LP problems are also assumed to be known.
The paper is organized as follows: Section 2 contains an example of the approach
sketched, Section 3 presents the general formulation of duality through
multipliers and the proof of the Duality Theorem, and Section 4 discusses the
pros and cons of the approach presented. The main contribution of the paper is
not theoretical but pedagogical: the derivation of the dual of a given problem
can be presented without problem transformations and definitions, which are
hard to motivate to people with no prior knowledge of duality.
Fruit type       F1   F2   F3   F4
Preservative R    2    3    0    5
Preservative Q    3    0    2    4
Cost pr. ton     13    6    4   12

Table 2.1: Contents of R and Q and price for different types of fruit
2.2 A blending example
The following example is adapted from [6]. The Pasta Basta company wants
to evaluate an ecological production versus a traditional one. One of their
products, the Pasta Basta lasagne, has to contain certain preservatives, R and
Q, in order to ensure durability. Artiﬁcially produced counterparts R’ and Q’
are usually used in the production – these are bought from a chemical company
PresChem, but are undesirable from the ecological point of view. R and Q can
alternatively be extracted from fresh fruit, and there are four types of fruit,
each with its particular content (number of units) of R and Q in one ton of the
fruit. These contents and the cost of buying the fruit are specified in Table 2.1.
Pasta Basta has a market corresponding to daily needs of 7 units of R and 2 units
of Q. If the complete production is based on ecologically produced preservatives,
which types of fruit and which amounts should be bought in order to supply the
necessary preservatives in the cheapest way?
The problem is obviously an LP problem:

(P)   min   13x1 + 6x2 + 4x3 + 12x4
      s.t.   2x1 + 3x2       + 5x4 = 7
             3x1       + 2x3 + 4x4 = 2
             x1, x2, x3, x4 ≥ 0
With x2 and x3 as initial basic variables the Simplex method solves the problem
with the tableau of Table 2.2 as result.

Natural questions when one knows one solution method for a given type of
problem are: Is there an easier way to solve such problems? Which LP problems
are trivial to solve? Regarding the latter, it is obvious that LP minimization
problems with no constraints but nonnegativity of x1, ..., xn are trivial to
solve:
      min   c1x1 + ··· + cnxn
      s.t.  x1, ..., xn ≥ 0
If at least one cost coefficient is negative, the value of the objective function
is unbounded from below (in the following termed "equal to −∞"), with the
corresponding variable unbounded (termed "equal to ∞") and all other variables
equal to 0; otherwise the optimal value is 0 with all variables equal to 0.
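This solution rule is a one-line computation. The sketch below (our own illustration, not from the text) returns the optimal value of such a problem directly from its cost vector:

```python
import math

def trivial_lp_value(c):
    """Optimal value of min c·x subject to x >= 0 only: -inf if some
    cost coefficient is negative (push that variable towards +inf),
    otherwise 0 (set all variables to 0)."""
    return -math.inf if any(cj < 0 for cj in c) else 0.0
```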
The blending problem P of Section 2.2 is not of the form just described. However,
such a problem may easily be constructed from P: measure the violation of each
of the original constraints by the difference between the right-hand and the
left-hand side:

      7 − (2x1 + 3x2 + 5x4)
      2 − (3x1 + 2x3 + 4x4)

Multiply these by penalty factors y1 and y2 and add them to the objective
function:
(PR(y1, y2))   min_{x1,...,x4 ≥ 0}   13x1 + 6x2 + 4x3 + 12x4
                                     + y1(7 − 2x1 − 3x2 − 5x4)
                                     + y2(2 − 3x1 − 2x3 − 4x4)
We have now constructed a family of relaxed problems, one for each value of
y1, y2, which are easy to solve. None of these seems to be the problem we
actually want to solve, but the solution of each PR(y1, y2) gives some information
regarding the solution of P: it is a lower bound for any y1, y2, and the maximum
of these lower bounds turns out to be equal to the optimal value of P. The idea
of replacing a difficult problem by an easier one, for which the optimal solution
provides a lower bound for the optimal solution of the original problem, is also
the key to understanding Branch-and-Bound methods in integer programming.
             x1     x2    x3    x4    RHS
Red. costs   15/2    0     3     0     15
x2           7/12    1    5/6    0    3/2
x4            3/4    0    1/2    1    1/2

Table 2.2: Optimal Simplex tableau for problem LP
Let opt(P) and opt(PR(·)) denote the optimal values of P and PR(·) resp.
Now observe the following points:

1. ∀ y1, y2 ∈ ℝ: opt(PR(y1, y2)) ≤ opt(P)

2. max_{y1,y2 ∈ ℝ} (opt(PR(y1, y2))) ≤ opt(P)
1) states that opt(PR(y1, y2)) is a lower bound for opt(P) for any choice of
y1, y2 and follows from the fact that for any set of values for x1, ..., x4
satisfying the constraints of P, the values of P and PR are equal, since the
terms originating in the violation of the constraints vanish. Hence opt(PR(y1, y2))
is found by minimizing over a set containing all values of feasible solutions
to P, implying 1). Since 1) holds for all pairs y1, y2, it must also hold for
the pair giving the maximum value of opt(PR(·)), which is 2). In the next
section we will prove that

      max_{y1,y2 ∈ ℝ} (opt(PR(y1, y2))) = opt(P)
The best bound for our LP problem P is thus obtained by finding optimal
multipliers for the relaxed problem. We have here tacitly assumed that P has
an optimal solution, i.e. it is neither infeasible nor unbounded – we return to
that case in Section 2.3.

Turning back to the relaxed problem, the claim was that it is easily solvable
for any given y1, y2. We just collect terms to find the coefficients of
x1, ..., x4 in PR(y1, y2):
(PR(y1, y2))   min_{x1,...,x4 ≥ 0}   (13 − 2y1 − 3y2) x1 + (6 − 3y1) x2
                                     + (4 − 2y2) x3 + (12 − 5y1 − 4y2) x4
                                     + 7y1 + 2y2
Since y1, y2 are fixed, the term 7y1 + 2y2 in the objective function is a
constant. If any coefficient of a variable is less than 0, the value of PR is
−∞. The lower bound for opt(P) provided by such a pair of y-values is of no
value. Hence, we concentrate on y-values for which this does not happen. These
pairs are exactly those assuring that each coefficient of x1, ..., x4 is
nonnegative:
      (13 − 2y1 − 3y2) ≥ 0   ⇔   2y1 + 3y2 ≤ 13
      (6 − 3y1) ≥ 0          ⇔   3y1 ≤ 6
      (4 − 2y2) ≥ 0          ⇔   2y2 ≤ 4
      (12 − 5y1 − 4y2) ≥ 0   ⇔   5y1 + 4y2 ≤ 12
If these constraints all hold, the optimal solution to PR(y1, y2) has x1, ..., x4
all equal to 0 with a value of 7y1 + 2y2, which, since y1, y2 are finite, is
larger than −∞. Since we want to maximize the lower bound 7y1 + 2y2 on the
objective function value of P, we have to solve the following problem to find
the optimal multipliers:
(DP)   max   7y1 + 2y2
       s.t.  2y1 + 3y2 ≤ 13
             3y1       ≤ 6
                   2y2 ≤ 4
             5y1 + 4y2 ≤ 12
             y1, y2 ∈ ℝ
The problem DP resulting from our reformulation is exactly the dual problem of
P. It is again a linear programming problem, so nothing is gained with respect
to ease of solution – we have no reason to believe that DP is any easier to
solve than P. However, the above example indicates that linear programming
problems appear in pairs defined on the same data, with one being a minimization
and the other a maximization problem, with variables of one problem corresponding
to constraints of the other, and with the types of constraints determining the
signs of the corresponding dual variables. Using the multiplier approach we
have derived the dual problem DP of our original problem P, and we have
through 1) and 2) proved the so-called Weak Duality Theorem – that opt(P) is
greater than or equal to opt(DP).

In the next section we will discuss the construction in general, the proof of
the Duality Theorem as stated in the introduction, and the question of
unboundedness/infeasibility of the primal problem. We end this section by
deriving DP as frequently done in textbooks on linear programming.
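To see the bounding mechanism at work numerically, the sketch below (our own check, not part of the text) evaluates opt(PR(y1, y2)) for a few multiplier pairs; the pair (2, 1/2) is dual feasible and attains the optimal primal value 15 from Table 2.2:

```python
import math

# Data of the blending problem P: costs, constraint rows, right-hand sides.
c = [13, 6, 4, 12]
A = [[2, 3, 0, 5],
     [3, 0, 2, 4]]
b = [7, 2]

def relaxed_value(y):
    """opt(PR(y)): -inf if some collected coefficient c_j - (yA)j is
    negative, otherwise the constant term y*b (all x_j set to 0)."""
    coeffs = [cj - sum(y[i] * A[i][j] for i in range(2))
              for j, cj in enumerate(c)]
    if any(k < 0 for k in coeffs):
        return -math.inf
    return sum(yi * bi for yi, bi in zip(y, b))

# Every finite value is a lower bound on opt(P) = 15; y = (2, 0.5) attains it.
```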
The company PresChem, selling the artificially produced counterparts R' and
Q' to Pasta Basta at prices r and q, is considering increasing these prices as
much as possible, knowing well that many consumers of Pasta Basta lasagne do
not care about ecology but about prices. These customers want as cheap a product
as possible, and Pasta Basta must also produce a cheaper product to maintain its
market share.

If the complete production of lasagne is based on R' and Q', the profit of
PresChem is 7r + 2q. Of course r and q cannot be so large that it is cheaper
for Pasta Basta to extract the necessary amounts of R and Q from fruit. For
example, at the cost of 13, Pasta Basta can extract 2 units of R and 3 units of
Q from one ton of F1. Hence

      2r + 3q ≤ 13

The other three types of fruit give rise to similar constraints. The prices r
and q are normally regarded as nonnegative, but the very unlikely possibility
exists that it may pay off to offer Pasta Basta money for each unit of one
preservative used in the production, provided that the price of the other is
large enough. Therefore the prices are also allowed to take negative values.
The optimization problem of PresChem is thus exactly DP.
2.3 General formulation of dual LP problems
2.3.1 Proof of the Duality Theorem
The typical formulation of an LP problem with n nonnegative variables and m
equality constraints is

      min   cx
      s.t.  Ax = b
            x ≥ 0

where c is a 1 × n matrix, A is an m × n matrix, and b is an m × 1 matrix of
reals. The process just described can be depicted as follows:
      min cx
      s.t. Ax = b, x ≥ 0

→     max_{y ∈ ℝ^m} { min_{x ∈ ℝ^n_+} { cx + y(b − Ax) } }

→     max_{y ∈ ℝ^m} { min_{x ∈ ℝ^n_+} { (c − yA)x + yb } }

→     max yb
      s.t. yA ≤ c, y free
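The chain of rewritings above is pure bookkeeping on the data (A, b, c). As an illustration (our own sketch, with names of our own choosing), the dual data of min{cx : Ax = b, x ≥ 0} is obtained by transposing A and swapping the roles of b and c:

```python
def dual_of_standard_min(A, b, c):
    """Dual of min{cx : Ax = b, x >= 0} is max{yb : yA <= c, y free}.
    Returns (A^T, c, b): the dual constraint matrix (row j of A^T dotted
    with y must be <= c_j), the dual right-hand sides, and the dual
    objective coefficients."""
    m, n = len(A), len(A[0])
    At = [[A[i][j] for i in range(m)] for j in range(n)]  # n rows, m columns
    return At, c, b

# Blending example: the resulting constraints are exactly those of DP.
A = [[2, 3, 0, 5], [3, 0, 2, 4]]
At, rhs, obj = dual_of_standard_min(A, [7, 2], [13, 6, 4, 12])
```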
The proof of the Duality Theorem proceeds in the traditional way: we find a
set of multipliers which satisfies the dual constraints and gives a dual objective
function value equal to the optimal primal value.

Assuming that P and DP have feasible solutions implies that P can be neither
infeasible nor unbounded. Hence an optimal basis B and a corresponding optimal
basic solution xB for P exist. The vector yB = cB B⁻¹ is called the dual
solution or the set of Simplex multipliers corresponding to B. The vector
satisfies that if the reduced costs of the Simplex tableau are calculated using
yB as π in the general formula

      c̄ = c − πA

then c̄i equals 0 for all basic variables and c̄j is nonnegative for all nonbasic
variables. Hence,

      yB A ≤ c

holds, showing that yB is a feasible dual solution. The value of this solution is
cB B⁻¹ b, which is exactly the same as the primal objective value obtained by
assigning to the basic variables xB the values defined by the updated right-hand
side B⁻¹b multiplied by the vector of basic costs cB.
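For the blending example this recipe can be carried out by hand. The sketch below (our own verification, not from the text) computes yB = cB B⁻¹ for the optimal basis {x2, x4}, and checks dual feasibility yB A ≤ c and the value yB b = 15:

```python
# Optimal basis columns of P: x2 -> (3, 0), x4 -> (5, 4); basic costs (6, 12).
B = [[3.0, 5.0],
     [0.0, 4.0]]
cB = [6.0, 12.0]

# Invert the 2x2 basis matrix directly: B^{-1} = (1/det) * [[d, -b], [-c, a]].
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[ B[1][1] / det, -B[0][1] / det],
        [-B[1][0] / det,  B[0][0] / det]]

# Simplex multipliers yB = cB B^{-1}.
yB = [cB[0] * Binv[0][0] + cB[1] * Binv[1][0],
      cB[0] * Binv[0][1] + cB[1] * Binv[1][1]]

A = [[2, 3, 0, 5], [3, 0, 2, 4]]
c = [13, 6, 4, 12]
b = [7, 2]
# yB = (2, 1/2): dual feasible, and yB b = 15 = opt(P).
```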
The case in which the problem P has no optimal solution is, for all types of
primal and dual problems, dealt with as follows. Consider first the situation
where the objective function is unbounded on the feasible region of the problem.
Then any set of multipliers must give rise to a dual solution with value −∞
(resp. +∞ for a maximization problem), since this is the only "lower bound"
("upper bound") allowing for an unbounded primal objective function. Hence,
no set of multipliers satisfies the dual constraints, and the dual feasible set
is empty. If maximizing (resp. minimizing) over an empty set returns the value
−∞ (resp. +∞), the desired relation between the primal and dual problem with
respect to objective function value holds – the optimum values are equal.

Finally, if no primal solution exists, we minimize over an empty set – an
operation returning the value +∞. In this case the dual problem is either
unbounded or infeasible.
2.3.2 Other types of dual problem pairs
Other combinations of constraint types and variable signs of course exist. One
frequently occurring type of LP problem is a maximization problem in nonnegative
variables with less-than-or-equal constraints. The construction of the dual
problem is outlined below:
      max cx
      s.t. Ax ≤ b, x ≥ 0

→     min_{y ∈ ℝ^m_+} { max_{x ∈ ℝ^n_+} { cx + y(b − Ax) } }

→     min_{y ∈ ℝ^m_+} { max_{x ∈ ℝ^n_+} { (c − yA)x + yb } }

→     min yb
      s.t. yA ≥ c, y ≥ 0
Note here that the multipliers are restricted to being nonnegative, thereby
ensuring that for any feasible solution x̂1, ..., x̂n to the original problem, the
relaxed objective function will have a value greater than or equal to that of the
original objective function, since b − Ax̂ and hence y(b − Ax̂) will be nonnegative.
Therefore the relaxed objective function will be pointwise larger than or equal
to the original one on the feasible set of the primal problem, which ensures
that an upper bound results for all choices of multipliers. The set of multipliers
minimizing this bound must now be determined.
Showing that a set of multipliers exists such that the optimal value of the
relaxed problem equals the optimal value of the original problem is slightly
more complicated than in the previous case. The reason is that the value of the
relaxed objective function is no longer equal to the value of the original one
for each feasible point; it is larger than or equal to it.

A standard way is to formulate an LP problem P′ equivalent to the given problem
P by adding a slack variable to each of the inequalities, thereby obtaining a
problem with equality constraints:

      max cx              max cx + 0s
      s.t. Ax ≤ b    =    s.t. Ax + Is = b
           x ≥ 0               x, s ≥ 0
Note now that if we derive the dual problem for P′ using multipliers, we end up
with the dual problem of P: due to the equality constraints, the multipliers are
now allowed to take both positive and negative values. The constraints on the
multipliers imposed by the identity matrix corresponding to the slack variables
are, however,

      yI ≥ 0

i.e. exactly the nonnegativity constraints imposed on the multipliers by the
inequality constraints of P. The proof just given now applies to P′ and DP′,
and the nonnegativity of the optimal multipliers yB is ensured through the
signs of the reduced costs in optimum, since these now satisfy

      c̄ = (c 0) − yB (A I) ≤ 0   ⇔   yB A ≥ c ∧ yB ≥ 0

Since P′ and P are equivalent, the theorem holds for P and DP as well.
The interplay between the types of primal constraints and the signs of the dual
variables is one of the issues of duality which often creates severe difficulties
in the teaching situation. Using the common approach to teaching duality, often
no explanation of the interplay is provided. We have previously illustrated this
interplay in a number of situations. For the sake of completeness we now state
all cases corresponding to a primal minimization problem – the case of primal
maximization can be dealt with likewise.

First note that the relaxed primal problems provide lower bounds, which we
want to maximize. Hence the relaxed objective function should be pointwise
less than or equal to the original one on the feasible set, and the dual problem
is a maximization problem. Regarding the signs of the dual variables we get the
following for the three possible types of primal constraints (Ai· denotes the
i'th row of the matrix A):
Ai· x ≤ bi   For a feasible x, bi − Ai· x is larger than or equal to 0, and
             yi(bi − Ai· x) should be nonpositive. Hence, yi should be
             nonpositive as well.

Ai· x ≥ bi   For a feasible x, bi − Ai· x is less than or equal to 0, and
             yi(bi − Ai· x) should be nonpositive. Hence, yi should be
             nonnegative.

Ai· x = bi   For a feasible x, bi − Ai· x is equal to 0, and yi(bi − Ai· x)
             should be nonpositive. Hence, no sign constraints should be
             imposed on yi.
Regarding the types of the dual constraints, which we have not previously
discussed explicitly, these are determined by the signs of the coefficients of
the variables in the relaxed primal problem in combination with the signs of
the variables themselves. The coefficient of xj is (c − yA)j. Again we have
three cases:
xj ≥ 0    To avoid unboundedness of the relaxed problem, (c − yA)j must be
          greater than or equal to 0, i.e. the j'th dual constraint will be
          (yA)j ≤ cj.

xj ≤ 0    In order not to allow unboundedness of the relaxed problem, (c − yA)j
          must be less than or equal to 0, i.e. the j'th dual constraint will be
          (yA)j ≥ cj.

xj free   In order not to allow unboundedness of the relaxed problem, (c − yA)j
          must be equal to 0, since no sign constraints on xj are present, i.e.
          the j'th dual constraint will be (yA)j = cj.
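The two case lists above amount to a simple lookup table. The sketch below (our own summary, for a primal minimization problem) maps each primal constraint type to a dual variable sign and each primal variable sign to a dual constraint type:

```python
# For a primal minimization problem (whose dual is a maximization problem):
# primal constraint type -> sign of the corresponding dual variable y_i
DUAL_VAR_SIGN = {"<=": "y <= 0", ">=": "y >= 0", "=": "y free"}

# sign of primal variable x_j -> type of the corresponding dual constraint
DUAL_CON_TYPE = {"x >= 0": "(yA)j <= cj",
                 "x <= 0": "(yA)j >= cj",
                 "x free": "(yA)j = cj"}

def dual_shape(constraint_types, variable_signs):
    """Return the shape (dual variable signs, dual constraint types) of
    the dual of a minimization problem with the given primal constraint
    types and primal variable signs."""
    return ([DUAL_VAR_SIGN[t] for t in constraint_types],
            [DUAL_CON_TYPE[s] for s in variable_signs])

# Blending problem P: two equality constraints, four nonnegative variables.
signs, cons = dual_shape(["=", "="], ["x >= 0"] * 4)
# -> y1, y2 free, and four constraints of type (yA)j <= cj, as in DP.
```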
2.3.3 The Dual Problem for Equivalent Primal Problems
In the previous section it was pointed out that the two equivalent problems

      max cx              max cx + 0s
      s.t. Ax ≤ b    =    s.t. Ax + Is = b
           x ≥ 0               x, s ≥ 0

give rise to exactly the same dual problem. This is true in general. Suppose
P is any given minimization problem in variables which may be nonnegative,
nonpositive or free. Let P′ be a minimization problem in standard form, i.e. a
problem in nonnegative variables with equality constraints, constructed from P
by means of addition of slack variables to ≤-constraints, subtraction of surplus
variables from ≥-constraints, and change of variables. Then the dual problems
of P and P′ are equal.
We have commented upon the addition of slack variables to ≤-constraints in the
preceding section. The subtraction of surplus variables is dealt with similarly. A
constraint

  a_i1 x_1 + ... + a_in x_n ≥ b_i  ⇔  (b_i − a_i1 x_1 − ... − a_in x_n) ≤ 0
gives rise to a multiplier, which must be nonnegative in order for the relaxed
objective function to provide a lower bound for the original one on the feasible
set. If a slack variable is subtracted from the left-hand side of the inequality
constraint to obtain an equation

  a_i1 x_1 + ... + a_in x_n − s_i = b_i  ⇔  (b_i − a_i1 x_1 − ... − a_in x_n) + s_i = 0

the multiplier must now be allowed to vary over R. A new constraint in the
dual problem, however, is introduced by the column of the slack variable, cf.
Section 2.2:

  −y_i ≤ 0  ⇔  y_i ≥ 0,

thereby reintroducing the sign constraint for y_i.
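The lower bound argument for the multipliers can be illustrated numerically. The sketch below (Python; the small instance and all names are made up for illustration, not taken from the text) checks that for a minimization problem min cx s.t. Ax ≥ b, x ≥ 0, any nonnegative multiplier vector y makes the relaxed objective cx + y(b − Ax) a lower bound on cx for a feasible x:

```python
# Minimal numeric check of the multiplier (lower bound) argument for a
# minimization problem  min cx  s.t.  Ax >= b, x >= 0.
# The instance is made up for illustration.

c = [3.0, 2.0]
A = [[1.0, 1.0],
     [2.0, 1.0]]
b = [4.0, 6.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def relaxed_value(x, y):
    """cx + y(b - Ax): the relaxed objective for multipliers y."""
    Ax = [dot(row, x) for row in A]
    return dot(c, x) + dot(y, [bi - axi for bi, axi in zip(b, Ax)])

x = [3.0, 2.0]          # feasible: Ax = (5, 8) >= b
y = [1.0, 0.5]          # any nonnegative multipliers

# Since b - Ax <= 0 and y >= 0, the relaxed value cannot exceed cx.
assert relaxed_value(x, y) <= dot(c, x)
```

Minimizing the relaxed objective over all x ≥ 0 therefore yields a lower bound on the optimal primal value, which is the core of the multiplier approach.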
If a nonpositive variable x_j is substituted by a variable x′_j of opposite sign, all
signs in the corresponding column of the Simplex tableau change. For minimization
purposes, however, a positive sign of the coefficient of a nonpositive variable is
beneficial, whereas a negative sign of the coefficient of a nonnegative variable
is preferred. The sign change of the column in combination with the change
in preferred sign of the objective function coefficient leaves the dual constraint
unchanged.
Finally, if a free variable x_j is substituted by the difference between two
nonnegative variables x′_j and x′′_j, two equal columns of opposite sign are introduced.
These give rise to two dual constraints, which when taken together result in the
same dual equality constraint as obtained directly.
The proof of the Duality Theorem for all types of dual pairs P and DP of LP
problems may hence be given as follows: Transform P into a standard problem
P′ in the well known fashion. P′ also has DP as its dual problem. Since the
Duality Theorem holds for P′ and DP as shown previously, and P′ is equivalent
to P, the theorem also holds for P and DP.
2.4 Discussion: Pros and Cons of the Approach
The main advantages of teaching duality based on multipliers are in my opinion
• the independence of the problem modeled by the primal model and the
introduction of the dual problem, i.e. that no story has to go with the dual
problem,
• the possibility to avoid problem transformation and “duality by deﬁnition”
in the introduction of general duality in linear programming,
• the clariﬁcation of the interplay between the sign of variables and the type
of the corresponding constraints in the dual pair of problems,
• the early introduction of the idea of getting information about the opti
mum of an optimization problem through bounding using the solution of
an easier problem,
• the possibility of introducing partial dualization by including only some
constraint violations in the objective function, and
• the resemblance with duality in nonlinear programming, cf. [3].
The only disadvantage in my view is one listed also as an advantage:
• the independence of the introduction of the dual problem and the problem
modelled by the primal model
since this may make the initial motivation weaker. I do not advocate that du
ality should be taught based solely on the multiplier approach, but rather that
it is used as a supplement to the traditional presentation (or vice versa). In
my experience, it oﬀers a valuable supplement, which can be used to avoid the
situation of frustrated students searching for an intuitive interpretation of the
dual problem in cases, where such an interpretation is not natural. The deci
sion on whether to give the traditional presentation of duality or the multiplier
approach ﬁrst of course depends on the particular audience.
Bibliography
[1] R. E. Bellman and S. E. Dreyfus, Applied Dynamic Programming, Prince
ton University Press, 1962.
[2] G. B. Dantzig, Linear Programming and Extensions, Princeton University
Press, 1963.
[3] A. M. Geoffrion, Duality in Non-Linear Programming: A Simplified
Applications-Oriented Approach, SIAM Review 13(1), 1–37, 1971.
[4] A. M. Geoffrion, Lagrangean Relaxation for Integer Programming, Math.
Prog. Study 2, 82–114, 1974.
[5] M. X. Goemans, Linear Programming, Course Notes, 1994, available from
http://theory.lcs.mit.edu/~goemans.
[6] J. Krarup and C. Vanderhoeft, Belgium’s Best Beer and Other Stories,
VUB Press (to appear).
[7] K. G. Murty, Linear and Combinatorial Programming, John Wiley, 1976.
[8] H. P. Williams, Model Solving in Mathematical Programming, John Wiley,
1993.
[9] W. L. Winston, Operations Research  Applications and Algorithms, Int.
Thomson Publishing, 1994.
Part I
Network Optimization
Chapter 3
The Minimum Spanning Tree
Problem
The minimum spanning tree problem is one of the oldest and most basic graph
problems in computer science and Operations Research. Its history dates back to
Borůvka's algorithm, developed in 1926, and it is still an actively researched
problem today. Borůvka, a Czech mathematician, was investigating techniques to
secure an efficient electrical coverage of Bohemia.
Let an undirected graph G = (V, E) be given. Each edge is assigned a
cost (or weight or length), that is, we have a cost function w : E → R. The
network will be denoted G = (V, E, w). We will generally assume the costs to be
nonnegative, but as we will see this is more a matter of convenience.
Let us recall that a tree is a connected graph with no cycles, and that a spanning
tree is a subgraph of G that is a tree and includes all vertices of G, i.e. V.
The minimum spanning tree (MST) problem calls for finding a spanning
tree whose total cost is minimum, where the total cost is measured as the sum of
the costs of all the edges in the tree. Figure 3.1 shows an MST in a graph G. Note
that a network can have many MSTs.
Figure 3.1: Examples illustrating the concept of spanning and minimum spanning
tree: (a) a network; (b) not a spanning tree, as {1, 4, 6} and {2, 3, 5} are not
connected; (c) not a spanning tree, as the subgraph contains a cycle 2, 3, 5, 2;
(d) a spanning tree with a value of 18; (e) a minimum spanning tree with a value
of 12.
We will denote edges in the spanning tree as tree edges and those edges not
part of the spanning tree as nontree edges.
The minimum spanning tree problem arises in a wide number of applications,
both as a "stand alone" problem and as a subproblem in more complex problems.
As an example, consider a telecommunication company planning to lay cables in
a new neighborhood. In its operations it is constrained to lay cables only along
certain connections, represented by the edges. At the same time a network that
reaches all houses must be built (i.e. a spanning tree). For each edge the cost of
digging down the cables has been computed. The MST gives us the cheapest
solution that connects all houses.
On some occasions you might meet the maximum spanning tree problem
(or, as Wolsey calls it, the "maximum weight problem"), where we wish to
maximize the total cost of the constituent edges. It can be solved by multiplying
all costs by −1 and then solving the minimum spanning tree problem.
3.1 Optimality conditions
A central concept in understanding and proving the validity of the algorithms
for the MST is the concept of a cut in an undirected graph. A cut C = {X, X′}
is a partition of the set of vertices into two subsets. For X ⊂ V, denote the set
of edges crossing the cut δ(X) = {(u, v) ∈ E : u ∈ X, v ∈ V \ X}. The notion of a
cut is shown in Figure 3.2. Edges that have one endpoint in the set X and the
other in X′ are said to cross the cut.
We say that a cut respects a set A of edges if no edge in A crosses the cut.
Let us use the definition of a cut to establish a mathematical model for the
problem. We define a binary variable x_ij for each edge (i, j). Let

  x_ij = 1 if (i, j) is in the subgraph, 0 otherwise
For a spanning tree of n vertices we need n − 1 edges. Furthermore, for each
possible cut C at least one of the edges crossing the cut must be in the solution
– otherwise the subgraph defined by the solution is not connected. Therefore
we get the following model:

Figure 3.2: The cut C defined by the partitioning X ⊂ V and X′ = V \ X is
shown for a graph G. The edges crossing the cut {X, X′} are shown in bold.
The Cutset formulation of the MST:

  min  Σ_{e∈E} w_e x_e
  s.t. Σ_{e∈E} x_e = n − 1
       Σ_{e∈δ(S)} x_e ≥ 1,   ∅ ⊂ S ⊂ V
       x_e ∈ {0, 1}

The Subtour elimination formulation of the MST:

  min  Σ_{e∈E} w_e x_e
  s.t. Σ_{e∈E} x_e = n − 1
       Σ_{e∈γ(S)} x_e ≤ |S| − 1,   ∅ ⊂ S ⊂ V
       x_e ∈ {0, 1}

where γ(S) = {(i, j) : i ∈ S, j ∈ S}.
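To make the cutset formulation concrete, the following sketch (Python, not part of the text; the edge weights are read off the sample network of Figure 3.1) checks both constraint families for a candidate edge set by brute force over all proper subsets S:

```python
from itertools import combinations

# Sample network from Figure 3.1 (weights read off the drawing).
edges = {(1, 2): 4, (1, 4): 4, (2, 3): 1, (2, 5): 3,
         (3, 5): 2, (4, 5): 3, (4, 6): 2, (5, 6): 6}
V = {1, 2, 3, 4, 5, 6}

def satisfies_cutset_formulation(tree_edges):
    """Check the two constraint families of the cutset formulation."""
    if len(tree_edges) != len(V) - 1:          # sum_e x_e = n - 1
        return False
    for k in range(1, len(V)):                 # every proper subset S
        for S in combinations(sorted(V), k):
            S = set(S)
            crossing = [(i, j) for (i, j) in tree_edges
                        if (i in S) != (j in S)]
            if len(crossing) < 1:              # sum over delta(S) >= 1
                return False
    return True

mst = [(2, 3), (4, 6), (3, 5), (4, 5), (1, 4)]   # the MST of value 12
assert satisfies_cutset_formulation(mst)
assert sum(edges[e] for e in mst) == 12
```

The brute-force enumeration of subsets is of course exponential; it only serves to illustrate that a spanning tree satisfies every cutset constraint.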
Note that, by the definition of a cut, if we delete any tree edge (i, j) from a
spanning tree it will partition the vertex set V into two subsets. Furthermore,
if we insert an edge (i, j) into a spanning tree we will create exactly one cycle.
Finally, for any pair of vertices i and j, the path from i to j in the spanning
tree is uniquely defined (see Figure 3.3).
Figure 3.3: Adding the edge (1, 4) to the spanning tree forms a unique cycle
⟨1, 4, 5, 2, 1⟩.
In this chapter we will look at two algorithms for solving the minimum spanning
tree problem. Although different, they both use the greedy principle in building
an optimal solution. The greedy principle advocates making the choice that is
best at the moment, and although it is in general not guaranteed to provide
optimal solutions, it does in this case. Many textbooks on algorithms also cover
the minimum spanning tree algorithms, as they are a classical application of the
greedy principle.
The greedy strategy shared by both our algorithms can be described by the
following “generic” algorithm, which grows the minimum spanning tree one
edge at a time. The algorithm maintains a set A with the following condition:
Prior to each iteration, A is a subset of edges of some minimum
spanning tree.
At each iteration we determine an edge (i, j) that can be added to A without
breaking the condition stated above, so that if A is a subset of edges of some
minimum spanning tree before the iteration, then A ∪ {(i, j)} is a subset of edges
of some minimum spanning tree before the next iteration.
An edge with this property will be denoted a safe edge. So we get the following
selection rule for a "generic" algorithm.
Algorithm 1: The Selection rule
1 select a cut in G that does not contain any selected edges
2 determine a safe edge of the cutset and select it
3 if there is more than one safe edge, select a random one of these
So how do we use the selection rule?
• The selection rule is used n − 1 times as there are n − 1 edges in a tree
with n vertices. Each iteration will select exactly one edge.
• Initially none of the edges in the graph are selected. During the process
one edge at a time will be selected, and when the method stops the selected
edges form a MST for the given graph G.
Now the interesting question is, of course, how do we find these safe edges? First,
note that a safe edge must exist: the condition guarantees that there exists a
minimum spanning tree T such that A ⊆ T. As long as A is a proper subset of T
there must be an edge (i, j) ∈ T such that (i, j) ∉ A. The edge (i, j) is therefore
a safe edge.
In order to come up with a rule for recognizing safe edges we need to deﬁne a
light edge. A light edge crossing a cut is an edge with the minimum weight
of any edge crossing the cut. Note that there can be more than one light edge
crossing a cut. Now we can state our rule for recognizing safe edges.
Theorem 3.1 Let G = (V, E, w) be a connected, undirected graph (V, E) with
a weight function w : E → R. Let A be a subset of E that is part of some
minimum spanning tree for G. Let {S, S̄} be any cut of G that respects A, and
let (i, j) be a light edge crossing the cut {S, S̄}. Then edge (i, j) is safe for A.
Proof: Let T be a minimum spanning tree that includes A. If T contains the
light edge (i, j) we are done, so let us assume that this is not the case.
Now the general idea of the proof is to construct another minimum spanning
tree T′ that includes A ∪ {(i, j)} by using a cut-and-paste technique. This will
then prove that (i, j) is a safe edge.
If we look at the graph T ∪ {(i, j)}, it contains a cycle (see Figure 3.4). Let us
call this cycle p. Since i and j are on opposite sides of the cut {S, S̄}, there is
at least one edge in T on the cycle p that also crosses the cut. Let (k, l) be such
an edge. As the cut respects A, (k, l) cannot be in A. Furthermore, as (k, l) is
Figure 3.4: The edges in the minimum spanning tree T are shown, but not the
edges in G. The edges already in A are drawn extra fat, and (i, j) is a light edge
crossing the cut {S, S′ = S̄}. The edge (k, l) is an edge on the unique path p
from i to j in T. A minimum spanning tree T′ that contains (i, j) is formed by
removing the edge (k, l) from T and adding (i, j).
on the unique path in T from i to j, T would decompose into two components
if we remove (k, l).
Adding (i, j) reconnects the two components, thus forming a new spanning tree
T′ = T \ {(k, l)} ∪ {(i, j)}. The question is whether T′ is a minimum spanning
tree. Since (i, j) is a light edge (that was one of the initial reasons for picking it)
crossing the cut {S, S̄} and (k, l) also crosses the cut,

  w_ij ≤ w_kl

and therefore

  w_T′ = w_T − w_kl + w_ij ≤ w_T,

and as T is a minimum spanning tree we have w_T ≤ w_T′. So T′ must also be a
minimum spanning tree.
What is left is to prove that (i, j) is actually a safe edge for A. As A ⊆ T
and (k, l) ∉ A, we have A ⊆ T′, so A ∪ {(i, j)} ⊆ T′. Consequently, since T′ is a
minimum spanning tree, (i, j) is safe for A. △
Now we can look at what an execution of the generic algorithm would look like.
The selection rule tells us to pick a cut respecting A and then choose an edge
of minimum weight crossing this cut.
Figure 3.5 illustrates the execution of the generic algorithm. Initially no edges
are selected. We can consider any cut we would like; let us use the cut X = {6}
and X′ = V \ X (a). The cheapest edge crossing the cut is (4, 6), and it is added
to the solution. We can now pick any cut not containing the edge (4, 6). We
consider the cut X = {1, 4, 6} (b). The cheapest edge crossing the cut is (4, 5),
which is added to the solution. Next we consider the cut X = {2} (c). Here the
cheapest edge crossing the cut is (2, 3), which is added to the solution. Now let
us consider the cut X = {1, 2, 3} (d). The edge (3, 5) is now identified as the
cheapest and is added to the solution. The next cut we consider is X = {1} (e).
This will add (1, 4) to the solution, which has now become a spanning tree. We
cannot find any cut that is not crossed by a tree edge (f).
More conceptually, A defines a graph G_A = (V, A). G_A is a forest, and each
of the connected components of G_A is a tree. When the algorithm begins, A is
empty and the forest contains n trees, one for each vertex. Moreover, any safe
edge (i, j) for A connects distinct components of G_A, since A ∪ {(i, j)} must be
acyclic.
The two algorithms in Sections 3.2 and 3.3 use the following corollary to our
theorem.
Figure 3.5: The execution of the generic algorithm on our sample graph, from
(a) the starting graph through (f) step 5.
Corollary 3.2 Let G = (V, E, w) be a connected, undirected graph with weight
function w : E → R. Let A be a subset of E that is included in some minimum
spanning tree for G, and let C = (V_C, E_C) be a component in the forest G_A =
(V, A). If (i, j) is a light edge connecting C to some other component in G_A,
then (i, j) is safe for A.
Proof: The cut {V_C, V̄_C} respects A, and (i, j) is a light edge for this cut.
Therefore (i, j) is safe for A. △
3.2 Kruskal’s Algorithm
The condition of Theorem 3.1 is a natural basis for the first of the algorithms
for the MST that will be presented.
We build the minimum spanning tree from scratch by adding one edge at a
time. First we sort all edges in nondecreasing order of their cost and store them
in a list called L. Furthermore, we define a set F that will contain the edges
that have been chosen to be part of the minimum spanning tree being constructed;
initially F is therefore empty. Now we examine the edges one at a time in the
order they appear in L and check whether adding each edge to the current set F
would create a cycle. If it creates a cycle then one of the edges on the cycle
cannot be part of the minimum spanning tree, and as we are checking the edges
in nondecreasing order, the currently considered edge is the one with the largest
cost; we therefore discard it. Otherwise the edge is added to F. We terminate
when |F| = n − 1. At termination, the edges in F constitute a minimum spanning
tree T. This algorithm is known as Kruskal's algorithm. Pseudocode for the
algorithm is presented in Algorithm 2.
The argument for correctness is that if an edge is added, it is the first edge added
across some cut, and due to the nondecreasing weight order, it is a minimum
weight edge across that cut. That is, it is a light edge, and then it is also a safe
edge.
Algorithm 2: Kruskal's algorithm
Data: A network G = (V, E, w)
Result: The set F containing the edges in the MST
1 F ← ∅
2 L ← edges sorted by nondecreasing weight
3 for each (u, v) ∈ L do
4   if a path from u to v in F does not exist then
5     F ← F ∪ {(u, v)}
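A compact Python sketch of Algorithm 2 might look as follows. The path-existence test in line 4 is implemented here with a union-find structure (one of the "more advanced data structures" alluded to at the end of this section, not the naive list scan described below); the edge list is the sample network of Figure 3.1, and ties between equal-weight edges may yield a different, equally cheap MST than the one in the figure:

```python
def kruskal(n, weighted_edges):
    """Return the MST edge set F; weighted_edges is a list of (w, u, v)
    triples and vertices are numbered 1..n."""
    parent = list(range(n + 1))        # union-find over components

    def find(v):                       # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    F = []
    for w, u, v in sorted(weighted_edges):  # nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # no u-v path in F yet: safe edge
            parent[ru] = rv
            F.append((u, v))
        # else: (u, v) would close a cycle and is discarded
    return F

# Sample network of Figure 3.1 as (weight, u, v) triples.
edges = [(4, 1, 2), (4, 1, 4), (1, 2, 3), (3, 2, 5),
         (2, 3, 5), (3, 4, 5), (2, 4, 6), (6, 5, 6)]
F = kruskal(6, edges)
assert len(F) == 5
assert sum(w for w, u, v in edges if (u, v) in F) == 12
```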
Running Kruskal's algorithm on the example of Figure 3.1 results in the steps
depicted in Figure 3.6. First the edges are sorted in nondecreasing order: (2, 3),
(4, 6), (3, 5), (2, 5), (4, 5), (1, 4), (1, 2), (5, 6). First the edge (2, 3) is selected (b).
Then edge (4, 6) is selected (c), and afterwards the edge (3, 5) is selected (d).
The edge (2, 5) is discarded as it would create a cycle (2 → 3 → 5 → 2) (e).
Instead (4, 5) is selected, followed by (1, 4) (f). Had the order between (1, 4) and
(1, 2) been different we would have chosen differently. As 5 edges are selected
for a graph with 6 vertices we are done.
Figure 3.6: The execution of Kruskal's algorithm on our sample graph, from
(a) the starting graph through (f) step 5.
The time complexity of Kruskal's algorithm consists of the time it takes to sort
the edges in nondecreasing order plus the time it takes to check whether an edge
produces a cycle or not. Sorting can most efficiently be done in O(m log m) =
O(m log n²) = O(m log n), which is achieved by e.g. Merge Sort (see e.g. [1]).
The time to detect a cycle depends on the method used for this step. A naive
implementation would look like this. The set F is at any stage of the execution
of the algorithm a forest. In step 3 in the execution of our example, F contains
three trees, namely {1}, {2, 3, 5} and {4, 6}. The subsets defining the trees in the
forest will be denoted F_1, F_2, ..., F_f. These can be stored as singly linked lists.
When we want to examine whether inserting (k, l) will create a cycle or not,
we scan through these linked lists and check whether both the vertices k and l
belong to the same linked list. If that is the case, adding (k, l) would produce a
cycle. Otherwise we add the edge (k, l) and thereby merge the two lists containing
k and l into one list. This data structure requires O(n) time for each edge we
examine, so the overall time complexity will be O(mn).
A better performance can be achieved using more advanced data structures. In
this way we can get a time complexity of O(m + n log n) plus the time it takes
to sort the edges, all in all O(m log n).
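The naive bookkeeping just described can be sketched as follows (Python, with ordinary lists standing in for the singly linked lists F_1, ..., F_f; the function names are our own):

```python
def same_component(components, k, l):
    """True iff k and l lie in the same list, i.e. (k, l) closes a cycle.
    Scanning all lists costs O(n) per query, giving O(mn) overall."""
    for comp in components:
        if k in comp and l in comp:
            return True
    return False

def merge(components, k, l):
    """Merge the two lists containing k and l into one."""
    ck = next(c for c in components if k in c)
    cl = next(c for c in components if l in c)
    components.remove(cl)
    ck.extend(cl)

# State from step 3 of the example run: three trees in the forest.
components = [[1], [2, 3, 5], [4, 6]]
assert same_component(components, 2, 5)       # (2, 5) would close a cycle
assert not same_component(components, 4, 5)   # (4, 5) is safe
merge(components, 4, 5)                       # add edge (4, 5)
assert len(components) == 2
```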
3.3 Prim’s algorithm
The idea of Prim's algorithm is to grow the minimum spanning tree from
a single vertex. We arbitrarily select a vertex as the starting point. Then we
iteratively add one edge at a time. We maintain a tree spanning a subset of
vertices S and add a nearest neighbour to S. This is done by finding an edge
(i, j) of minimum cost with i ∈ S and j ∈ S̄, that is, finding an edge of minimum
cost crossing the cut {S, S̄}. The algorithm terminates when S = V.
An initial look at the time complexity, before we state the algorithm in pseudocode,
suggests that we select a minimum cost edge n − 1 times, and in order to
find the minimum cost edge in each iteration we must search through all
m edges, giving in total the cost O(mn). Hence, the bottleneck becomes the
identification of the minimum cost edge.
If we store two pieces of information for each vertex we can improve the time
complexity:
1. l_i, which represents the minimum cost of an edge connecting i and some
vertex in S, and
2. p_i, which identifies the other endpoint of a minimum cost edge incident with
vertex i.
If we maintain these values, we can easily find the minimum cost edge in
the cut. We simply compute arg min{l_i : i ∈ S̄}. The label p_i identifies the edge
(p_i, i) as the minimum cost edge over the cut {S, S̄}, and therefore it should be
added in the current iteration.
In order to maintain updated information in the labels we must update the
values of a label as a vertex is moved from S̄ to S. This reduces the time complexity
of the identification from O(m) to O(n). Thus the total time complexity for
Prim's algorithm will be O(n²) (see Algorithm 3).
Algorithm 3: Prim's algorithm
Data: A graph G = (V, E) with n vertices and m edges. A source vertex
is denoted r. For each edge (i, j) ∈ E an edge weight w_ij is given.
Result: An n-vector p containing the predecessor vertex for each i in the
minimum spanning tree.
1 Q ← V
2 p_r ← 0, l_r ← 0
3 p_i ← nil, l_i ← ∞ for i ∈ V \ {r}
4 while Q ≠ ∅ do
5   i ← arg min_{j ∈ Q} {l_j}
6   Q ← Q \ {i}
7   for (i, j) ∈ E where j ∈ Q do
8     if w_ij < l_j then
9       p_j ← i
10      l_j ← w_ij
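Algorithm 3 can be transcribed into Python as follows (a straightforward O(n²)-style sketch; the adjacency-map representation and names are our own, and ties between equal labels may be broken differently than in Figure 3.7):

```python
import math

def prim(n, w, r=1):
    """Algorithm 3 on an adjacency map w[(i, j)] = w[(j, i)] = weight.
    Vertices are 1..n; returns the predecessor vector p (index 0 unused)."""
    Q = set(range(1, n + 1))
    l = [math.inf] * (n + 1)   # l[i]: cheapest known edge from i into S
    p = [None] * (n + 1)       # p[i]: other endpoint of that edge
    l[r], p[r] = 0, 0
    while Q:
        i = min(Q, key=lambda j: l[j])   # argmin over the labels
        Q.remove(i)                      # move i from S-bar into S
        for j in Q:                      # correct the labels of S-bar
            if (i, j) in w and w[(i, j)] < l[j]:
                p[j], l[j] = i, w[(i, j)]
    return p

# Sample network of Figure 3.1, symmetrized for undirected lookup.
edges = {(1, 2): 4, (1, 4): 4, (2, 3): 1, (2, 5): 3,
         (3, 5): 2, (4, 5): 3, (4, 6): 2, (5, 6): 6}
w = {**edges, **{(j, i): c for (i, j), c in edges.items()}}
p = prim(6, w)
mst_weight = sum(w[(p[i], i)] for i in range(2, 7))
assert mst_weight == 12
```

The inner loop scans all remaining vertices rather than just the neighbours of i, mirroring the O(n) label update that gives the O(n²) total.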
The time complexity depends on the exact data structures; the basic
implementation with a list representation results in a running time of O(n²).
A more advanced implementation makes use of the heap data structure. The
"classical" binary heap would give us a time complexity of O(m log n). Further
information can be found in Section 3.4.
Figure 3.7 shows the execution of Prim's algorithm on our example graph.
Initially we choose 1 to be the vertex we expand from. Each vertex gets a label
of +∞ except vertex 1, which gets the value 0. So initially vertex 1 is included
in S, and the label of vertex 4 is updated to 4 (due to the edge (1, 4)) while the
label of vertex 2 is also updated to 4 (due to the edge (1, 2)) (a). Now vertex 4
Figure 3.7: The execution of Prim's algorithm on our sample graph. We have
chosen vertex 1 to be the initial vertex in the tree. A vertex without a "distance"
label symbolizes a distance of +∞.
is selected for inclusion into S. The choice between 2 and 4 is actually arbitrary,
as both vertices have a label of value 4 (the lowest value of all labels) (b). The
selection of vertex 4 updates the labels of vertices 5 and 6 to 3 and 2, respectively.
In the next step we select vertex 6 and enter it into the set S, thereby adding
the edge (4, 6) to our solution (c). Next we select vertex 5, which leads to an
update of the label of vertex 3 but also of vertex 2 (the value is lowered from 4
to 3) (d). We then select vertex 3 and update the label of vertex 2 (e) before
finally selecting vertex 2. Now S = V and the algorithm terminates.
3.4 Supplementary Notes
For more information on sorting, and on data structures for an efficient
implementation of the algorithms described in this chapter, see [1].
For a description of how the better time complexity can be established for
Kruskal's algorithm we refer to [2].
There exist diﬀerent versions of heap data structures with diﬀerent theoretical
and practical properties (Table 3.1 shows some worst case running times for
Prim’s algorithm depending on the selected data structure). An initial discus
sion on the subject related to MST can be found in [2].
heap type                  running time
Binary heap                O(m log n)
d-heap                     O(m log_d n) with d = max{2, m/n}
Fibonacci heap             O(m + n log n)
Johnson's data structure   O(m log log C)

Table 3.1: Running time of Prim's algorithm depending on the implemented
data structure.
Further information on the heap data structure can be found in [1].
3.5 Exercises
1. Construct the minimum spanning tree for the graph in Figure 3.8. Try to
use both Kruskal's algorithm and Prim's algorithm.
2. An alternative algorithm: Consider the following algorithm for ﬁnding
a MST H in a connected graph G = (V, E, w): At each step, consider for
Figure 3.8: Find the minimum spanning tree for the graph using both Kruskal's
and Prim's algorithm.
each edge e whether the graph remains connected if it is removed. Out of
the edges that can be removed without disconnecting the graph, choose the
edge e with the maximum edge cost and delete it from H. Stop when no
edge can be removed without disconnecting the graph.
Show that the algorithm ﬁnds an MST of G.
Apply the algorithm to the example of exercise 1.
3. The MiniMax Spanning Tree problem (MMST): Suppose that
rather than identifying an MST with respect to the sum of the edge costs,
we want to minimize the maximum cost of any edge in the tree. The
resulting problem is called the MiniMax Spanning Tree problem.
Prove that every MST is also an MMST. Does the converse hold?
4. The Second-best Minimum Spanning Tree (SBMST): Let G =
(V, E, w) be an undirected, connected graph with cost function w. Suppose
that all edge costs are distinct, and assume that |E| ≥ |V|.
A second-best minimum spanning tree is defined as follows: Let T denote
the set of all spanning trees of graph G, and let T′ be a MST of G.
Then a second-best minimum spanning tree is a spanning tree T′′ ∈ T
such that w(T′′) = min{w(T) : T ∈ T \ {T′}}.
(a) Let T be a MST of G. Prove that there exist edges (i, j) ∈ T and
(k, l) ∉ T such that T′ = T \ {(i, j)} ∪ {(k, l)} is a second-best
minimum spanning tree of G.
Hints: (1) Assume that T′ differs from T by more than two edges,
and derive a contradiction. (2) If we take an edge (k, l) ∉ T and
insert it in T we make a cycle. What can we say about the cost of
the edge (k, l) in relation to the other edges?
(b) Give an algorithm to compute the second-best minimum spanning
tree of G. What is the time complexity of the algorithm?
5. Cycle detection for Kruskal's algorithm: Given an undirected graph,
describe an O(m + n) time algorithm that detects whether there exists a
cycle in the graph.
6. The end-node problem ([3]): Suppose that you are given a graph
G = (V, E) and weights w on the edges. In addition you are also given a
particular vertex v ∈ V. You are asked to find a minimum spanning tree
such that v is not an end-node. Explain briefly how to modify a minimum
spanning tree algorithm to solve this problem efficiently.
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clif
ford Stein. Introduction to Algorithms, Second Edition. The MIT Press
and McGrawHill, 2001.
[2] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network
Flows. Prentice Hall, 1993.
[3] George B. Dantzig, Mukund N. Thapa. Linear Programming. Springer,
1997.
Chapter 4
The Shortest Path Problem
One of the classical problems in Computer Science and Operations Research is
the problem of ﬁnding the shortest path between two points. The problem can
in a natural way be extended to the problem of ﬁnding the shortest path from
one point to all other points.
There are many applications of the shortest path problem. It is used in route
planning services at diﬀerent webpages (in Denmark examples are www.krak.dk
and www.dgs.dk) and in GPS units. It is also used in www.rejseplanen.dk,
which generates itineraries based on public transportation in Denmark (here
shortest is with respect to time).
Let a directed graph G = (V, E) be given. Each edge is assigned
a weight (or length or cost) w, which can be positive as well as negative. The
overall network is then denoted G = (V, E, w). Finally a starting point,
denoted the root or source r, is given. We now want to find the shortest paths
from r to all other vertices in V \ {r}. We will later, in Section 8.1, look at the
situation where we want to calculate the shortest path between any given pair
of vertices. Let us for convenience assume that all lengths are integers.
Note that if we have a negative length cycle in our graph then it does not make
sense to ask for the shortest path in the graph (at least not for all vertices).
Consider Figure 4.1. This graph contains a negative length cycle ⟨2, 5, 3, 2⟩.
Although the shortest paths from r = 1 to 7, 8 and 9 are well defined, the
shortest paths to the vertices 2, 3, 4, 5, 6 can be made arbitrarily short
by using the negative length cycle several times. Therefore the shortest paths
from 1 to each of these vertices are undefined.
Figure 4.1: An example of a graph G = (V, E, w) with root vertex 1 and a
negative length cycle ⟨2, 5, 3, 2⟩.
As with the minimum spanning tree problem, efficient algorithms exist for
the problem, but before we indulge in these methods let us try to establish a
mathematical model for the problem.
Let vertex r be the root. From vertex r, n − 1 paths have to leave. For any other
vertex v, the number of paths entering the vertex must be exactly 1 larger than
the number of paths leaving the vertex, as the path from r to v ends here. Let
x_e denote the number of paths using edge e ∈ E. This gives the following
mathematical model:
  min  Σ_{e∈E} w_e x_e                                           (4.1)
  s.t. Σ_{e∈δ⁻(r)} x_e − Σ_{e∈δ⁺(r)} x_e = −(n − 1)              (4.2)
       Σ_{e∈δ⁻(i)} x_e − Σ_{e∈δ⁺(i)} x_e = 1,   i ∈ V \ {r}      (4.3)
       x_e ∈ Z₊,   e ∈ E                                         (4.4)

where δ⁻(i) and δ⁺(i) denote the sets of edges entering and leaving vertex i,
respectively.
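The balance constraints (4.2) and (4.3) can be checked on a small instance. The digraph, weights and shortest path tree below are made up for illustration, not taken from the text:

```python
# x_e counts how many of the n-1 root-to-vertex paths use edge e.
V = [1, 2, 3, 4]
E = [(1, 2), (1, 3), (2, 4), (3, 4)]
r = 1

# Hypothetical shortest path tree {(1,2), (1,3), (2,4)}: the three paths
# are 1->2, 1->3 and 1->2->4, so edge (1,2) is used twice.
x = {(1, 2): 2, (1, 3): 1, (2, 4): 1, (3, 4): 0}

def balance(i):
    """Inflow minus outflow of path counts at vertex i."""
    inflow = sum(x[e] for e in E if e[1] == i)
    outflow = sum(x[e] for e in E if e[0] == i)
    return inflow - outflow

assert balance(r) == -(len(V) - 1)                # constraint (4.2)
assert all(balance(i) == 1 for i in V if i != r)  # constraint (4.3)
```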
Let P = ⟨v_1, v_2, ..., v_k⟩ be a path from v_1 to v_k. A subpath P_ij of P is then
defined as P_ij = ⟨v_i, v_{i+1}, ..., v_j⟩. Now if P is a shortest path from v_1 to v_k,
then any subpath P_ij of P will be a shortest path from v_i to v_j; if not, it would
contradict the assumption that P is a shortest path.
Furthermore, the subgraph defined by the shortest paths from r to all other
vertices is a tree rooted at r. Therefore the problem of finding the shortest
paths from r to all other nodes is sometimes referred to as the Shortest Path
Tree Problem. This means that the predecessor of each vertex is uniquely
defined. Furthermore, if G is assumed to be strongly connected the tree is a
spanning tree.
In order to come up with an algorithm for the shortest path problem we deﬁne
potentials and feasible potentials.
Consider a vector of size n, that is, one entry for each vertex in the network
y = y
1
, . . . , y
n
. This is called a potential. Furthermore, if y satisﬁes that
1. y
r
= 0 and
2. y
i
+w
ij
≥ y
j
for all (i, j) ∈ E
then y is called a feasible potential.
Not surprisingly, y_j can be interpreted as the length of a path from r to j. The sum y_i + w_ij is also the length of a path from r to j: a path from r to i followed by the arc (i, j). So if y_j equals the length of a shortest path from r to j, then y_j ≤ y_i + w_ij for every arc (i, j). If the potential is feasible for a shortest path problem, then we have an optimal solution.
4.1 The BellmanFord Algorithm
One way to use the definition of potentials in an algorithm is to come up with an initial setting of the potentials and then check whether there exists an edge (i, j) with y_i + w_ij < y_j. If no such edge exists, the potential is feasible and the solution is therefore optimal. On the other hand, if there is an edge (i, j) with y_i + w_ij < y_j, then the potential is not feasible and the solution is not optimal – there exists a path from r to j via i that is shorter than the one we currently have. This can be repaired by setting y_j equal to y_i + w_ij. Now this error is fixed, and we can again check whether the potential is feasible. We keep doing this until the potential is feasible. This is Ford's algorithm for the shortest path problem, shown more formally in Algorithm 4. The operation of checking and updating y_j is in many textbooks known as "correcting" or "relaxing" (i, j).

Figure 4.2: As the potential in vertex i reflects the length of a path from r to i, and similarly for vertex j, the sum y_i + w_ij also represents the length of a path from vertex r to j, and therefore it makes sense to compare the two values. The dashed lines represent a path composed of at least one arc.

As the potentials always have to be an upper bound on the length of the shortest path, we can use y_r = 0 and y_i = +∞ for i ∈ V \ {r} as initial values.
Algorithm 4: Ford's algorithm
Data: A distance matrix W for a digraph G = (V, E). If the edge (i, j) ∈ E then w_ij equals the distance from i to j, otherwise w_ij = +∞
Result: Two n-vectors, y and p, containing the length of the shortest path from r to i resp. the predecessor vertex for i on the path, for each vertex i ∈ {1, . . . , n}
1  y_r ← 0, y_i ← ∞ for all other i
2  p_r ← 0, p_i ← Nil for all other i
3  while an edge (i, j) ∈ E exists such that y_j > y_i + w_ij do
4      y_j ← y_i + w_ij
5      p_j ← i
Note that no particular sequence of edges is required. This is a problem if the instance contains a negative length circuit: the negative length cycle will not be detected, and the algorithm will never terminate. So for Ford's algorithm we need to be aware of negative length circuits, as these may lead to an infinite computation. A simple remedy for this deficiency of Ford's algorithm is to use the same sequence of edges in each iteration. We can be even less restrictive. The extension from Ford's algorithm to the Bellman-Ford algorithm is to go through all edges and relax the necessary ones before we go through the edges again.
Algorithm 5: Bellman-Ford algorithm
Data: A distance matrix W for a digraph G = (V, E) with n vertices. If the edge (i, j) ∈ E then w_ij equals the distance from i to j, otherwise w_ij = +∞
Result: Two n-vectors, y and p, containing the length of the shortest path from r to i resp. the predecessor vertex for i
1  y_r ← 0; y_i ← ∞ for i ∈ V \ {r}
2  p_r ← 0; p_i ← Nil for i ∈ V \ {r}
3  k ← 0
4  while k < n and y is not feasible do
5      k ← k + 1
6      foreach (i, j) ∈ E do            /* correct (i, j) */
7          if y_j > y_i + w_ij then
8              y_j ← y_i + w_ij
9              p_j ← i
Note that in step 6 of the algorithm we run through all edges in each iteration. The time complexity can be calculated fairly easily by looking at the worst case performance of the individual steps of the algorithm. The initialization (steps 1-3) is O(n). The outer loop in step 4 is executed at most n − 1 times, each time forcing the inner loop in step 6 to consider each edge once. Each individual inspection and prospective update can be executed in constant time. So one pass of step 6 takes O(m), and this is done at most n − 1 times, so all in all we get O(nm).
Proposition 4.1 (Correctness of the Bellman-Ford algorithm) Given a connected, directed graph G = (V, E) with a cost function w : E → R and a root r, the Bellman-Ford algorithm produces a shortest path tree T.

Proof: The proof is based on induction. The induction hypothesis is: after iteration k, if the shortest path P from r to i has at most k edges, then y_i is the length of P.

For the base case k = 0 the induction hypothesis is trivially fulfilled, as y_r = 0 = the length of the shortest path from r to r.
For the inductive step, we now assume that for any vertex i whose shortest path from r has fewer than k edges, y_i is the length of that path, i.e. the length of the shortest path. In the k'th iteration we perform a correct operation on all edges.

Suppose the shortest path from r to l consists of k edges, P_l = ⟨r, v_1, v_2, . . . , v_{k−1}, l⟩. By the induction hypothesis we know the shortest paths from r to v_1, v_2, . . . , v_{k−1}. Since in the k'th iteration we perform a correct operation on all edges, we specifically also correct the edge (v_{k−1}, l). This establishes the shortest path from r to l, and therefore y_l equals the length of the shortest path from r to l.
If all distances are nonnegative, a shortest path containing at most n − 1 edges exists for each i ∈ V. If negative edge lengths are present, the algorithm still works. If a negative length circuit exists, this can be discovered by an extra iteration of the main loop: if any y values change in the n'th iteration, at least one shorter path has been established. But as the graph contains n vertices, a simple path can contain at most n − 1 edges, and therefore no changes should occur in the n'th iteration. △
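The extra detection pass described above can be sketched as follows; vertex numbering 1..n and the dictionary edge representation are our own assumptions.

```python
import math

def bellman_ford(n, edges, r):
    """Bellman-Ford with an extra pass detecting negative length circuits.

    edges: dict (i, j) -> w_ij on vertices 1..n; r is the root."""
    y = {v: math.inf for v in range(1, n + 1)}
    p = {v: None for v in range(1, n + 1)}
    y[r] = 0
    for _ in range(n - 1):                  # main loop: at most n-1 passes
        changed = False
        for (i, j), w in edges.items():     # correct every edge
            if y[i] + w < y[j]:
                y[j] = y[i] + w
                p[j] = i
                changed = True
        if not changed:                     # y is feasible: optimal
            break
    # n'th pass: any further improvement implies a negative circuit
    for (i, j), w in edges.items():
        if y[i] + w < y[j]:
            raise ValueError("negative length circuit detected")
    return y, p
```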
4.2 Shortest Path in Acyclic graphs
Before we continue with the general case let us make a small detour and look at
the special case of an acyclic graph. In case the graph is acyclic we can solve the
Shortest Path problem faster. Recall that an acyclic graph is a directed graph
without (directed) cycles.
In order to reduce the running time we need to process the vertices in a specific order. A topological sorting of the vertices is a numbering N : V → {1, . . . , n} such that for every edge (v, w) ∈ E: N(v) < N(w).
Figure 4.3: An acyclic graph
A topological sorting of the vertices of the graph in Figure 4.3 is 1, 2, 5, 4, 6, 3. As vertex 1 is the only vertex with only outgoing edges, it is consequently the first vertex in the sorted sequence. If we look at vertex 5, it is placed after vertices 1 and 2 as it has incoming edges from those two vertices.
Algorithm 6: Shortest path for an acyclic graph
Data: A topological sorting v_1, . . . , v_n
1  y_{v_1} ← 0, y_v ← ∞ for all other v
2  p_{v_1} ← 0, p_v ← Nil for all other v
3  for i ← 1 to n − 1 do
4      foreach j ∈ V^+(v_i) with y_j > y_{v_i} + w_{v_i,j} do
5          y_j ← y_{v_i} + w_{v_i,j}
6          p_j ← v_i
Let us assume we have computed the topological sorting. In Algorithm 6 each edge is considered only once in the main loop, due to the topological sorting. Hence the time complexity is O(m).
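Assuming a topological order is already given, Algorithm 6 can be sketched as below; the adjacency representation `out` and the example graph used to exercise it are our own assumptions, not the instance of Figure 4.3.

```python
import math

def dag_shortest_path(order, edges, out):
    """Shortest paths in an acyclic graph given a topological order.

    order: vertices in topological order (source first);
    out[v]: endpoints of edges leaving v; edges[(i, j)]: weight w_ij."""
    y = {v: math.inf for v in order}
    p = {v: None for v in order}
    y[order[0]] = 0
    for v in order[:-1]:              # scan vertices in topological order
        if y[v] == math.inf:
            continue                  # v is unreachable from the source
        for j in out.get(v, []):      # each edge is considered exactly once
            if y[v] + edges[(v, j)] < y[j]:
                y[j] = y[v] + edges[(v, j)]
                p[j] = v
    return y, p
```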
Figure 4.4: The execution of the acyclic algorithm on our sample graph, from the initialization through steps 1-6 (subfigures (a)-(g)).
A topological sorting of the vertices in an acyclic graph can be achieved by the algorithm below. Note that if the source vertex is not the first vertex in the topological sorting, some vertex cannot be reached from the source vertex and will remain labeled with distance +∞.
Algorithm 7: Topological sorting
Data: An acyclic digraph G = (V, E)
Result: A numbering N of the vertices in V so that for each edge (v, w) ∈ E it holds that N_v < N_w
1  start with all edges being unmarked
2  N_v ← 0 for all v ∈ V
3  i ← 1
4  while i ≤ n do
5      find an unnumbered v with all incoming edges marked
6      if no such v exists then
7          STOP
8      N_v ← i
9      i ← i + 1
10     mark all (v, w) ∈ E
The topological sorting can also be computed in O(m) time, so the overall time complexity of finding the shortest paths in an acyclic graph becomes O(m).
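Algorithm 7 can be implemented efficiently by keeping an in-degree counter per vertex instead of scanning for an eligible vertex in each round; a vertex with all incoming edges "marked" is then one whose counter has dropped to zero. A sketch, with the list-based input format as our own assumption:

```python
from collections import deque

def topological_sort(vertices, edges):
    """Kahn-style implementation of Algorithm 7 in O(n + m).

    Raises ValueError when no eligible vertex exists (graph has a cycle)."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for (i, j) in edges:
        indeg[j] += 1
        out[i].append(j)
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for j in out[v]:              # mark all edges (v, j)
            indeg[j] -= 1
            if indeg[j] == 0:         # all incoming edges of j now marked
                queue.append(j)
    if len(order) < len(vertices):
        raise ValueError("graph contains a cycle")
    return order
```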
4.3 Dijkstra’s Algorithm
Now let us return to a general weighted directed graph. We can improve on the time complexity of the Bellman-Ford algorithm if we assume that all edge lengths are nonnegative. This assumption makes sense in many situations; for example, if the lengths are road distances, travel times, or travel costs. Even though we could still use the Bellman-Ford algorithm, Dijkstra's algorithm – named after its inventor, the Dutch computer scientist Edsger Dijkstra – utilizes the fact that the edges have nonnegative lengths to get a better running time.
Algorithm 8: Dijkstra's algorithm
Data: A digraph G = (V, E) with n vertices and m edges. A source vertex is denoted r. For each edge (i, j) ∈ E an edge weight w_ij is given. Note: w_ij ≥ 0.
Result: Two n-vectors, y and p, containing the length of the shortest path from r to i resp. the predecessor vertex for i on the path, for each vertex i ∈ {1, . . . , n}
1  S ← {r}, U ← ∅
2  p_r ← 0, y_r ← 0
3  p_i ← Nil, y_i ← ∞ for i ∈ V \ {r}
4  while S ≠ ∅ do
5      i ← arg min_{j ∈ S} {y_j}
6      for j ∈ V \ U with (i, j) ∈ E do
7          if y_j > y_i + w_ij then
8              y_j ← y_i + w_ij
9              p_j ← i
10             S ← S ∪ {j}
11     S ← S \ {i}; U ← U ∪ {i}
An example of the execution of Dijkstra's algorithm is shown in Figure 4.5. Initially all potentials are set to +∞ except the potential of the source vertex, which is set to 0. First the source vertex is selected and we relax all outgoing edges of vertex 1, that is, the edges (1, 2) and (1, 3). This updates y_2 and y_3 to 1 resp. 2. In the second step vertex 2 is selected, thereby determining edge (1, 2) as part of the shortest path solution. All outgoing edges are inspected and we relax (2, 5) but not (2, 3), as it is more expensive to get to vertex 3 via vertex 2. In step 3 vertex 3 is selected, making edge (1, 3) part of the solution. In step 4 vertex 4 is chosen. This adds (3, 4) to the shortest path tree, and finally in step 5 we add vertex 5 and thereby (4, 5) to the solution.
Figure 4.5: The execution of Dijkstra's algorithm on a small example directed graph. Vertex 1 is the source vertex. A vertex without a "distance" symbolizes a distance of ∞.
Note that this algorithm is very similar to Prim's algorithm for the MST (see Section 3.3). The only difference from Prim's algorithm is in fact the update step in the inner loop, and this step takes – like in the MST algorithm – O(1) time. Hence the complexity of the algorithm is O(n^2) if a list representation of the y vector is used, and a complexity of O(m log n) can be obtained if a heap data structure is used instead.
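As a concrete companion to the pseudocode, here is a sketch of the heap variant using Python's `heapq` module, which gives the O(m log n) behaviour mentioned above; vertex numbering 1..n and the dictionary edge representation are our own assumptions.

```python
import heapq
import math

def dijkstra(n, edges, r):
    """Dijkstra's algorithm with a binary heap; requires w_ij >= 0.

    edges: dict (i, j) -> w_ij on vertices 1..n; r is the source."""
    out = {v: [] for v in range(1, n + 1)}
    for (i, j), w in edges.items():
        out[i].append((j, w))
    y = {v: math.inf for v in range(1, n + 1)}
    p = {v: None for v in range(1, n + 1)}
    y[r] = 0
    heap = [(0, r)]                  # the tentative set S as a heap
    settled = set()                  # the permanent set U
    while heap:
        d, i = heapq.heappop(heap)   # i = arg min of y over S
        if i in settled:
            continue                 # stale entry for a settled vertex
        settled.add(i)
        for j, w in out[i]:          # relax edges leaving i
            if d + w < y[j]:
                y[j] = d + w
                p[j] = i
                heapq.heappush(heap, (y[j], j))
    return y, p
```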
Theorem 4.2 (Correctness of Dijkstra's algorithm) Given a connected, directed graph G = (V, E) with length function w : E → R_+ and a root vertex r, Dijkstra's algorithm produces a shortest path tree T rooted at r.
Proof: The proof is by induction. The induction hypothesis is that before an iteration it holds that for each vertex i in U, the shortest path from the source vertex r to i has been found and is of length y_i, and for each vertex i not in U, y_i is the length of a shortest path from r to i with all vertices except i belonging to U.

As U = ∅ initially, this is obviously true at the start. Also after the first iteration, where U = {r}, the hypothesis holds (check for yourself).
Let v be the element with the smallest y value selected in iteration k, that is, y_v ≤ y_u for all u ∈ S. Now y_v is the length of a path Q from r to v passing only through vertices in U (see Figure 4.6). Suppose that Q is not the shortest path from r to v – then another path R from r to v is shorter.
Figure 4.6: Proof of the correctness of Dijkstra's algorithm
Consider the path R. It starts in r, which is in U. Since v is not in U, R has at least one edge from a vertex in U to a vertex not in U. Let (u, w) be the first edge of this type. The vertex w was a candidate in the vertex choice of the current iteration, but was discarded as v was picked. Hence y_w ≥ y_v. All edge lengths are assumed to be nonnegative, so the length of the part of R from w to v is nonnegative, and hence the total length of R is at least the length of Q. This is a contradiction – hence Q is a shortest path from r to v.

Furthermore, the update step in the inner loop ensures that after the current iteration it again holds for each u not in U (where U is now the "old" U augmented with v) that y_u is the length of a shortest path from r to u with all vertices except u belonging to U. △
4.4 Relation to Duality Theory

From our approach of solving the Shortest Path Problem using potentials it is not evident that there is a connection to our mathematical model (4.1)-(4.4). It would seem that the model and our solution approaches are independent ways of looking at the problem. In fact, they have a great deal in common, and in this section we will elaborate on that.

A classical result (dating back to the mid 50's) is that the vertex-edge incidence matrix of a directed graph is totally unimodular. This means that the determinant of each square submatrix is equal to 0, 1 or −1. Since the right-hand sides are integers, this implies that all variables in any basic feasible solution to such problems automatically take integer values.
This means that the integrality requirement (4.4) in our model can be relaxed to

    x_e ≥ 0,   e ∈ E.   (4.5)

Now our shortest path problem is actually an LP problem.
Let us now construct the dual. Let y_i be the dual variable for constraint i. The dual of the shortest path problem then becomes:

    max  −(n − 1) y_r + Σ_{i ≠ r} y_i        (4.6)
    s.t. y_j − y_i ≤ w_ij,   (i, j) ∈ E      (4.7)
         y_i free                            (4.8)
Let us look at the primal and dual problem of the instance in Figure 4.7 with r = 1. The dual variables y_1, y_2, . . . , y_5 are associated with the five vertices in the graph. The primal and dual problems are conveniently set up in Table 4.1.
Figure 4.7: An example
    Edges      (1,2)  (1,4)  (2,5)  (4,2)  (4,3)  (5,3)  (5,4)   rel  RHS
    Variables  x_12   x_14   x_25   x_42   x_43   x_53   x_54
    y_1         -1     -1                                         =    -4
    y_2          1            -1      1                           =     1
    y_3                                     1      1              =     1
    y_4                 1            -1    -1             1       =     1
    y_5                        1                  -1     -1       =     1
    relation    <=     <=     <=     <=     <=     <=     <=
    w_ij         4      7      6      1      4      5      1

Table 4.1: Derivation of the primal and dual LP problems
For this instance the dual objective can be read from the RHS column, and the columns of the primal (one for each edge) constitute the constraints, so we get:

    max  −4y_1 + y_2 + y_3 + y_4 + y_5   (4.9)
    s.t. y_2 − y_1 ≤ 4                   (4.10)
         y_4 − y_1 ≤ 7                   (4.11)
         ⋮                               (4.12)
         y_4 − y_5 ≤ 1                   (4.13)
         y_1, y_2, . . . , y_5 free      (4.14)
It can easily be shown that the dual problem is determined up to an arbitrary additive constant. We can therefore fix one of the dual variables, and here we naturally choose to fix y_r = 0.
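To make the connection concrete, the sketch below recomputes the potentials for the Figure 4.7 instance (edge data read off Table 4.1; the dictionary representation is our own choice), checks dual feasibility (4.7) with y_1 = 0, and confirms that the primal cost Σ w_e x_e equals the dual objective (4.6).

```python
import math

# Instance of Figure 4.7 with root r = 1, edge weights as in Table 4.1.
edges = {(1, 2): 4, (1, 4): 7, (2, 5): 6, (4, 2): 1,
         (4, 3): 4, (5, 3): 5, (5, 4): 1}
n, r = 5, 1

# Shortest path potentials and predecessors via Bellman-Ford passes.
y = {v: math.inf for v in range(1, n + 1)}
p = {v: None for v in range(1, n + 1)}
y[r] = 0
for _ in range(n - 1):
    for (i, j), w in edges.items():
        if y[i] + w < y[j]:
            y[j], p[j] = y[i] + w, i

# Dual feasibility (4.7): y_j - y_i <= w_ij on every edge.
assert all(y[j] - y[i] <= w for (i, j), w in edges.items())

# Primal solution: x_e = number of shortest paths using edge e.
x = {e: 0 for e in edges}
for v in range(1, n + 1):
    u = v
    while p[u] is not None:
        x[(p[u], u)] += 1
        u = p[u]

primal = sum(w * x[e] for e, w in edges.items())
dual = -(n - 1) * y[r] + sum(y[v] for v in range(1, n + 1) if v != r)
assert primal == dual == 32      # equal objectives: both solutions optimal
```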
Central in LP theory is that all simplex-type algorithms for LP can be viewed as an interplay between primal feasibility, dual feasibility and the set of complementary slackness conditions (CSC). In our case the CSC are:

    (y_i + w_ij − y_j) x_ij = 0,   (i, j) ∈ E   (4.15)
The CSC must be fulfilled by a pair of optimal solutions for the primal resp. dual problem. The predecessor variables p can be seen as representing the primal variables x in our model: p_j = i implies that x_ij > 0 in the model. Initially our algorithms start by setting p_i equal to Nil, basically stating that all x_ij's are zero, so the CSC hold initially. Now in an iteration of any one of our methods for solving the shortest path problem we find an edge to correct, that is, we find a violated constraint in our dual model. We then set y_j equal to y_i + w_ij and change the predecessor, which corresponds to setting x_ij to a nonnegative value. The CSC were fulfilled before the operation, and they still hold after it.
So in essence, what our algorithms do is maintain primal feasibility and the CSC while trying to obtain dual feasibility. When dual feasibility is obtained we have primal and dual feasibility together with the CSC, and fundamental LP theory then tells us that these solutions are optimal as well.
4.5 Applications of the Shortest Path Problem

Now we will look at some interesting applications of the shortest path problem. Note, however, that we may not be able to use Dijkstra's algorithm for some of these applications, as the graphs involved may have negative edge lengths.
Figure 4.8: A graph to solve the currency conversion problem, with vertices for Swedish kroner, Danish kroner and Icelandic kronur and edge weights of the form −log r. Currencies are taken from www.xe.com at 11:17 on February 16th 2007.
The currency conversion problem

Consider a directed graph G = (V, E) where each vertex denotes a currency. We want to model the possibilities of exchanging currency in order to establish the best way to go from one currency to another. If 1 unit of a currency u can buy r_uv units of another currency v, we model this by an edge from u to v of weight −log r_uv (see Figure 4.8).
If there is a path P = ⟨c_1, c_2, . . . , c_l⟩ from vertex c_1 to c_l, the weight of the path is given by

    w(P) = Σ_{i=1}^{l−1} w_{c_i c_{i+1}} = Σ_{i=1}^{l−1} −log r_{c_i c_{i+1}} = −log Π_{i=1}^{l−1} r_{c_i c_{i+1}}

So if we convert currencies along the path P, we can convert n units of currency x into n e^{−w(P)} units of currency y. Thus, the best way to convert x to y is to do it along a shortest path in this graph. Also, if there is a negative cycle in the graph, we have found a way to increase our fortune by simply making a sequence of currency exchanges.
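The construction can be sketched as follows. The assignment of the figure's six rates to ordered currency pairs below is our own assumption, chosen only to illustrate the −log transformation and the negative cycle test; with these numbers the round trip DKK → SEK → DKK multiplies to 1.238 · 0.808 > 1, so an arbitrage cycle exists.

```python
import math

# Hypothetical assignment of the six rates in Figure 4.8;
# rate[u][v] = units of v bought by 1 unit of u.
rate = {
    "DKK": {"SEK": 1.238, "ISK": 11.832},
    "SEK": {"DKK": 0.808, "ISK": 9.562},
    "ISK": {"DKK": 0.084, "SEK": 0.105},
}

# Edge weights w_uv = -log r_uv; a negative cycle then means arbitrage.
edges = {(u, v): -math.log(r) for u, nbrs in rate.items()
         for v, r in nbrs.items()}

def has_arbitrage(vertices, edges):
    # Bellman-Ford negative-cycle test; y = 0 everywhere acts as a
    # virtual source with a 0-weight edge to every currency.
    y = {v: 0.0 for v in vertices}
    for _ in range(len(vertices) - 1):
        for (i, j), w in edges.items():
            if y[i] + w < y[j]:
                y[j] = y[i] + w
    # Any edge still improvable after n-1 passes lies on a negative cycle.
    return any(y[i] + w < y[j] - 1e-12 for (i, j), w in edges.items())

print(has_arbitrage(rate, edges))
```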
Difference constraints

While we in general rely on the simplex method or an interior point method to solve a general linear programming problem, there are special cases that can be treated more efficiently in another way. In this section we investigate a special case of linear programming that can be reduced to finding shortest paths from a single source. The single-source shortest path problem can then be solved using the Bellman-Ford algorithm, thereby also solving the linear programming problem.

Sometimes we do not really care about the objective function; we just wish to find a feasible solution, that is, any vector x that satisfies the constraints Ax ≤ b, or to determine that no feasible solution exists.
In a system of difference constraints each row of A contains exactly one 1, one −1 and otherwise 0's. Thus each of the constraints can be written on the form x_j − x_i ≤ b_k. An example could be:

    x_1 − x_2 ≤ −3
    x_1 − x_3 ≤ −2
    x_2 − x_3 ≤ −1
    x_3 − x_4 ≤ −2
    x_2 − x_4 ≤ −4

This system of five constraints has a feasible solution, namely x_1 = 0, x_2 = 3, x_3 = 5, x_4 = 7.
Systems of difference constraints occur in many different applications. For example, the variables x_i may be times at which events are to occur. Each constraint can then be viewed as stating that there must be at least (or at most) a certain amount of time b_k between two events.
Another feasible solution to the system of difference constraints above is x_1 = 5, x_2 = 8, x_3 = 10, x_4 = 12. In fact, we have:

Theorem 4.3 Let x = (x_1, x_2, . . . , x_n) be a feasible solution to a system of difference constraints, and let d be any constant. Then x + d = (x_1 + d, x_2 + d, . . . , x_n + d) is a feasible solution to the system of difference constraints as well.
Proof: Look at the general form of a constraint:

    x_j − x_i ≤ b_k
Figure 4.9: The graph representing our system of difference constraints.
If we insert x_j + d and x_i + d instead of x_j and x_i, we get

    (x_j + d) − (x_i + d) = x_j − x_i ≤ b_k

So if x satisfies the constraints, so does x + d. △
The system of constraints can be viewed from a graph theoretic angle. If A is an m × n matrix, we represent the system by a graph with n vertices and m directed edges. Vertex v_i corresponds to variable x_i, and each edge corresponds to one of the m inequalities involving two unknowns. Finally, we add a vertex v_0 and edges from v_0 to every other vertex to guarantee that every other vertex is reachable. More formally, we get a set of vertices

    V = {v_0, v_1, . . . , v_n}
and
    E = {(v_i, v_j) : x_j − x_i ≤ b_k is a constraint} ∪ {(v_0, v_1), (v_0, v_2), . . . , (v_0, v_n)}

Edges of the type (v_0, v_i) get weight 0, while an edge (v_i, v_j) gets weight b_k. The graph of our example is shown in Figure 4.9.
Theorem 4.4 Given a system Ax ≤ b of difference constraints, let G be the corresponding constraint graph constructed as described above with vertex v_0 as the root. If G contains no negative weight cycle, then

    x_1 = y_1, x_2 = y_2, . . . , x_n = y_n

is a feasible solution to the system, where y_i is the length of the shortest path from v_0 to v_i.
Proof: Consider any edge (v_i, v_j) ∈ E. By the triangle inequality, y_j ≤ y_i + w_ij, or equivalently y_j − y_i ≤ w_ij. Thus letting x_i = y_i and x_j = y_j satisfies the difference constraint x_j − x_i ≤ w_ij that corresponds to the edge (v_i, v_j). △
Theorem 4.5 Given a system Ax ≤ b of difference constraints, let G be the corresponding constraint graph constructed as described above with vertex v_0 as the root. If G contains a negative weight cycle, then there is no feasible solution for the system.

Proof: Suppose the negative weight cycle is ⟨1, 2, . . . , k, 1⟩. Then

    x_2 − x_1 ≤ w_12
    x_3 − x_2 ≤ w_23
    ⋮
    x_k − x_{k−1} ≤ w_{k−1,k}
    x_1 − x_k ≤ w_k1

Adding the left hand sides together and the right hand sides together results in 0 ≤ weight of the cycle. Since the cycle weight is negative, this gives 0 < 0, which is obviously a contradiction. △
A system of difference constraints with m constraints and n unknowns produces a graph with n + 1 vertices and n + m edges. Therefore Bellman-Ford runs in O((n + 1)(n + m)) = O(n^2 + nm). In [1] it is established that a time complexity of O(nm) can be achieved.
Using the Bellman-Ford algorithm we can now calculate the following solution: x_1 = −7, x_2 = −4, x_3 = −2, x_4 = 0. By adding 7 to all variables we get the first solution we saw.
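The whole procedure – build the constraint graph with the extra vertex v_0 and run Bellman-Ford – can be sketched as below; the triple-based input format is our own assumption.

```python
import math

def solve_difference_constraints(n, constraints):
    """Solve a system of difference constraints via shortest paths.

    constraints: list of (j, i, b) triples encoding x_j - x_i <= b,
    variables numbered 1..n.  Returns a feasible [x_1, ..., x_n], or
    None when a negative circuit proves infeasibility."""
    # Constraint x_j - x_i <= b becomes an edge i -> j of weight b.
    edges = [(i, j, b) for (j, i, b) in constraints]
    # Vertex 0 with 0-weight edges makes every vertex reachable.
    edges += [(0, v, 0) for v in range(1, n + 1)]
    y = {v: math.inf for v in range(n + 1)}
    y[0] = 0
    for _ in range(n):                    # n + 1 vertices -> n passes
        for (i, j, w) in edges:
            if y[i] + w < y[j]:
                y[j] = y[i] + w
    if any(y[i] + w < y[j] for (i, j, w) in edges):
        return None                       # negative circuit: infeasible
    return [y[v] for v in range(1, n + 1)]
```

On the five constraints of this section it reproduces the solution x = (−7, −4, −2, 0) derived above.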
Swapping applications in a daily airline fleet assignment

(To be written: Talluri's article [6], which relies on shortest path computations for swapping applications in a daily airline fleet assignment, will serve as a further example of the use of shortest paths.)
4.6 Supplementary Notes

A nice exposition of the relationship between the shortest path problem and LP theory can be found in [3]. The section on that subject in these notes is based on that text.

More information about totally unimodular (TU) matrices can be found in [4]. Besides TU matrices there exist other classes of matrices, balanced and perfect, that also have the same integrality property. More information on those can be found in [5].
4.7 Exercises
1. Use Dijkstra's algorithm to find the shortest paths from vertex 1 to all other vertices in the graph in Figure 4.10.
   Afterwards verify the result by constructing the shortest path integer programming model for the graph. Solve it in your MIP solver. What happens if we relax the integrality constraint and solve it using an LP solver?
   Also verify the potentials by establishing the dual problem and solving it in your LP solver.
2. Given two vertices i and j. Is the path between i and j in a minimum
spanning tree necessarily a shortest path between the two vertices? Give
a proof or counterexample.
3. You are employed in a consultancy company dealing mainly with routing problems. The company has been contacted by a customer, who is planning to produce a traffic information system. The goal of the system is to be able to recommend the best path between two destinations in a major city, taking into account both travel distance and time. The queries posed to the system will be of the type "what is the shortest path (in kilometers) from Herlev to Kastrup, for which the transportation time does not exceed 25 minutes?" You have been assigned the responsibility of the optimization component of the system.
   As a soft start on the project, you find different books in which the usual single source shortest path problem is considered.
(a) Apply Dijkstra's algorithm to the digraph in Figure 4.11 and show the resulting values of y_1, . . . , y_4 and the corresponding shortest path tree.
Figure 4.10: Find the shortest paths from the source vertex 1 to all other vertices
(b) The Shortest Path Problem can be formulated as a transshipment problem, in which the source has capacity 4 and each other vertex is a demand vertex with demand 1. State the problem corresponding to the digraph in Figure 4.11 and compute the vertex potentials with the shortest path tree found in question (a) as basis tree. Using the potentials, show that the distances determined in question (a) are not the correct shortest distances. Then find the correct distances (and the correct shortest path tree) using i) the Bellman-Ford algorithm for shortest paths; ii) the transshipment algorithm as described in chapter 7.
(c) For the sake of completeness, you decide also to look at all-to-all shortest path algorithms. You apply the algorithm of Floyd-Warshall. Show the first iteration of the algorithm for the graph of Figure 4.11.
(d) As indicated, one can use Dijkstra's algorithm repeatedly to solve the all-pairs shortest path problem – given that the graph under consideration has nonnegative edge costs.
Figure 4.11: Digraph with source vertex s

   In case G does not satisfy this requirement, it is, however, possible to change the costs of the edges so that all costs become nonnegative and the shortest paths with respect to the new edge costs are the same as those for the old edge costs (although the actual lengths of the paths change). The method is the following:
   An artificial vertex K is added to G, and for each vertex v in G an edge of cost 0 from K to v is added, cf. Figure 4.12, which illustrates the construction for the digraph of Figure 4.11.
   The shortest path length from K to each vertex i ∈ V is now calculated – this is denoted π_i. The new edge lengths w'_ij are then defined by w'_ij = w_ij + π_i − π_j.
   Which shortest path algorithm would you recommend for finding the shortest paths from K to the other vertices, and why? Show the application of this algorithm on the digraph of Figure 4.12.
(e) Explain why it holds that w'_ij is nonnegative for all (i, j) ∈ E, such that Dijkstra's algorithm may now be applied to G with the modified edge costs.
   Present an argument showing that for all s, t ∈ V, the shortest s-t path with respect to the original edge costs w_ij is also the shortest s-t path with respect to the modified edge costs w'_ij.
(f) You are now in possession of two all-to-all shortest path algorithms: the Floyd-Warshall algorithm, and the repeated application of Dijkstra's algorithm once per vertex of the given graph. What is the computational complexity of each of the methods? For which graphs would you recommend Floyd-Warshall's algorithm, and for which would you use |V| applications of Dijkstra's algorithm?
Figure 4.12: Digraph with additional vertex K and with 0-length edges added.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press and McGraw-Hill, 2001.

[2] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows. Prentice Hall, 1993.

[3] Jakob Krarup and Malene N. Rørbech. LP formulations of the shortest path tree problem. 4OR 2:259–274, 2004.

[4] Laurence A. Wolsey. Integer Programming. Wiley-Interscience, 1998.

[5] George L. Nemhauser and Laurence A. Wolsey. Integer and Combinatorial Optimization. Wiley-Interscience, 1998.

[6] Kalyan T. Talluri. Swapping Applications in a Daily Airline Fleet Assignment. Transportation Science 30(3):237–248.
Chapter 5

Project Planning

The challenging task of managing large-scale projects can be supported by two Operations Research techniques called PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). These techniques can assist the project manager in breaking down the overall project into smaller parts (called activities or tasks), coordinating and planning the different activities, developing a realistic schedule, and finally monitoring the progress of the project.

The methods were developed independently of each other in the late 50's. The original versions of PERT and CPM had some important differences, but the methods are now often used interchangeably and are combined into one acronym: PERT/CPM. In fact PERT was developed in 1958 to help measure and control the progress of the Polaris Fleet Ballistic Missile program for the US Navy. Around the same time the American chemical company DuPont was using CPM to manage its construction and repair of factories.

Today project management software based on these methods is widely available in many software packages such as e.g. Microsoft Project.
PERT/CPM has been used for many different projects, including:

1. Construction of a new building or road.
2. Movie productions.
3. Construction of IT systems.
4. Ship building.
5. Research and development of a new product.
The role of the project manager is one of great responsibility. In the job one needs to direct and supervise the project from the beginning to the end. Some of the roles are:

1. The project manager must define the project, reduce the project to a set of manageable tasks, and obtain appropriate and necessary resources.
2. The project manager must define the final goal for the project and must motivate the workers to complete the project on time.
3. A project manager must have technical skills. This relates to financial planning, contract management, and managing innovation and problem solving within the project.
4. No project ever goes 100% as planned. It is the responsibility of the project manager to adapt to changes such as reallocation of resources, redefining activities etc.
In order to introduce the techniques we will look at an example. As project manager at GoodStuff Enterprises we have the responsibility for the development of a new series of advanced intelligent toys for kids called MasterBlaster. Based on a preliminary idea, top management has given us the green light for a more thorough feasibility study. As the toy should be ready before the Christmas sales, we have been asked to investigate whether we can finish the project within 30 weeks.

The tasks that need to be carried out during the project are broken down into a set of individual "atomic" tasks called activities. For each activity we need to know its duration and its immediate predecessors. For the MasterBlaster project the activities and their data can be seen in Table 5.1.
5.1 The Project Network

As project managers we first and foremost would like to get a visualization of the project that reveals the dependencies and the overall "flow" of the project. As the saying goes, one picture is worth a thousand words.
    Activity  Description          Immediate     Duration
                                   predecessor   (weeks)
    A         Product design       –             10
    B         Market research      –              4
    C         Production analysis  A              8
    D         Product model        A              6
    E         Marketing material   A              6
    F         Cost analysis        C              5
    G         Product testing      D              4
    H         Sales training       B, E           7
    I         Pricing              F, H           2
    J         Project report       F, G, I        3

Table 5.1: Activities in project MasterBlaster
The ﬁrst issue we will look at is how to visualize the project ﬂow, that is, how
do we visualize the connection of the diﬀerent tasks in the project and their
relative position in the overall plan.
Project networks consist of a number of nodes and a number of arcs. Since the
ﬁrst inventions of the project management methods PERT and CPM there have
been two alternatives for presenting project networks:
AOA (Activity-on-arc): Each activity is represented as an arc. A node is used to separate an activity from each of its immediate predecessors.

AON (Activity-on-node): Each activity is represented by a node. The arcs are used to show the precedence relationships.

AON has generally been regarded as considerably easier to construct than AOA. AON is also seen as easier to revise than AOA when there are changes in the network. Furthermore AON is easier to understand than AOA for inexperienced users. Let us look at the differences using a small example. In Table 5.2 we have a breakdown of a tiny project into activities (as duration is not important for visualizing the project it is omitted here).
First we build a flow network as an AON network. In an AON network the nodes correspond to activities and the arcs represent precedence relations. Note that this results in an acyclic network (why?). In order to frame the process two "artificial" activities "St" (for start) and "Fi" (for finish) are added: St is the first activity of the project and Fi is the final activity of the project.

Activity  Immediate
          predecessor
A         –
B         –
C         A, B
D         B

Table 5.2: A breakdown of activities for a tiny project

We start off by drawing the nodes St, A, B, C, D, and Fi. Now for each predecessor relation we draw an arc, that is, we draw an (A, C) arc, a (B, C) arc and a (B, D) arc. As A and B do not have any predecessors we introduce an (St, A) arc and an (St, B) arc. Finally we introduce a (C, Fi) arc and a (D, Fi) arc, as C and D do not occur as predecessors to any activity. We have now constructed an AON network representation of our small example, shown in Figure 5.1.
Figure 5.1: An AON network for our little example
Where the activities are represented by nodes in an AON network, they are represented by arcs in an AOA network. The nodes merely describe a "state change" or a checkpoint. In the small example we need to be able to show that whereas activity C is preceded by both A and B, D is only preceded by B. The only way to do this is by creating a "dummy" activity d, and we then get the network shown in Figure 5.2.

Although the AOA network is more compact than the AON network, the problem with the AOA network is the use of dummy activities. The larger and more complex the project becomes, the more dummy activities have to be introduced to construct the AOA network, and these dummy activities blur the picture, making it harder to understand.
We will therefore focus on AON networks in the following as they are more
intuitive and easy to construct and comprehend.
Let us construct the project network for our MasterBlaster project. We begin
by making the start node (denoted St). This node represents an activity of zero
Figure 5.2: An AOA network for our little example
duration that starts the project. All nodes in the project that do not have an
immediate predecessor will have the start node as their immediate predecessor.
After the St node, the nodes A and B, an arc from St to A, and one from St to B are generated. This is shown in Figure 5.3, where the durations of the activities are indicated next to the nodes.
Figure 5.3: Start building the project network by making the start node and connecting the nodes that do not have an immediate predecessor to the start node. In this case these are nodes A and B.
Then we make the nodes for activities C, D and E and draw an arc from A to each of these nodes. This way we continue to build the AON network. After having made node J and connected it to activities F, G and I, we conclude the construction of the project network by making the finish node, Fi. All activities that are not immediate predecessors of any other activity are connected to Fi. Just like the start node, the finish node represents an imaginary activity of duration zero.
For our MasterBlaster project the project network is shown in Figure 5.4.
In relation to our project a number of questions are quite natural:
Figure 5.4: An AON network of our MasterBlaster project. The durations of the individual activities are given next to each of the nodes.
• What is the total time required to complete the project if no delays occur?
• When do the individual activities need to start and ﬁnish (at the latest)
to meet this project completion time?
• When can the individual activities start and ﬁnish (at the earliest) if no
delays occur?
• Which activities are critical in the sense that any delay must be avoided
to prevent delaying project completion?
• For other activities, how much delay can be tolerated without delaying
the project completion?
5.2 The Critical Path
A path through a project network starts at the St node, visits a number of activities, and ends at the Fi node. Such a path cannot contain a cycle. An example of a path is ⟨St, A, C, F, J, Fi⟩. The MasterBlaster project contains a total of 5 paths, which are enumerated below:

⟨St, A, C, F, J, Fi⟩
⟨St, A, C, F, I, J, Fi⟩
⟨St, A, D, G, J, Fi⟩
⟨St, A, E, H, I, J, Fi⟩
⟨St, B, H, I, J, Fi⟩
To each of the paths we assign a length, which is the total duration of all activities on the path if they were executed with no breaks in between. The length of a path is calculated by adding together the durations of the activities on the path. For the first path ⟨St, A, C, F, J, Fi⟩ we get 10 + 8 + 5 + 3 = 26. Below we have listed all paths and their lengths.

⟨St, A, C, F, J, Fi⟩       26
⟨St, A, C, F, I, J, Fi⟩    28
⟨St, A, D, G, J, Fi⟩       23
⟨St, A, E, H, I, J, Fi⟩    28
⟨St, B, H, I, J, Fi⟩       16
The estimated project duration equals the length of the longest path through
the project network. The length of a path identiﬁes a minimum time required
to complete the project. Therefore we cannot complete a project faster than
the length of the longest path.
This longest path is also called the critical path. Note that there can be more
than one critical path if several paths have the same maximum length. If we run
a project we must be especially careful about keeping the time on the activities
on a critical path. For the other activities there will be some slack, making it
possible to catch up on delays without forcing a delay in the entire project.
In the case of our MasterBlaster project there are in fact two critical paths, namely ⟨St, A, C, F, I, J, Fi⟩ and ⟨St, A, E, H, I, J, Fi⟩, both with a length of 28 weeks. Thus the project can be completed in 28 weeks, two weeks less than the limit set by top management.
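The path enumeration above can also be done mechanically with a small depth-first search. The sketch below is our own illustration, not part of the text; the two dictionaries simply encode Table 5.1 (St and Fi get duration 0):

```python
# Immediate successors in the AON network; St and Fi have duration 0.
succ = {"St": ["A", "B"], "A": ["C", "D", "E"], "B": ["H"], "C": ["F"],
        "D": ["G"], "E": ["H"], "F": ["I", "J"], "G": ["J"], "H": ["I"],
        "I": ["J"], "J": ["Fi"], "Fi": []}
dur = {"St": 0, "A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
       "F": 5, "G": 4, "H": 7, "I": 2, "J": 3, "Fi": 0}

def all_paths(node, path):
    """Yield every path from `node` to Fi by depth-first search."""
    path = path + [node]
    if node == "Fi":
        yield path
        return
    for nxt in succ[node]:
        yield from all_paths(nxt, path)

longest = max(sum(dur[a] for a in p) for p in all_paths("St", []))
for p in all_paths("St", []):
    length = sum(dur[a] for a in p)
    mark = "  <- critical" if length == longest else ""
    print("-".join(p), length, mark)
```

Running the search finds the same five paths as above, with the two length-28 paths marked as critical.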
5.3 Finding earliest start and ﬁnish times
As good as it is to know the project duration, in order to control and manage a project it is also necessary to know the schedule of the individual activities. The PERT/CPM scheduling procedure begins by determining start and finish times for the activities, assuming that no delays occur. No delays means that
1. actual duration equals estimated duration, and
2. each activity begins as soon as all its immediate predecessors are ﬁnished.
For each activity i in our project we define:

• ES_i as the earliest start time for activity i, and
• EF_i as the earliest finish time for activity i.

Let us denote the duration of activity i by d_i. Now clearly EF_i can be computed as ES_i + d_i.
ES and EF are computed in a top-down manner for the project graph. Initially we assign 0 to ES_St and EF_St. Now we can set ES_A and ES_B to 0 as they follow immediately after the start activity. EF_A must then be 10, and EF_B 4. In the same way earliest start and finish can easily be computed for C, D and E based on an earliest start of 10 (as this is the earliest finish of their immediate predecessor, activity A). Activities F and G can be handled once we know the earliest finish of their immediate predecessors.

When we need to compute ES and EF for activity H we need to apply the Earliest Start Time Rule. It states that the earliest start time of an activity is equal to the largest of the earliest finish times of its immediate predecessors, or

ES_i = max{EF_j : j is an immediate predecessor of i}

Applying this rule to activity H gives ES_H = max{EF_B, EF_E} = max{4, 16} = 16 and then EF_H = 23. We continue this way from top to bottom, computing ES and EF for all activities. The values for the project can be seen in Figure 5.6.
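As a sketch (our own illustration, not from the text), the forward pass is a single loop over the activities, since the order A..J of Table 5.1 already lists every predecessor before its successors:

```python
# Immediate predecessors and durations from Table 5.1.
pred = {"A": [], "B": [], "C": ["A"], "D": ["A"], "E": ["A"], "F": ["C"],
        "G": ["D"], "H": ["B", "E"], "I": ["F", "H"], "J": ["F", "G", "I"]}
dur = {"A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
       "F": 5, "G": 4, "H": 7, "I": 2, "J": 3}

ES, EF = {}, {}
for act in "ABCDEFGHIJ":
    # Earliest Start Time Rule: largest earliest finish among the predecessors
    # (0 for activities that only follow the start node St).
    ES[act] = max((EF[p] for p in pred[act]), default=0)
    EF[act] = ES[act] + dur[act]

print(ES["H"], EF["H"])  # 16 23, as computed in the text
print(EF["J"])           # 28, the project duration
```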
Figure 5.5: An AON network of our MasterBlaster project with earliest start and finish times.
5.4 Finding latest start and ﬁnish times
The latest start time for an activity i is the latest possible time that the activity can start without delaying the completion of the project. We define:

• LS_i to be the latest start time for activity i, and
• LF_i to be the latest finish time for activity i.

Obviously LS_i = LF_i − d_i.
Figure 5.6: An AON network of our MasterBlaster project with earliest start and finish times.
We want to compute LS and LF for each of the activities in our problem. To do so we use a bottom-up approach, given that we have already established the completion time of our project. This automatically becomes the latest finish time of the finish node. Initially LS_Fi and LF_Fi are set to 28.

Now LS and LF for activity J are computed. Its latest finish time must be equal to the latest start time of its successor, which is the finish activity. So we get LF_J = 28 and LS_J = 25. Again LS and LF for activity I can easily be calculated as it has a unique successor. So we get LF_I = 25 and LS_I = 23. For activity F we need to take both activities I and J into consideration. In order to calculate the latest finish time we need to use the Latest Finish Time Rule, which states that the latest finish time of an activity is equal to the smallest of the latest start times of its immediate successors, or

LF_i = min{LS_j : j is an immediate successor of i}

because activity i has to finish before any of its successors can start.
With this in mind we can now compute LF_F = min{LS_I, LS_J} = min{23, 25} = 23, and LS_F will therefore be 18. In the same way the remaining latest start and finish times can be calculated. For the MasterBlaster project we get the numbers in Figure 5.7.
Figure 5.7: An AON network of the MasterBlaster project with latest start and finish times.
In Figure 5.8 the latest and earliest start and ﬁnish times are presented in one
ﬁgure. These quite simple measures are very important in handling a project.
Figure 5.8: An AON network of the MasterBlaster project with earliest and latest start and finish times.
When we e.g. look at activity G we can see that the activity can start as early as week 16, but even if it starts as late as week 21 we can still finish the project on time. In contrast, activity E has to be managed carefully as there is no room for delays: if activity E gets delayed, it will cause a knock-on effect and thereby delay the entire project.
We can in fact compute the slack for each of the activities. The slack for an activity is defined as the difference between its latest finish time and its earliest finish time. For our project the slacks are given in Table 5.3.
An activity with a positive slack has some ﬂexibility: it can be delayed (up to
a certain point) and not cause a delay of the entire project.
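Combining a forward and a backward pass gives all four times and the slacks at once. The sketch below is our own illustration, not from the text; the data again encodes Table 5.1, and the result reproduces Table 5.3:

```python
# Immediate predecessors and durations from Table 5.1.
pred = {"A": [], "B": [], "C": ["A"], "D": ["A"], "E": ["A"], "F": ["C"],
        "G": ["D"], "H": ["B", "E"], "I": ["F", "H"], "J": ["F", "G", "I"]}
dur = {"A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
       "F": 5, "G": 4, "H": 7, "I": 2, "J": 3}
acts = "ABCDEFGHIJ"                     # a topological order of the activities
succ = {a: [b for b in acts if a in pred[b]] for a in acts}

# Forward pass: earliest start/finish (Earliest Start Time Rule).
ES, EF = {}, {}
for a in acts:
    ES[a] = max((EF[p] for p in pred[a]), default=0)
    EF[a] = ES[a] + dur[a]
T = max(EF.values())                    # project duration: 28 weeks

# Backward pass: latest finish/start (Latest Finish Time Rule).
LS, LF = {}, {}
for a in reversed(acts):
    LF[a] = min((LS[s] for s in succ[a]), default=T)
    LS[a] = LF[a] - dur[a]

slack = {a: LF[a] - EF[a] for a in acts}
critical = [a for a in acts if slack[a] == 0]
print(critical)  # ['A', 'C', 'E', 'F', 'H', 'I', 'J'], matching Table 5.3
```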
Slack Activities
0 A, C, E, F, H, I, J
pos. B, D, G
Table 5.3: The slacks for the MasterBlaster project. The main issue is to know
whether the slack is zero or positive.
Note that each activity with zero slack is on a critical path through the project
network. Any delay for an activity along this path will delay project completion.
Therefore, through the calculation of the slacks the project manager now has a clear identification of the activities that need his utmost attention.
5.5 Considering Time-Cost trade-offs
Once we have identified the critical path and the timing of the activities, the next question is whether we can shorten the project. This is often done in order to finish within a certain deadline. In many building projects the developer creates incentives to encourage the builder to finish ahead of time.
Crashing an activity refers to taking (costly) measures to reduce the duration
of an activity below its normal time. This could be to use extra manpower or
using more expensive but also more powerful equipment. Crashing a project
refers to crashing a number of activities in order to reduce the duration of the
project below its normal time. Doing so we are targeting a speciﬁc time limit
and we would like to reach this time limit in the least cost way.
In the most basic approach we assume that the crashing cost is linear, that is,
for each activity the cost for each unit of reduction in time is constant. The
situation is shown in Figure 5.9.
Top management reviews our project plan and comes up with an offer. They think 28 weeks will make the product available very late in comparison with the main competitors. They offer an incentive of 40000 Euros if the project can be finished in 25 weeks or earlier.

Quickly we evaluate every activity in our project to estimate the unit cost of crashing each activity and by how much each activity can be crashed. The numbers for our project are shown in Table 5.4.
Now based on these ﬁgures we can look at the problem of ﬁnding the least cost
Figure 5.9: Crashing an activity. The cost increases linearly from the normal cost at the normal time to the crash cost at the crash time.
way of reducing the project duration from the current 28 weeks to 25 weeks.
One way is the Marginal Cost Approach. The Marginal Cost Approach was
developed in conjunction with the early attempts of project management. Here
we use the unit crash cost to determine a way of reducing the project duration
by 1 week at a time. The easiest way to do this is by constructing a table like
the one shown in Table 5.5.
As the table clearly demonstrates, a weakness of the method is that even with a few paths it can be quite difficult to keep an overview (and to fit it on one piece of paper!). And many projects will have significantly more paths than 5.
In the Marginal Cost Approach we repeatedly ask the question "What is the cheapest activity to crash on a critical path?". Initially there are two critical paths in the MasterBlaster project, which are ⟨St, A, C, F, I, J, Fi⟩ and ⟨St, A, E, H, I, J, Fi⟩. We can run through Table 5.4 and find the cheapest activity to crash, which is activity H. We now crash activity H by one week and our updated table will look like Table 5.6.
As a consequence of crashing activity H by one week, all paths that contain activity H are reduced in length by one.

The length of the critical path is still 28 weeks, but there is now only one critical path in the project, namely ⟨St, A, C, F, I, J, Fi⟩. So we have not yet reached the 25 weeks and therefore perform yet another iteration. We look at the critical path(s) and find the cheapest activity that can be crashed, which is activity F,
Activity  Duration  Duration    Unit crashing
          (normal)  (crashing)  cost
A         10        7           8000
B         4         3           4000
C         8         7           9000
D         6         4           16000
E         6         3           11000
F         5         3           6000
G         4         3           7000
H         7         3           5000
I         2         2           —
J         3         2           9000

Table 5.4: Crashing activities in project MasterBlaster.
Activity   Crash   Path length
to crash   cost    ACFJ  ACFIJ  ADGJ  AEHIJ  BHIJ
                   26    28     23    28     16

Table 5.5: The initial table for starting the marginal cost approach
and crash it by one week; we then get the situation in Table 5.7.

As we have still not reached the required 25 weeks, we continue. The final result can be seen in Table 5.8.

So overall it costs us 35000 Euros to get the project length down to 25 weeks. Given that the overall reward is 40000 Euros, the "surplus" is 5000 Euros. As the project must be considered more unstable after the crashing, we should definitely think twice before going for the bonus. The wise project manager will probably resist the temptation as the risk is too high and the payoff too low.
Another way of determining the minimal crashing cost is to state the problem
as an LP model and solve it using an LP solver.
Activity   Crash   Path length
to crash   cost    ACFJ  ACFIJ  ADGJ  AEHIJ  BHIJ
                   26    28     23    28     16
H          5000    26    28     23    27     15

Table 5.6: Crashing activity H by one week and updating the paths accordingly.
Activity   Crash   Path length
to crash   cost    ACFJ  ACFIJ  ADGJ  AEHIJ  BHIJ
                   26    28     23    28     16
H          5000    26    28     23    27     15
F          6000    25    27     23    27     15

Table 5.7: Crashing activity F by one week and updating the paths accordingly.
Activity   Crash   Path length
to crash   cost    ACFJ  ACFIJ  ADGJ  AEHIJ  BHIJ
                   26    28     23    28     16
H          5000    26    28     23    27     15
F          6000    25    27     23    27     15
H          5000    25    27     23    26     14
F          6000    24    26     23    26     14
H          5000    24    26     23    25     13
A          8000    23    25     22    24     13

Table 5.8: After crashing activities H (three times), F (twice) and A (once).
We need to present the problem as an LP problem with an objective function, constraints and variable definitions.

The decisions that we need to take are by how much we crash each activity and when each activity should start. So for each activity i we define two variables x_i and y_i, where x_i is the reduction in the duration of the activity and y_i is the starting time of the activity.
What will our objective function look like? The objective is to minimize the total cost of crashing activities: min 8000x_A + 4000x_B + . . . + 9000x_J.
With respect to the constraints, we have to impose that the project must be finished within a certain number of weeks. A variable y_Fi is introduced and the project duration constraint y_Fi ≤ 25 is added to the model.
Furthermore we need to add constraints expressing that the predecessors of an activity need to be finished before the activity itself can start. Since the start time of each activity is directly related to the start time and duration of each of its immediate predecessors we get:

start time of this activity ≥ (start time + duration) of each immediate predecessor
In other words, if we look at the relationship between activities A and C we get y_C ≥ y_A + (10 − x_A). For each arc in the AON project network we get exactly one of these constraints.

Finally we just need to make sure that we do not crash more than actually permitted, so we get a series of upper bounds on the x-variables, that is, x_A ≤ 3, x_B ≤ 1, . . . , x_J ≤ 1.
For our project planning problem we arrive at the following model:

min  8000x_A + 4000x_B + 9000x_C + 16000x_D + 11000x_E + 6000x_F + 7000x_G + 5000x_H + 9000x_J
s.t. y_C ≥ y_A + (10 − x_A)
     y_D ≥ y_A + (10 − x_A)
     y_E ≥ y_A + (10 − x_A)
     . . .
     y_Fi ≥ y_J + (3 − x_J)
     y_Fi ≤ 25
     x_A ≤ 3
     . . .
     x_J ≤ 1
This model can now be entered into CPLEX or another LP solver. If we do that we get a surprising result. The LP optimum is 24000 Euros (simply crashing activity A three times), which differs from the Marginal Cost Approach. How can that be? The Marginal Cost Approach said that the total crashing cost would be 35000 Euros, and now we have found a solution that is 11000 Euros better.
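The same LP can be solved without CPLEX. The sketch below is our own illustration (it assumes scipy is available and is not part of the text); it builds the model for scipy.optimize.linprog, rewriting each precedence constraint y_succ ≥ y_p + (d_p − x_p) in the ≤ form that linprog expects:

```python
import numpy as np
from scipy.optimize import linprog

acts = list("ABCDEFGHIJ")
dur   = dict(zip(acts, [10, 4, 8, 6, 6, 5, 4, 7, 2, 3]))
maxcr = dict(zip(acts, [3, 1, 1, 2, 3, 2, 1, 4, 0, 1]))  # normal minus crash time
cost  = dict(zip(acts, [8000, 4000, 9000, 16000, 11000,
                        6000, 7000, 5000, 0, 9000]))     # unit crashing costs
pred  = {"A": [], "B": [], "C": ["A"], "D": ["A"], "E": ["A"], "F": ["C"],
         "G": ["D"], "H": ["B", "E"], "I": ["F", "H"], "J": ["F", "G", "I"]}

# Variable layout: x_A..x_J at indices 0..9, y_A..y_J at 10..19, y_Fi at 20.
n = 21
xi = {a: i for i, a in enumerate(acts)}
yi = {a: 10 + i for i, a in enumerate(acts)}
c = np.zeros(n)
for a in acts:
    c[xi[a]] = cost[a]

A_ub, b_ub = [], []
def arc(p, y_succ):
    # y_succ >= y_p + dur_p - x_p  rewritten as  y_p - x_p - y_succ <= -dur_p
    row = np.zeros(n)
    row[yi[p]], row[xi[p]], row[y_succ] = 1, -1, -1
    A_ub.append(row); b_ub.append(-dur[p])

for a in acts:
    for p in pred[a]:
        arc(p, yi[a])
arc("J", 20)                            # the (J, Fi) arc
row = np.zeros(n); row[20] = 1
A_ub.append(row); b_ub.append(25)       # deadline: y_Fi <= 25

bounds = [(0, maxcr[a]) for a in acts] + [(0, None)] * 11
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
print(round(res.fun))  # 24000: crash activity A by three weeks
```

Start times y_A, y_B need no explicit St constraints because the bounds already force them to be non-negative.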
In fact the Marginal Cost Approach is only a heuristic. Before we investigate
why we only got a heuristic solution for our MasterBlaster project let us look
at Figure 5.10.
There are two activities with a unit crashing cost of 30000 Euros (activities B and C) and one (activity D) with a unit crashing cost of 40000 Euros. Assume we have two critical paths from A to F, one going via B, the other via C. In the first step of the marginal cost analysis we will choose either B or C to crash. This will cost us 30000 Euros. If we crash B we will crash C in the next step, or vice
versa, again costing us 30000 Euros. In total both paths have been reduced by 1 unit each. This could also have been achieved by crashing activity D by one week, and that would only cost us 40000 Euros.

Figure 5.10: An example of where the Marginal Cost Approach makes a suboptimal choice.
In the solution to the MasterBlaster project problem we crash activities H and F once each to reduce the critical paths by one week. This costs us 11000 Euros. Crashing activity A just once also reduces the length of the critical paths by one week, but only at a cost of 8000 Euros. Because we look very narrowly at the cheapest single activity on a critical path to crash, we miss this opportunity.
So now we have to reconsider our refusal to crash the project. Now the potential
surplus is 16000 Euros.
5.6 Supplementary Notes
In the material we have discussed here, all duration times were static. Of course, in a real-world project one thing is to estimate duration times, another is to achieve them. The estimate could be wrong from the beginning, or it could become wrong as the project moves on. Therefore
research has been conducted into looking at stochastic elements in relation to
project planning.
Further readings on project planning where we also consider stochasticity can
be found in [1] that introduces the ﬁrst simple approaches.
This very simple model on project planning can be extended in various ways
taking on additional constraints on common resources etc. For further reading
see [2].
5.7 Exercises
1. You are in charge of organizing a training seminar for the OR department in your company. Remembering the project management tools from OR, you have come up with the list of activities in Table 5.9.
Activity  Description                     Immediate    Duration
                                          predecessor  (weeks)
A         Select location                 –            1
B         Obtain keynote speaker          –            1
C         Obtain other speakers           B            3
D         Coordinate travel for           A, B         2
          keynote speaker
E         Coordinate travel for           A, C         3
          other speakers
F         Arrange dinners                 A            2
G         Negotiate hotel rates           A            1
H         Prepare training booklet        C, G         3
I         Distribute training booklet     H            1
J         Take reservations               I            3
K         Prepare handouts from speaker   C, F         4

Table 5.9: Activities in planning the training seminar.
(a) Find all paths and path lengths through the project network. Determine the critical path(s).

(b) Find earliest time, latest time, and slack for each of the activities. Use this information to determine which of the paths is a critical path.

(c) As the proposed date for the training seminar is being moved, the training seminar needs to be prepared in less time. Activities with a duration of 1 week cannot be crashed any further, but the activities that take 2 or 3 weeks can be crashed by one week. Activity K can be crashed by 2 weeks. The unit crashing cost is 3000 Euros for activities D and H, 5000 Euros for activities J and K, and 6000 Euros for activities E and F; finally, crashing activity C costs 7000 Euros. What does it cost to shorten the project by one week? And by two weeks?
2. You are given the following information about a project consisting of seven
activities (see Table 5.10).
Activity  Immediate    Duration
          predecessor  (weeks)
A         –            5
B         –            2
C         B            2
D         A, C         4
E         A            6
F         D, E         3
G         D, F         5

Table 5.10: Activities for the design of a project network.
(a) Construct the project network for this project.
(b) Find earliest time, latest time, and slack for each of the activities. Use this information to determine which of the paths is a critical path.
(c) If all other activities take the estimated amount of time, what is the
maximum duration of activity D without delaying the completion of
the project?
3. Subsequently you are put in charge of the rather large project of implementing the intranet and the corresponding organizational changes in the company. You immediately remember something about project management from your engineering studies. To get into the tools and the way of thinking, you decide to solve the following test case using PERT/CPM: The available data is presented in Table 5.11. Find the duration of the project if all activities are completed according to their normal times.
In the LP model for finding the cheapest way to shorten a project, there is a variable x_i for each activity i in the project. There are also two constraints, "x_i ≥ 0" and "x_i ≤ D_i − d_i". Denote the dual variables of the latter g_i. Argue that if d_i < D_i, then either g_i is equal to 0 or x_i = D_i − d_i in any optimal solution.
Activity  Imm. pred.  Normal-time (D_i)  Crash-time (d_i)  Crash cost
e_1       –           5                  1                 4
e_2       –           3                  1                 4
e_3       e_1         4                  2                 1
e_4       e_1, e_2    6                  1                 3
e_5       e_2         6                  5                 0
e_6       e_3, e_4    4                  4                 0

Table 5.11: Data for the activities of the test case.
What is the additional cost of shortening the project time for the test case
to 13?
4. The linear relationship of the crashing cost is vital for the results. After a thorough analysis a project manager has come to a situation where one of his activities does not exhibit this linear behavior. Further analysis has shown that the crashing costs can actually be modeled by a piecewise linear function, in this simple case with only one "break point". So the situation for the given activity can be illustrated as in Figure 5.11.
Figure 5.11: Crashing an activity with a piecewise linear cost function (one break point at an intermediate time between the crash time and the normal time).
You may assume that the slopes are as depicted. But can we deal with
this situation in our LP modeling approach? If yes, how. If no, why not.
Same question if you do not know anything about the slopes beforehand.
5. Let us turn back to our MasterBlaster project. Assume that the incentive scheme set forward by top management, instead of giving a fixed date and a bonus, specifies a weekly bonus: suppose that we will get 9500 Euros for each week we reduce the project length. Describe how this can be solved in our LP model and what the solution for the MasterBlaster project will be.

Construct the project crashing curve that presents the relationship between project length and the cost of crashing, until the project can no longer be crashed.
Bibliography
[1] Hillier and Lieberman. Introduction to Operations Research. McGraw-Hill, 2001.

[2] A. B. Badiru and P. S. Pulat. Comprehensive Project Management: Integrating Optimization Models, Management Principles, and Computers. Prentice-Hall, 1995.
Chapter 6
The Max Flow Problem
Let us assume that we want to pump as much water as possible from point A to point B. To get the water from A to B we have a system of water pipes with connections and pipelines. For each pipeline we furthermore have a capacity identifying the maximum throughput of the pipeline. To find how much can be pumped in total, and through which pipes, we need to solve the maximum flow problem.
In the maximum flow problem (for short, the max flow problem) a directed graph G = (V, E) is given. Each edge e has a capacity u_e ∈ R_+, therefore we define the graph as G = (V, E, u). Furthermore two "special" vertices r and s are given; these are called the source and the sink, respectively. The objective is now to determine how much flow we can get from the source to the sink. We may imagine the graph as a road network where we want to know how many cars we can get through this particular road network. In an alternative application, as described above, the network is a network of water pipes and we want to determine the maximal throughput of the system.
In the max flow problem we determine 1) the maximal throughput and 2) how to achieve it through the system. In order to specify how we want to direct the flow in a solution, we first need to formally define what a flow actually is:
Definition 6.1 A flow x in G is a function x : E → R_+ satisfying:

∀i ∈ V \ {r, s} :  Σ_{(j,i)∈E} x_ji − Σ_{(i,j)∈E} x_ij = 0

∀(i, j) ∈ E :  0 ≤ x_ij ≤ u_ij
Given the definition of a flow we can define f_x(i) = Σ_{(j,i)∈E} x_ji − Σ_{(i,j)∈E} x_ij. So f_x(i) is, for a given flow and a given node, the difference between inflow and outflow of that node, where a surplus is counted positively. f_x(i) is called the net flow into i or the excess for x in i. Specifically, f_x(s) is called the value of the flow x.
We can now state the max flow problem as an integer programming problem. The variables in the problem are the flow variables x_ij.

max  f_x(s)
s.t. f_x(i) = 0,   i ∈ V \ {r, s}
     0 ≤ x_e ≤ u_e,   e ∈ E
     x_e ∈ Z_+,   e ∈ E
Notice that we restrict the ﬂow to being integer. The reason for this will become
apparent as the solution methods for the max ﬂow problem are presented.
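As a small sketch of Definition 6.1 (our own example network, not from the text), we can check a candidate flow for feasibility and read off its value f_x(s):

```python
# A tiny network: capacities u_e and a candidate flow x on the same edges.
cap = {("r", "a"): 3, ("r", "b"): 2, ("a", "b"): 1, ("a", "s"): 2, ("b", "s"): 3}
x   = {("r", "a"): 3, ("r", "b"): 2, ("a", "b"): 1, ("a", "s"): 2, ("b", "s"): 3}

def net_flow(node, flow):
    """f_x(i): inflow minus outflow at `node` (the excess)."""
    inflow = sum(v for (i, j), v in flow.items() if j == node)
    outflow = sum(v for (i, j), v in flow.items() if i == node)
    return inflow - outflow

def is_feasible(flow, cap, V, r, s):
    """Check the two conditions of Definition 6.1."""
    within_capacity = all(0 <= flow[e] <= cap[e] for e in cap)
    conserved = all(net_flow(i, flow) == 0 for i in V - {r, s})
    return within_capacity and conserved

V = {"r", "a", "b", "s"}
print(is_feasible(x, cap, V, "r", "s"), net_flow("s", x))  # True 5
```

Here f_x(s) = 5 equals the capacity of the cut generated by R = {r} (3 + 2 = 5), so by the corollary below this particular flow is in fact maximum.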
Related to the definition of a flow is the definition of a cut. There is a very tight relationship between cuts and flows that will also become apparent later. Let us consider R ⊂ V and write R̄ = V \ R. δ(R) is the set of edges going from a vertex in R to a vertex in R̄. δ(R) is also called the cut generated by R:

δ(R) = {(i, j) ∈ E | i ∈ R, j ∈ R̄}

Note the difference to the cut we defined for an undirected graph in Chapter 3. In a directed graph the edges in the cut are only the edges going from R to R̄ and not those in the opposite direction.

So in an undirected graph we have δ(R) = δ(R̄), but this does not hold for the definition of a cut in directed graphs. As the cut is uniquely defined by the vertex set R we may simply call R the cut. A cut where r ∈ R and s ∉ R is called an r,s-cut. Furthermore we define the capacity of the cut R as:

u(δ(R)) = Σ_{(i,j)∈E, i∈R, j∈R̄} u_ij

that is, the capacity of a cut is the sum of the capacities of all edges in the cut.
Theorem 6.2 For any r,s-cut δ(R) and any flow x we have: x(δ(R)) − x(δ(R̄)) = f_x(s).

Proof: Let δ(R) be an r,s-cut and x a flow. We know that f_x(i) = 0 for all nodes i ∈ R̄ \ {s}. Trivially we have f_x(s) = f_x(s). If we now add these equations together we get Σ_{i∈R̄\{s}} f_x(i) + f_x(s) = f_x(s), that is

Σ_{i∈R̄} f_x(i) = f_x(s).   (6.1)

Consider the contribution of the different edges to the left hand side of the equation (the four different cases are illustrated in Figure 6.1).

Figure 6.1: Illustration of the four different cases for assessing the contribution to a flow based on the nodes' relationship to R and R̄.

1. (i, j) ∈ E, i, j ∈ R: x_ij occurs in none of the equations added, so it does not occur in (6.1).

2. (i, j) ∈ E, i, j ∈ R̄: x_ij occurs in the equation for i with a coefficient of −1 and in the equation for j with a coefficient of +1, so in (6.1) it will have a coefficient of 0.

3. (i, j) ∈ E, i ∈ R, j ∈ R̄: x_ij occurs in (6.1) with a coefficient of +1, and therefore the sum has a contribution of +x_ij from this edge.

4. (i, j) ∈ E, i ∈ R̄, j ∈ R: x_ij occurs in (6.1) with a coefficient of −1, and therefore the sum has a contribution of −x_ij from this edge.

So the left hand side of (6.1) is equal to x(δ(R)) − x(δ(R̄)). △
Corollary 6.3 For any feasible flow x and any r,s-cut δ(R), we have: f_x(s) ≤ u(δ(R)).

Proof: Given a feasible flow x and an r,s-cut δ(R). From Theorem 6.2 we have

x(δ(R)) − x(δ(R̄)) = f_x(s)

Since all values are non-negative this gives us

f_x(s) ≤ x(δ(R)).

Furthermore the flow cannot be larger than the capacity, that is, x(δ(R)) ≤ u(δ(R)). So we have f_x(s) ≤ x(δ(R)) ≤ u(δ(R)). △
The importance of the corollary is that it gives an upper bound on any ﬂow
value and therefore also the maximum ﬂow value. In words it states – quite
intuitively – that the capacity of the minimum cut is an upper bound on the
value of the maximum ﬂow. So if we can ﬁnd a cut and a ﬂow such that the
value of the ﬂow equals the capacity of the cut then the ﬂow is maximum, the
cut is a "minimum capacity" cut and the problem is solved. The famous Max Flow-Min Cut Theorem states that this can always be achieved.

The proof of the Max Flow-Min Cut Theorem is closely connected to the first algorithm to be presented for solving the max flow problem. So we will already now define some terminology for the solution methods.
Definition 6.4 A path is called x-incrementing (or just incrementing) if x_e < u_e for every forward arc e and x_e > 0 for every backward arc e. Furthermore, an x-incrementing path from r to s is called x-augmenting (or just augmenting).
Theorem 6.5 Max Flow-Min Cut Theorem. Given a network G = (V, E, u) and a current feasible flow x. The following 3 statements are equivalent:

1. x is a maximum flow in G.
2. A flow augmenting path does not exist (an r,s-dipath in G_x).
3. An r,s-cut δ(R) exists with capacity equal to the value of x, i.e. u(δ(R)) = f_x(s).
Proof:

1) ⇒ 2) Given a maximum flow x we want to prove that there does not exist a flow augmenting path. Assume there does exist a flow augmenting path, and seek a contradiction. If there exists a flow augmenting path we can increase the flow along the augmenting path by at least 1. This means that the new flow is at least one larger than the existing one. But this contradicts the assumption that the flow was maximum. So no flow augmenting path can exist when the flow is maximum.

2) ⇒ 3) First define R as R = {i ∈ V : ∃ an x-incrementing path from r to i}. Based on this definition we derive:

1. r ∈ R
2. s ∉ R

Therefore R defines an r,s-cut. Based on the definition of R we derive that for (i, j) ∈ δ(R): x_ij = u_ij, and for (i, j) ∈ δ(R̄): x_ij = 0. As x(δ(R)) − x(δ(R̄)) = f_x(s), we insert x(δ(R)) = u(δ(R)) and x(δ(R̄)) = 0 from above, and get: u(δ(R)) = f_x(s).

3) ⇒ 1) We assume that there exists an r,s-cut δ(R) with f_x(s) = u(δ(R)). From the corollary we have f_x(s) ≤ u(δ(R)) for any feasible flow, so the cut capacity is an upper bound that is attained by the current feasible flow, which therefore must be maximum.

This proves that the three propositions are equivalent. △
6.1 The Augmenting Path Method

Given the Max Flow-Min Cut theorem, a basic idea that leads to our first solution approach for the problem is immediate: the augmenting path method. This is the classical max flow algorithm by Ford and Fulkerson from the mid-50's.
The basic idea is to start with a feasible ﬂow. As a feasible ﬂow we can always
use x = 0 but in many situations it is possible to construct a better initial ﬂow
by a simple inspection of the graph.
Based on the current ﬂow we then try to ﬁnd an augmenting path in the network,
that is, a path from r to s that can accommodate an increase in ﬂow. If that
is possible we introduce the increased ﬂow. This updates the previous ﬂow by
adding the new ﬂow that we have managed to get from r to s. Now we again
look for a new augmenting path. When we cannot ﬁnd an augmenting path the
maximum ﬂow is found (due to Theorem 6.5) and the algorithm terminates.
The critical part of this approach is to have a systematic way of checking for
augmenting paths.
An important graph when discussing the max flow problem is the residual graph. In some textbooks you might see the term “auxiliary graph” instead of residual graph. Based on the current flow, the residual graph identifies where excess flow can be sent.
Definition 6.6 Given a feasible flow x, the residual graph G_x for G wrt. x is defined by:

V_x = V(G_x) = V
E_x = E(G_x) = {(i, j) : (i, j) ∈ E ∧ x_ij < u_ij} ∪ {(j, i) : (i, j) ∈ E ∧ x_ij > 0}
The residual graph can be built from the original graph and the current ﬂow
arc by arc. This is illustrated in Figure 6.2.
If we look at the arc (r, 1) there is a flow of four and a capacity of four. Therefore we cannot push more flow from r to 1 along that arc as it is already saturated. On the other hand we can lower the flow from r to 1 along the (r, 1) arc. This is represented by having an arc in the opposite direction from 1 to r. Therefore the residual graph contains a (1, r) arc. As another example consider the (1, 3) arc. There is room for more flow from 1 to 3, so we can push excess flow from 1 in the direction of 3 using the (1, 3) arc. Therefore the arc (1, 3) is in the residual graph. But (3, 1) is also included in the residual graph as excess flow from node 3 can be pushed to 1 by reducing flow along the (1, 3) arc.
The nice property of a residual graph is that an augmenting path consists of forward arcs only. This is not necessarily the case in the original graph. If we return to Figure 6.2 the path ⟨r, 3, 2, s⟩ is in fact an augmenting path. We can push one more unit along this path and thereby increase the value of the flow by one. But the path uses the (2, 3) arc in the “reverse” direction as we decrease
Figure 6.2: The residual graph reflects the possibilities to push excess flow from a node. (a) The graph G and a flow x; (b) the residual graph G_x.
the inﬂow from node 3 and instead route it to the sink. In the residual graph
the path is simply made up by forward arcs.
Now we can describe the augmenting path algorithm where we use the residual
graph in each iteration to check if there exists an augmenting path or not. The
algorithm is presented in algorithm 9.
Algorithm 9: The Augmenting Path algorithm
Data: A directed graph G = (V, E, u), a source r and sink s
Result: A flow x of maximum value
1: x_ij ← 0 for all (i, j) ∈ E
2: repeat
3:    construct G_x
4:    find an augmenting path in G_x
5:    if s is reached then
6:       determine max amount x′ of flow augmentation
7:       augment x along the augmenting path by x′
8: until s is not reachable from r in G_x

If s is reached then the augmenting path is given by the predecessor labels p. Otherwise no augmenting path exists.
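The combination of Algorithms 9 and 10 can be sketched in Python. This is a minimal illustration (not from the text): the function name `max_flow` and the representation of the network as a dictionary `arcs` mapping (i, j) to u_ij are assumptions made here for concreteness. Residual capacities are computed on the fly instead of building G_x explicitly.

```python
from collections import deque

def max_flow(n, arcs, r, s):
    """Augmenting path method with breadth-first search (Edmonds-Karp flavour):
    BFS in the residual graph with predecessor labels p, then augment by the
    bottleneck amount. `arcs` maps (i, j) to the capacity u_ij."""
    x = {e: 0 for e in arcs}                     # start from the zero flow
    adj = {i: set() for i in range(n)}
    for (i, j) in arcs:
        adj[i].add(j)
        adj[j].add(i)                            # residual arcs may run backwards
    def residual(i, j):                          # residual capacity of (i, j) in G_x
        return arcs.get((i, j), 0) - x.get((i, j), 0) + x.get((j, i), 0)
    value = 0
    while True:
        p, Q = {r: None}, deque([r])             # BFS for a shortest augmenting path
        while Q and s not in p:
            i = Q.popleft()
            for j in adj[i]:
                if j not in p and residual(i, j) > 0:
                    p[j] = i
                    Q.append(j)
        if s not in p:                           # no augmenting path: x is maximum
            return value, x
        path, j = [], s                          # recover the path from p
        while p[j] is not None:
            path.append((p[j], j))
            j = p[j]
        delta = min(residual(i, j) for (i, j) in path)
        for (i, j) in path:                      # augment: cancel reverse flow first
            back = min(delta, x.get((j, i), 0))
            if back:
                x[(j, i)] -= back
            if delta > back:
                x[(i, j)] += delta - back
        value += delta
```

Because the search is breadth-first, each augmenting path has the minimum possible number of arcs, which is what guarantees the polynomial running time discussed below.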
Consider the following example. In order not to start with x = 0 as the initial flow we try to establish a better starting point. We start by pushing 4 units of flow along ⟨r, 1, 2, s⟩ and one unit of flow along ⟨r, 4, s⟩. Furthermore we push two units of flow along the path ⟨r, 1, 2, 3, 4, s⟩ and arrive at the situation shown in Figure 6.3 with the corresponding residual graph. Based on the residual graph we can identify an augmenting path ⟨r, 1, s⟩. The capacities on the path are 2 and 1. Hence, we can at most push one unit of flow through the path. This is done in the following figure and its corresponding residual graph is also shown.
Now it becomes more tricky to identify an augmenting path. From r we can proceed to 1 but from this node there is no outgoing arc. Instead we look at the arc (r, 3). From 3 we can continue to 2, then to 4, and then get to s. Having identified the augmenting path ⟨r, 3, 2, 4, s⟩ we update the flow with 2 along the path and get to the situation shown in Figure 6.3 (c).

In the residual graph it is not possible to construct incrementing paths beyond nodes 1 and 3. Consequently, there is no augmenting path in the graph and we have found a maximum flow. The value of the flow is 11 and the minimum cut is identified by R = {r, 1, 3}. Note that forward edges are
saturated, that is, ﬂow equal to capacity, and backward edges are empty in the
cut.
In the algorithm we need to determine the maximum amount by which we can increase the flow along the augmenting path we have found. When increasing the flow we need to be aware that on arcs that are forward arcs in the original graph we cannot exceed the capacity. For arcs that are backward arcs in the original graph the flow will be reduced along the augmenting path, and we cannot get a negative flow. As a result we get the updated (augmented) flow x′:

• for (i, j) ∈ P with (i, j) ∈ E: x′_ij ← x_ij + u^max_s
• for (i, j) ∈ P with (j, i) ∈ E: x′_ji ← x_ji − u^max_s
• for all other (i, j) ∈ E: x′_ij ← x_ij

where u^max_s is the maximum number of units that can be pushed all the way from source to sink.
The next issue is how the search in the residual graph for an augmenting path is made. Several strategies can be used, but in order to ensure a polynomial running time we use a breadth-first approach as shown in algorithm 10. To realize the importance of the search strategy look at Figure 6.4.

A first augmenting path could be ⟨r, 1, 2, s⟩. We can push 1 unit of flow through this path. In the next iteration we get the augmenting path ⟨r, 2, 1, s⟩, again increasing the flow by one unit. Two things should be noted: 1) the running time of the algorithm depends on the capacities on the arcs (here we get a running time of 2M times the time it takes to update the flow) and 2) if M is large it will take a long time to reach the maximum flow although it is possible to find the same solution in only two iterations.
Figure 6.3: An example of the iterations in the augmenting path algorithm for the maximum flow problem. (a) The graph G with an initial flow and the corresponding residual graph; (b) updating based on the augmenting path ⟨r, 1, s⟩; (c) updating based on the augmenting path ⟨r, 3, 2, 4, s⟩.
Figure 6.4: Example of what can go wrong if we select the ”wrong” augmenting paths in the augmenting path algorithm.
Algorithm 10: Finding an augmenting path if it exists
Data: The network G = (V, E, u), a source vertex r, sink vertex s, and the current feasible flow x
Result: An augmenting path from r to s and the capacity for the flow augmentation, if it exists
1: all vertices are set to unlabeled
2: Q ← {r}, S ← ∅
3: p_i ← 0 for all i
4: u^max_i ← +∞ for all i ∈ V
5: while Q ≠ ∅ and s is not labeled yet do
6:    select an i ∈ Q
7:    scan each unlabeled j with (i, j) ∈ E_x:
8:       label j
9:       Q ← Q ∪ {j}
10:      p_j ← i
11:      if (i, j) ∈ E_x is forward in G then
12:         u^max_j ← min{u^max_i, u_ij − x_ij}
13:      if (i, j) ∈ E_x is backward in G then
14:         u^max_j ← min{u^max_i, x_ji}
15:   Q ← Q \ {i}
16:   S ← S ∪ {i}
We call an augmenting path from r to s shortest if it has the minimum possible number of arcs. The augmenting path algorithm with a breadth-first search solves the maximum flow problem in O(nm²). Breadth-first search can be implemented using the queue data structure for the set Q.
6.2 The Preflow-Push Method

An alternative to the augmenting path method is the preflow-push algorithm for the Max Flow problem. This method is sometimes also called push-relabel.

In the augmenting path method a flow was maintained in each iteration of the algorithm. In the preflow-push method the balance constraint – inflow for a vertex equals outflow for a vertex – is relaxed by introducing a preflow.
Definition 6.7 A preflow for a directed graph G with capacities u on the edges, a source vertex r and a sink vertex s is a function x : E → ℝ₊ satisfying:

∀i ∈ V \ {r, s} : Σ_{(j,i)∈E} x_ji − Σ_{(i,j)∈E} x_ij ≥ 0
∀(i, j) ∈ E : 0 ≤ x_ij ≤ u_ij
In other words: for any vertex i except r and s, the flow excess f_x(i) is non-negative – “more runs into i than out of i”. If f_x(i) > 0 then node i is called active. A preflow with no active nodes is a flow.
An important component of the preflow-push method is the residual graph, which is defined exactly as before. So for a given graph G and preflow x the residual graph G_x shows where we can push excess flow.

The general idea in the preflow-push algorithm is to push as much flow as possible from the source towards the sink. If it is not possible to push more flow towards the sink s and there are still active nodes, we push the excess back toward the source r. This restores the balance of flow.
A push operation will move excess flow from an active node to another node along a forward arc in the residual graph. Figure 6.5 (a) illustrates the process. The nodes 2 and 3 are active nodes. In the graph we could push additional flow from node 3 to node 2. In this case node 3 would be in balance and the only remaining active node would be node 2 as illustrated in Figure 6.5 (b). If we are not careful the next push operation could potentially take one unit of flow from node 2 back to node 3 and thereby reestablish the preflow we just had (as there is a (2, 3) edge in E_x). It is therefore necessary to be able to control the direction of the pushes in order to avoid repeating pushes.
In order to control the direction of the pushes we introduce what will later turn out to be estimates (in fact lower bounds) on the distances in G_x, called a labeling.
Definition 6.8 A valid labeling of the vertices in V wrt. a preflow x is a function d : V → Z satisfying:

1. d_r = n ∧ d_s = 0
2. ∀(i, j) ∈ E_x : d_i ≤ d_j + 1
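Definition 6.8 is easy to check mechanically. The following small sketch (not from the text; the function name and the representation of E_x as a list of arc pairs are assumptions made here) verifies both conditions for a candidate labeling:

```python
def is_valid_labeling(d, E_x, r, s, n):
    """Check Definition 6.8: d_r = n, d_s = 0, and d_i <= d_j + 1
    for every arc (i, j) of the residual graph E_x."""
    if d[r] != n or d[s] != 0:
        return False
    return all(d[i] <= d[j] + 1 for (i, j) in E_x)
```

Applied to the example of Figure 6.6 discussed below (residual arcs (r, a) and (a, s) with n = 3), no choice of d_a makes the labeling valid, in line with the text.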
The idea is that as long as it is possible we would like to send the flow “towards” the sink. In order to estimate that we use the valid labeling.

Having defined a valid labeling it is natural to ask if any feasible preflow admits a valid labeling. The answer is no. Look at the simple example in Figure 6.6. First of all d_r must be 3 and d_s must be 0 according to the definition. What value can we assign to d_a in order to get a valid labeling? According to rule 2 applied to the arc (r, a) we have that d_r ≤ d_a + 1, and using rule 2 on the (a, s) arc we get d_a ≤ d_s + 1. This gives d_a ≥ 2 and d_a ≤ 1, which clearly does not leave room for a feasible value.
It is however possible to construct an initial feasible preflow that permits a valid labeling. We will call this procedure an initialization of x and d. In order to get a feasible preflow with a valid labeling we:

1. set x_e ← u_e for all outgoing arcs of r,
2. set x_e ← 0 for all remaining arcs, and
3. set d_r ← n and d_i ← 0 otherwise.
For the max flow problem in Figure 6.5 the initialization will result in the labels shown in Figure 6.7.

A valid labeling for a preflow implies an important property of the preflow: it “saturates a cut”, that is, all edges leaving the cut have flow equal to capacity. If we look at Figure 6.7 the cut defined by R = {r} is saturated.
Figure 6.5: Changing a preflow by applying a push operation. (a) Initial preflow; (b) pushing one unit from node 3 to node 2.
Figure 6.6: A preflow does not always admit a valid labeling, as this small example shows.

Figure 6.7: Initialization of the preflow and a corresponding valid labeling. The values on the nodes are the labeling.
In a way the augmenting path algorithm and the preflow-push algorithm are “dual” to each other. While the augmenting path algorithm maintains a flow and terminates when a saturated cut is found, the preflow-push algorithm maintains a saturated cut and terminates when a feasible flow has been detected.
Theorem 6.9 Given a feasible preflow x and a valid labeling d for x, there exists an r,s-cut δ(R) such that x_ij = u_ij for all (i, j) ∈ δ(R) and x_ij = 0 for all (i, j) ∈ δ(R̄).

Proof: Since there are n nodes, there exists a value k, 0 < k < n, such that d_i ≠ k for all i ∈ V. Define R = {i ∈ V : d_i > k}. Clearly r ∈ R (d_r = n) and s ∉ R (d_s = 0). R therefore defines an r,s-cut δ(R).

Let us look at an edge (i, j) in the residual graph that crosses the cut, that is, i ∈ R and j ∉ R. As the labeling is valid we must have d_i ≤ d_j + 1. But as i ∈ R, d_i ≥ k + 1, and as j ∉ R, d_j ≤ k − 1, so d_i ≥ d_j + 2. No such edge can fulfil the valid labeling condition, and consequently no edge in E_x is leaving R. △
Let us return to the relationship between the labeling and the distance from a node to the sink. Let d_x(i, j) denote the number of arcs in a shortest dipath from i to j in G_x. With this definition we can prove that for any feasible preflow x and any valid labeling d for x we have

d_x(i, j) ≥ d_i − d_j

Basically, for every edge (k, l) on a shortest path P from i to j we have d_k ≤ d_l + 1. Adding these inequalities together gives the stated result.
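The “adding together” step is a telescoping argument. Written out for a shortest dipath P = (i = v_0, v_1, …, v_k = j) in G_x with k = d_x(i, j) arcs, applying rule 2 of the valid labeling along the path gives:

```latex
d_i = d_{v_0} \le d_{v_1} + 1 \le d_{v_2} + 2 \le \cdots \le d_{v_k} + k = d_j + d_x(i,j),
```

and rearranging yields d_x(i, j) ≥ d_i − d_j.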
As a consequence of this result we have:

Corollary 6.10 Estimates on distances. As special cases we have:

• d_i is a lower bound on the distance from i to s.
• d_i − n is a lower bound on the distance from i to r.
• If d_i ≥ n then there is no path from i to s.
• d_i < 2n for all nodes.
We try to push flow towards nodes j having d_j < d_i, since such nodes are estimated to be closer to the ultimate destination. Having d_j < d_i and a valid labeling means that push is only applied to arcs (i, j) where i is active and d_i = d_j + 1. Such an edge (i, j) is called an admissible edge.
Algorithm 11: The Push operation
1: consider an admissible edge (i, j)
2: calculate the amount of flow which can be pushed to j: min{f_x(i), x_ji + (u_ij − x_ij)}
3: push this amount by first reducing x_ji as much as possible and then increasing x_ij as much as possible until the relevant amount has been pushed
If an admissible edge is available the preflow-push algorithm can push flow on that edge. If we return to our max flow problem in Figure 6.7 it is obvious that no admissible edges exist. So what should we do if there are no more admissible edges and a node is still active?
The answer is to relabel. A relabel takes an active node and assigns it a new label that results in generating at least one admissible edge. This is done by increasing the value of the label to the minimum of the labels at the heads of the outgoing residual edges of the node, plus one.
Algorithm 12: The Relabel operation
1: consider an active vertex i for which no edge (i, j) ∈ E_x with d_i = d_j + 1 exists
2: d_i ← min{d_j + 1 : (i, j) ∈ E_x}
Notice that this will not violate the validity of the labeling (check for yourself!).
Now we can put the elements together and get the preﬂowpush algorithm.
Algorithm 13: The Preflow-Push algorithm
1: initialize x, d
2: while x is not a flow do
3:    select an active node i
4:    if no admissible arc (i, j) out of i exists then
5:       relabel i
6:    while there exists an admissible arc (i, j) do
7:       push on (i, j)
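Algorithms 11-13 can be combined into a short Python sketch. This is an illustration only (the function name `preflow_push` and the capacity dictionary `cap` are assumptions made here); it selects active nodes in an arbitrary order, which is allowed by the algorithm but not the fastest known variant.

```python
def preflow_push(n, cap, r, s):
    """Preflow-push sketch: initialize by saturating the arcs out of r,
    then repeatedly push on admissible residual arcs and relabel active
    nodes without admissible arcs. Returns the maximum flow value."""
    x = {e: 0 for e in cap}
    d = {i: 0 for i in range(n)}
    d[r] = n                                     # initialization: d_r = n, d_i = 0
    excess = {i: 0 for i in range(n)}
    for (i, j), u in cap.items():                # saturate all arcs leaving r
        if i == r:
            x[(i, j)] = u
            excess[j] += u
            excess[r] -= u
    def residual(i, j):
        return cap.get((i, j), 0) - x.get((i, j), 0) + x.get((j, i), 0)
    def push(i, j, amount):
        back = min(amount, x.get((j, i), 0))     # Algorithm 11: cancel x_ji first
        if back:
            x[(j, i)] -= back
        if amount > back:
            x[(i, j)] += amount - back
        excess[i] -= amount
        excess[j] += amount
    active = [i for i in range(n) if i not in (r, s) and excess[i] > 0]
    while active:
        i = active.pop()
        while excess[i] > 0:
            pushed = False
            for j in range(n):
                if residual(i, j) > 0 and d[i] == d[j] + 1:   # admissible edge
                    push(i, j, min(excess[i], residual(i, j)))
                    pushed = True
                    if j not in (r, s) and excess[j] > 0 and j not in active:
                        active.append(j)
                    if excess[i] == 0:
                        break
            if not pushed and excess[i] > 0:     # Algorithm 12: relabel
                d[i] = 1 + min(d[j] for j in range(n) if residual(i, j) > 0)
    return excess[s]
```

When the loop ends no node is active, so the preflow has become a flow, and by Theorem 6.9 it saturates a cut and is therefore maximum.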
If we return to our example from before we may choose any active node and perform a relabel. Initially the nodes 1 and 3 are active, and we arbitrarily choose node 1. As no admissible edges exist we relabel node 1 to 1, and now the single unit of surplus in node 1 is pushed along (1, 2) (we could also have chosen edge (1, 4)). Now nodes 2 and 3 are active. We choose node 3 and need to relabel. Now we can push the surplus of 7 along the edges (3, 2) (3 units), (3, s) (1 unit) and (3, 4) (3 units). Now node 3 is no longer active but 4 is. First we need to relabel node 4 to get a label of value 1, and then push 2 units to s. As node 2 remains active we choose it again and subsequently need to relabel it to 2, and now we can push one unit “back” to node 3. In the next iteration nodes 2 and 3 are active and will be chosen by the preflow-push algorithm for further operations.
Analysis of the running time of the preflow-push algorithm is based on an analysis of the number of saturating pushes and non-saturating pushes. It can be shown that the push-relabel maximum flow algorithm can be implemented to run in time O(n²m) or O(n³).
Figure 6.8: Initial iterations of the preflow-push algorithm on our example graph. (a) Relabel and push from 1; (b) relabel and push from 3; (c) relabel and push from 4; (d) relabel and push from 4 (again).
6.3 Applications of the max ﬂow problem
6.3.1 Maximum Bipartite Matching
6.4 Supplementary Notes
6.5 Exercises
1. Consider the max flow problem defined by the graph G = (V, E, u) in Figure 6.9 with node 1 as the source and node 8 as the sink. Solve the problem using the augmenting path algorithm and the preflow-push algorithm. Also find a minimum cut. In order to verify your results enter the integer programming model into your favourite MIP solver.
2. Let G be a directed graph with two special vertices s and t. Two directed paths from s to t are called vertex-disjoint if they do not share any vertices other than s and t. Prove that the maximum number of directed vertex-disjoint paths from s to t is equal to the minimum number of vertices whose removal ensures that there are no directed paths from s to t.

Figure 6.9: Example max flow problem
3. Let P range over the set of s–t paths for two vertices s, t of a given graph. Let C range over cuts that separate s and t. Then show that

max_P min_{e∈P} c_e = min_C max_{e∈C} c_e

where c_e is the capacity of edge e.
Bibliography
[1] William J. Cook, William H. Cunningham, William R. Pulleyblank and Alexander Schrijver, Combinatorial Optimization, Wiley-Interscience, 1998.
Chapter 7

The Minimum Cost Flow Problem
Recall the definition of f_x(i) from the maximum flow problem:

f_x(i) = Σ_{(j,i)∈E} x_ji − Σ_{(i,j)∈E} x_ij
In the minimum cost flow problem a feasible flow x of minimum cost has to be determined. The feasibility conditions remain unchanged from the maximum flow problem. The minimum cost flow problem can be defined as:

min Σ_{e∈E} w_e x_e                        (7.1)
s.t. f_x(i) = b_i          i ∈ V
     0 ≤ x_ij ≤ u_ij       (i, j) ∈ E
In the minimum cost flow problem we consider a directed graph G = (V, E) where each edge e has a capacity u_e ∈ ℝ₊ ∪ {∞} and a unit transportation cost w_e ∈ ℝ. Each vertex i has a demand b_i ∈ ℝ. If b_i ≥ 0 then i is called a sink, and if b_i < 0 then i is called a source. We assume that b_V = Σ_{i∈V} b_i = 0, that is, inflow in the graph is equal to outflow. This network can be defined as G = (V, E, w, u, b).
An example of a minimum cost flow problem is shown in Figure 7.1. Vertices 1 and 2 are sources and 3, 4 and 6 are sinks. The objective is to direct the flow from 1 and 2 in the most cost-efficient way to 3, 4 and 6.
Figure 7.1: Example of a minimum cost flow problem. Labels on the edges of the graph shown in Figure (a) are w_e, u_e. Figure (b) shows a feasible flow (bold edges); labels on the edges represent w_e, u_e, x_e. The cost of the flow is 75.
The minimum cost flow problem is a very versatile problem. Among others, the following flow problems can be described as special cases:
1. The Transportation Problem
2. The Shortest Path Problem
3. The Max Flow Problem
A class of minimum cost flow problems arises from transportation problems. In a transportation problem G = (V, E) is a bipartite graph with partitions {I, J}, and we are given positive numbers a_i, i ∈ I and b_j, j ∈ J, as well as costs w_ij, (i, j) ∈ E. The set I defines a set of factories and J a set of warehouses. Given production quantities (a_i), demands (b_j), and unit transportation costs from i to j (w_ij), the objective is to find the cheapest transportation of products to the warehouses. The transportation problem can be stated as:
min Σ_{e∈E} w_e x_e                              (7.2)
s.t. Σ_{i∈I,(i,j)∈E} x_ij = b_j      j ∈ J
     Σ_{j∈J,(i,j)∈E} x_ij = a_i      i ∈ I
     x_e ≥ 0                         e ∈ E
Adding constraints of the form x_e ≤ u_e, where u_e is finite, to (7.2), the problem is called a capacitated transportation problem. In either case, multiplying each of the second group of equations by −1, and setting b_i = −a_i for i ∈ I, we get a problem of the same form as the minimum cost flow problem.
The shortest path problem can also be considered as a special case of the minimum cost flow problem. Define the root as a source and the remaining n − 1 other vertices as sinks. Each of the sinks will have b_i = 1, while the root will have b_r = −(n − 1). This represents that n − 1 paths are “flowing” out of the root and exactly one is ending in each of the other vertices. The unit transportation cost will be equal to the cost on the edges and we have stated the shortest path problem as a minimum cost flow problem.
The maximum flow problem can be modelled as a minimum cost flow problem by the following trick: Given a directed graph with capacities on the edges, a source and a sink, we define b_i = 0 for all vertices, and set the unit transportation cost to 0 for each of the edges. Furthermore we introduce a new edge from the sink back to the source with unlimited capacity (at least as large as Σ_{e∈E} u_e) and a unit cost of −1. Now in order to minimize cost the model will try to maximize the flow from sink to source, which at the same time will maximize the flow from the source through the graph to the sink, as these flows must be in balance. This is illustrated in Figure 7.2.
As several of the important flow optimization problems can be stated as special cases of the minimum cost flow problem, it is interesting to come up with an effective solution approach for the problem. Let us take a look at the constraint matrix of the minimum cost flow problem.
Figure 7.2: Figure (a) shows a maximum flow problem with labels representing u_e. In Figure (b) the same maximum flow problem has been reformulated as a minimum cost flow problem. Labels on the edges are w_e, u_e. Note that M should be chosen to be at least 20.
The constraint matrix has one column per flow variable x_e, with the corresponding cost coefficient w_e above it. The first n rows are the flow balance equations: the column of x_ij has a −1 in the row of vertex i (outflow) and a +1 in the row of vertex j (inflow), and the right hand side of the row for vertex i is b_i. Below these, each capacity constraint x_e ≤ u_e is written as −x_e ≥ −u_e, contributing a single −1 in the row of edge e with right hand side −u_e.
Proposition 3.2 from [2] gives a sufficient condition for a matrix being totally unimodular. It states that a matrix A is totally unimodular if:

1. a_ij ∈ {+1, −1, 0} for all i, j.
2. Each column contains at most two nonzero coefficients (Σ_{i=1}^m |a_ij| ≤ 2).
3. There exists a partition {M_1, M_2} of the set M of rows such that each column j containing two nonzeros satisfies Σ_{i∈M_1} a_ij − Σ_{i∈M_2} a_ij = 0.
Consider the matrix without the x_e ≤ u_e constraints. If we let M_1 contain all the remaining constraints and M_2 = ∅, the conditions of Proposition 3.2 are fulfilled. The matrix is therefore totally unimodular. It is left to the reader to prove that the entire matrix with the x_e ≤ u_e constraints is also totally unimodular.
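Total unimodularity can also be verified by brute force on small instances, straight from the definition: every square submatrix must have determinant in {−1, 0, 1}. The following sketch (not from the text; exponential in the matrix size, so only usable for tiny examples) checks this for the incidence part of the constraint matrix:

```python
from itertools import combinations
from fractions import Fraction

def det(M):
    """Exact determinant of a small integer matrix via Gaussian
    elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, sign = len(M), 1
    for c in range(n):
        piv = next((k for k in range(c, n) if M[k][c]), None)
        if piv is None:
            return 0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            sign = -sign
        for k in range(c + 1, n):
            f = M[k][c] / M[c][c]
            for j in range(c, n):
                M[k][j] -= f * M[c][j]
    d = Fraction(sign)
    for c in range(n):
        d *= M[c][c]
    return int(d)

def is_totally_unimodular(A):
    """Definition check: every square submatrix has determinant -1, 0 or 1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if abs(det([[A[i][j] for j in cols] for i in rows])) > 1:
                    return False
    return True
```

For the node-edge incidence matrix of a small digraph (columns with one −1 and one +1) the check succeeds, while e.g. the 3×3 circulant 0/1 matrix with determinant 2 fails it.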
A consequence is that the minimum cost flow problem can be solved by simply removing the integrality constraints from the model and solving it using an LP solver like e.g. CPLEX.

In order to come up with a more efficient approach we take a look at optimality conditions for the minimum cost flow problem.
7.1 Optimality conditions
Having stated the primal LP of the minimum cost flow problem in (7.1) it is a straightforward task to formulate the dual LP. The dual LP has two sets of variables: the dual variables corresponding to the flow balance equations are denoted y_i, i ∈ V, and those corresponding to the capacity constraints are denoted z_ij, (i, j) ∈ E. The dual problem is:

max Σ_{i∈V} b_i y_i − Σ_{(i,j)∈E} u_ij z_ij          (7.3)
s.t. −y_i + y_j − z_ij ≤ w_ij        (i, j) ∈ E
     z_ij ≥ 0                        (i, j) ∈ E

where the first set of constraints can equivalently be written −w_ij − y_i + y_j ≤ z_ij, (i, j) ∈ E.
Define the reduced cost w̄_ij of an edge (i, j) as w̄_ij = w_ij + y_i − y_j. Hence, the constraint −w_ij − y_i + y_j ≤ z_ij is equivalent to

−w̄_ij ≤ z_ij
When is a set of feasible solutions x, y and z optimal? Let us derive the optimality conditions based on the primal and dual LP. Consider the two different cases:

1. If u_e = +∞ (i.e. no capacity constraint for the edge) then z_e must be 0 and hence just w̄_e ≥ 0 has to hold. The constraint then corresponds to the primal optimality condition for the LP.
2. If u_e < ∞ then z_e ≥ 0 and z_e ≥ −w̄_e must hold. As z has negative coefficients in the objective function of the dual problem, the best choice for z is as small as possible, that is, z_e = max{0, −w̄_e}. Therefore, the optimal value of z_e is uniquely determined from the other variables, and hence z_e is “unnecessary” in the dual problem.
Complementary slackness conditions are used to obtain optimality conditions. Generally, complementary slackness states that in an optimum 1) each primal variable times the corresponding dual slack must equal 0, and 2) each dual variable times the corresponding primal slack must equal 0. In our case this results in:

x_e > 0 ⇒ −w̄_e = z_e = max(0, −w̄_e)

i.e. x_e > 0 ⇒ −w̄_e ≥ 0, that is, w̄_e > 0 ⇒ x_e = 0, and

z_e > 0 ⇒ x_e = u_e

i.e. −w̄_e > 0 ⇒ x_e = u_e, that is, w̄_e < 0 ⇒ x_e = u_e.
Summing up: a primal feasible flow satisfying the demands and respecting the capacity constraints is optimal if and only if there exists a dual solution y_i, i ∈ V such that for all e ∈ E it holds that:

w̄_e < 0 implies x_e = u_e (which requires u_e ≠ ∞)
w̄_e > 0 implies x_e = 0

All pairs (x, y) of optimal solutions satisfy these conditions. The conditions can be used to define a solution approach for the minimum cost flow problem.
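The two implications above are easy to test for a given pair (x, y), as this small sketch shows (the function name `is_optimal` and the dictionary arguments are assumptions made here for illustration):

```python
def is_optimal(E, w, u, x, y):
    """Check the reduced cost optimality conditions: with
    w_bar_ij = w_ij + y_i - y_j, require that w_bar < 0 forces
    x_e = u_e and w_bar > 0 forces x_e = 0 on every edge."""
    for (i, j) in E:
        w_bar = w[(i, j)] + y[i] - y[j]
        if w_bar < 0 and x[(i, j)] != u[(i, j)]:
            return False
        if w_bar > 0 and x[(i, j)] != 0:
            return False
    return True
```

For example, routing one unit from 0 to 2 over the cheap path 0-1-2 (unit costs 1 and 1) rather than the direct edge (0, 2) of cost 3 is certified optimal by the potentials y = (0, 1, 2), which make the reduced costs 0, 0 and 1.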
To present the approach we first define a residual graph. For a feasible flow x in G, the residual graph is (like for the maximum flow problem) a graph in which the edges indicate how flow excess can be moved in G given that the flow x already is present. The only difference to the residual graph of the maximum flow problem is that each edge additionally has a cost assigned. The residual graph G_x for G wrt. x is defined by V(G_x) = V and E(G_x) = E_x = {(i, j) : (i, j) ∈ E ∧ x_ij < u_ij} ∪ {(i, j) : (j, i) ∈ E ∧ x_ji > 0}.
Figure 7.3: G = (V, E, c, u, x) and its residual graph (G_x, c′).
The unit transportation cost w′_ij of an edge with x_ij < u_ij is w_ij, while w′_ij of an edge with x_ji > 0 is −w_ji. The reason for this definition can be explained by considering an x-incrementing path P. P contains both forward and backward edges. Suppose we now send one unit of flow from one end of the path to the other. Then x_e will be raised by one on forward edges and lowered by one on backward edges. Subsequently the cost must reflect the change; hence we have w′_ij = w_ij for forward edges and w′_ij = −w_ji for backward edges.
Note that a dicircuit with negative cost in G_x corresponds to a negative cost circuit in G, if costs are added for forward edges and subtracted for backward edges; see Figure 7.3.
Note that if a set of potentials y_i, i ∈ V is given, and the cost of a circuit is calculated wrt. the reduced costs of the edges (w̄_ij = w_ij + y_i − y_j), the cost remains the same as with the original costs, as the potentials are “telescoped” to 0.
A primal feasible flow satisfying the demands and respecting the capacity constraints is optimal if and only if an x-augmenting circuit with negative w-cost (or negative w̄-cost; there is no difference) does not exist. This is the idea behind the identification of optimal solutions in the network simplex algorithm that will be presented later.
An initial and simple idea is to find a feasible flow and then in each iteration test for the existence of a negative circuit by trying to find a negative cost circuit in the residual graph. This test can be done using the Bellman-Ford algorithm, which has a running time of O(mn). The resulting algorithm is called the augmenting circuit algorithm.
Algorithm 14: The Augmenting Circuit Algorithm
1: Find a feasible flow x
2: while there exists an augmenting circuit do
3:    Find an augmenting circuit C
4:    if C has no reverse edge, and no forward edge of finite capacity then
5:       STOP
6:    Augment x on C
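For bounded instances, Algorithm 14 can be sketched in Python using Bellman-Ford for the negative circuit test. This is an illustration only (function name and data layout are assumptions made here); for simplicity it assumes all capacities are finite and that the graph has no pair of oppositely directed edges, so the STOP case of line 4 and parallel residual arcs do not arise.

```python
def cycle_canceling(n, cap, cost, x):
    """Augmenting circuit algorithm sketch: starting from a feasible flow x,
    repeatedly find a negative cost circuit in the residual graph G_x with
    Bellman-Ford and cancel it. Modifies and returns x."""
    def residual_edges():
        E = []
        for (i, j), u in cap.items():
            if x[(i, j)] < u:
                E.append((i, j, cost[(i, j)]))        # forward residual arc
            if x[(i, j)] > 0:
                E.append((j, i, -cost[(i, j)]))       # backward residual arc
        return E
    while True:
        E = residual_edges()
        dist = [0] * n          # 0-initialization detects any negative circuit
        pred = [None] * n
        for _ in range(n):
            last = None
            for (i, j, c) in E:
                if dist[i] + c < dist[j]:
                    dist[j] = dist[i] + c
                    pred[j] = i
                    last = j
        if last is None:        # no relaxation in round n: no negative circuit
            return x
        for _ in range(n):      # walk n predecessor steps to land on the circuit
            last = pred[last]
        circuit, v = [], last
        while True:
            circuit.append((pred[v], v))
            v = pred[v]
            if v == last:
                break
        def rescap(i, j):       # residual capacity of a circuit arc
            return (cap[(i, j)] - x[(i, j)]) if (i, j) in cap else x[(j, i)]
        delta = min(rescap(i, j) for (i, j) in circuit)
        for (i, j) in circuit:  # augment x on C by the bottleneck amount
            if (i, j) in cap:
                x[(i, j)] += delta
            else:
                x[(j, i)] -= delta
```

Each iteration costs O(nm) for the Bellman-Ford test, which is why keeping the number of iterations low matters, as noted below.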
When performing a complete O(nm) computation in every iteration it is paramount to keep the number of iterations low. Several implementations of the augmenting circuit algorithm exist. Most often these algorithms do not perform as well as the most effective methods. We will therefore turn our attention to one of the methods that is currently most used, namely the Network Simplex Method.
7.2 Network Simplex for the Transshipment Problem

In order to get a better understanding of the network simplex approach we initially look at the minimum cost flow problem without capacities. This problem is also called the transshipment problem. Given a directed graph G = (V, E) with demands b_i for each vertex i ∈ V and unit transportation costs w_e for each edge e ∈ E, find the minimum cost flow that satisfies the demand constraints.

Assume G is connected. Otherwise the problem simply reduces to a transshipment problem for each of the components of G. A tree in a directed graph is a set T ⊆ E such that T is a tree in the underlying undirected graph. A tree solution for the transshipment problem given by G = (V, E), demands b_i, i ∈ V and costs w_e, e ∈ E, is a flow x ∈ ℝ^E satisfying:

∀i ∈ V : f_x(i) = b_i
∀e ∉ T : x_e = 0

So there is no flow on edges not in T, and all demands are satisfied. Note that
Figure 7.4: Let vertex 1 be the root vertex. The edge h shown defines the two sets R^T_h and P^T_h.
tree solutions are normally not feasible, since edges with negative ﬂow may exist
(cf. basic solutions and feasible basic solutions in linear programming).
Initially we may ask whether there exists a tree solution for any tree T. The answer is yes, as can be seen by the following constructive argument. Select a vertex r and call it the root; it can be any vertex. Consider an edge h in T. The edge h partitions V into two sets: one containing r, denoted R^T_h, and the remainder P^T_h = V \ R^T_h (see Figure 7.4). If h starts in R^T_h and (consequently) ends in P^T_h then

x_h = Σ_{i∈P^T_h} b_i = −Σ_{i∈R^T_h} b_i.   (7.4)

Alternatively, if h starts in P^T_h and ends in R^T_h then

x_h = Σ_{i∈R^T_h} b_i.   (7.5)
So a tree T uniquely defines a tree solution. In the example in Figure 7.4, x_h = 8. It is essential to note that any feasible solution can be transformed into a tree solution. Consequently this will allow us to only look at tree solutions.
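The constructive argument behind (7.4) and (7.5) can be turned into a small routine. The sketch below is an illustration (a hypothetical helper, not code from the text): it computes the unique tree solution by repeatedly stripping leaves — a leaf's single incident edge must carry exactly the leaf's residual demand, which is then pushed onto the neighbour.

```python
from collections import defaultdict

def tree_solution(tree_edges, b):
    """Unique tree solution for a spanning tree T: a flow x with
    f_x(i) = (inflow minus outflow at i) = b[i], and x_e = 0 off T.
    tree_edges: directed edges (u, v); b: vertex demands summing to zero.
    Flows may come out negative, i.e. the tree solution need not be feasible."""
    b = dict(b)                      # residual demands, modified below
    adj = defaultdict(list)
    for e in tree_edges:
        adj[e[0]].append(e)
        adj[e[1]].append(e)
    x = {}
    leaves = [i for i in b if len(adj[i]) == 1]
    while leaves:
        i = leaves.pop()
        live = [e for e in adj[i] if e not in x]
        if not live:                 # only the last remaining vertex ends up here
            continue
        (u, v) = e = live[0]
        # Flow conservation at the leaf fixes the flow on its only edge.
        x[e] = b[i] if v == i else -b[i]
        other = u if v == i else v
        b[other] += b[i]             # push the leaf's demand to its neighbour
        if len([f for f in adj[other] if f not in x]) == 1:
            leaves.append(other)
    return x
```

In both orientation cases the neighbour's residual demand changes by b[i], which is why a single update line suffices.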
Theorem 7.1 If the transshipment problem deﬁned by G = (V, E, w, b) has a
feasible solution, then it also has a feasible tree solution.
Proof: Let x be a feasible solution in G. If x is not a tree solution then there must exist a circuit C in which all edges have positive flow. We can assume that C has at least one backward edge. Let α = min{x_e : e ∈ C, e is a backward edge}. Now replace x_e by x_e + α if e is a forward edge in C, and by x_e − α if e is a backward edge in C. The new x is feasible and, moreover, has one circuit less than the initial flow. If the flow is a tree solution we are done; if not, we continue the procedure. Note that each time we make an "update" to the flow, at least one edge will get flow equal to 0, and we never put flow onto an edge with a flow of 0. Therefore we must terminate after a finite number of iterations. △
If the flow x is an optimal solution and there exists a circuit C with positive flow, C must have zero cost. This means that the approach used in the proof above can be used to generate a new optimal solution in which fewer edges have positive flow. So it can be proved that:
Theorem 7.2 If the transshipment problem deﬁned by G = (V, E, w, b) has an
optimal solution then it also has an optimal tree solution.
The results of these two theorems are quite powerful. From now on we can restrict ourselves to looking only at tree solutions. Consequently, the network simplex method moves from tree solution to tree solution using negative cost circuits C^T_e consisting of tree edges and exactly one non-tree edge e (think of the tree edges as basic variables and the non-tree edge as the non-basic variable of a simplex iteration). Remember that the non-tree edge e uniquely defines a circuit with T. Each edge e defines a unique circuit with the characteristics:

• C^T_e ⊆ T ∪ {e}
• e is a forward edge in C^T_e

This is illustrated in Figure 7.5.
Consider the vector of potentials y ∈ ℝ^V. Set the vertex potential y_i, i ∈ V, to the cost of the (simple) path in T from r to i (counted with sign: plus for a forward edge, minus for a backward edge). It now holds for all i, j ∈ V that the cost of the path from i to j in T equals y_j − y_i (why?). But then the reduced costs w̄_ij (defined by w_ij + y_i − y_j) satisfy:

∀e ∈ T : w̄_e = 0
∀e ∉ T : w̄_e = w(C^T_e)

If T determines a feasible tree solution x and C^T_e has nonnegative cost for all e ∉ T
Figure 7.5: An example of the definition of the unique circuit C^T_e by the edge e. The edge (6, 5) defines the circuit ⟨3, 4, 6, 5, 3⟩ and the anticlockwise orientation.
(i.e. w̄_e ≥ 0), then x is optimal. Now the network simplex algorithm for the transshipment problem can be stated:
Algorithm 15: The Network Simplex Algorithm for the Transshipment Problem
  find a tree T with a corresponding feasible tree solution x
  select an r ∈ V as root for T
  compute y_i as the length of the r–i path in T, costs counted with sign
  while ∃(i, j) : w̄_ij = w_ij + y_i − y_j < 0 do
    select an edge e = (i, j) with w̄_ij < 0
    if all edges in C^T_e are forward edges then
      STOP /* the problem is ``unbounded'' */
    find θ = min{x_j : j backward in C^T_e} and an edge h with x_h = θ
    increase x with θ along C^T_e
    T ← (T ∪ {e}) \ {h}
    update y
Remember that as flow is increased in the algorithm, it is done with respect to the orientation defined by the edge that defines the circuit; that is, flow is increased in forward edges and decreased in backward edges.

Testing whether T satisfies the optimality conditions is relatively easy. We can compute y in time O(n), and thereafter w̄ can be computed in O(m). With the simple operations involved and the modest time complexity this is certainly going to be faster than running the Bellman-Ford algorithm.
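This optimality test can be coded directly. The sketch below is illustrative (a hypothetical encoding, not the book's code); the edge costs in the usage example are those implied by the stated potentials and reduced costs of the worked example (w_21 = 1, w_13 = 2, w_36 = 3, w_24 = 5, w_45 = 4, w_34 = w_65 = 1).

```python
from collections import defaultdict

def potentials(tree_edges, w, root):
    """Vertex potentials y for a spanning tree T: y[root] = 0 and, for a
    tree edge (i, j), y[j] = y[i] + w[(i, j)] (path cost counted with
    sign). A DFS over the tree computes all of y in O(n)."""
    adj = defaultdict(list)
    for (u, v) in tree_edges:
        adj[u].append((v, w[(u, v)]))     # traversed forward: add the cost
        adj[v].append((u, -w[(u, v)]))    # traversed backward: subtract it
    y, stack = {root: 0}, [root]
    while stack:
        i = stack.pop()
        for j, c in adj[i]:
            if j not in y:
                y[j] = y[i] + c
                stack.append(j)
    return y

def reduced_costs(w, y):
    """Reduced costs w_ij + y_i - y_j for all edges, in O(m). Tree edges
    get 0; a non-tree edge gets the cost of its unique circuit."""
    return {(i, j): w[(i, j)] + y[i] - y[j] for (i, j) in w}
```

With the example data, the two non-tree edges come out with reduced costs −1 and −2, matching the circuit costs computed in the text.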
An issue that we have not touched upon is how to construct the initial tree. One
Figure 7.6: The solution of a transshipment problem using the network simplex algorithm. The labels on the edges are w_e, x_e. Bold edges identify the tree solution.
approach is the following: we can use the tree T whose edges are (r, i) where i ∈ V \ {r} and b_i ≥ 0, and edges (i, r) where i ∈ V \ {r} and b_i < 0. Not all of these edges may exist in G. If an edge does not exist we add it to G but assign it a large enough cost to ensure it will not be part of an optimal solution.
Let us demonstrate the network simplex algorithm on the example in Figure 7.6. First we need to find an initial feasible tree solution. Having determined the tree as T = {(2, 1), (1, 3), (3, 6), (2, 4), (4, 5)}, we let vertex 2 be the root vertex. Now the tree solution is uniquely defined and can be computed by (7.4) and (7.5) as defined earlier. So we get the tree solution shown in the upper right graph of Figure 7.6. The flow has a cost of 115.
Now we compute the potential vector y = (1, 0, 3, 5, 9, 6). Let us check if we have a negative cost circuit. The edges (3, 4) and (6, 5) each uniquely determine a circuit, with costs:

w̄_34 = y_3 + w_34 − y_4 = 3 + 1 − 5 = −1
w̄_65 = y_6 + w_65 − y_5 = 6 + 1 − 9 = −2
So both edges define a negative cost circuit, and we can choose either one of them. Let us select the circuit C_1 defined by (3, 4). Every time we push one unit of flow around the circuit, the flow on all edges of C_1 except (2, 4) increases by one, while the flow on (2, 4) decreases by one. As the current flow on (2, 4) is 10, we can increase the flow on the other edges by 10. (2, 4) is discarded from the tree solution and (3, 4) is added.
The resulting flow is shown in the lower left graph. Here the flow has a cost of 105. We update y and get (1, 0, 3, 4, 8, 6). Now we check for potential negative cost circuits:

w̄_24 = y_2 + w_24 − y_4 = 0 + 5 − 4 = 1
w̄_65 = y_6 + w_65 − y_5 = 6 + 1 − 8 = −1
Naturally edge (2, 4) cannot be a candidate for a negative cost circuit, as we have just removed it from the tree solution, but the edge (6, 5) defines a negative cycle C_2. In C_2 the forward edges are (6, 5) and (3, 6), while (4, 5) and (3, 4) are backward edges. The backward edges both have a flow of 10, so we decrease both to zero, while the flows on the forward edges are increased by 10. One of the backward edges must leave the tree solution. The choice is arbitrary; we remove (4, 5) from the tree solution and add (6, 5). This gets us the solution in the lower right graph. With the updated y being (1, 0, 3, 4, 7, 6) there are no more negative cost circuits and the algorithm terminates. The optimal flow has a cost of 95.
7.3 Network Simplex Algorithm for the Minimum Cost Flow Problem
We now consider the general minimum cost flow problem given by the network G = (V, E) with demands b_i, i ∈ V, capacities u_e, e ∈ E, and costs w_e, e ∈ E. The algorithm for the general case is a straightforward extension of the network simplex algorithm for the transshipment problem as described in the previous section. It can also be viewed as an interpretation of the bounded variable simplex method of linear programming.
Remember the optimality conditions for a feasible flow x ∈ ℝ^E:

w̄_e < 0 ⇒ x_e = u_e (< ∞)
w̄_e > 0 ⇒ x_e = 0
The concept of a tree solution extends to capacity constrained problems. A non-tree edge e may now have flow either zero or u_e. E \ T is hence partitioned into two sets of edges, L and U: edges in L must have flow zero and edges in U must have flow equal to their capacity. The tree solution x must satisfy:

∀i ∈ V : f_x(i) = b_i
∀e ∈ L : x_e = 0
∀e ∈ U : x_e = u_e
As before, the tree solution for T is unique. Select a vertex r called the root of T and consider an edge h in T. Again h partitions V into two parts: a part containing r, denoted R^T_h, and a remainder P^T_h = V \ R^T_h.
Consider now the modified demand B(P^T_h) in P^T_h, given that a tree solution is sought:

B(P^T_h) = Σ_{i∈P^T_h} b_i − Σ_{(i,j)∈U : i∈R^T_h, j∈P^T_h} u_ij + Σ_{(i,j)∈U : i∈P^T_h, j∈R^T_h} u_ij

If h starts in R^T_h and ends in P^T_h then set

x_h = B(P^T_h)

and if h starts in P^T_h and ends in R^T_h then set

x_h = −B(P^T_h).
Now the algorithm for the general minimum cost flow problem is based on the following theorem, which we state without proof:
Theorem 7.3 If G = (V, E, w, u, b) has a feasible solution, then it has a feasible
tree solution. If it has an optimal solution, then it has an optimal tree solution.
The network simplex algorithm will move from tree solution to tree solution using only circuits formed by adding a single edge to T. However, if the non-tree edge being considered has flow equal to its capacity, then we consider sending flow in the opposite direction. We now search for either a non-tree edge e with x_e = 0 and negative reduced cost, or a non-tree edge e with x_e = u_e and positive reduced cost. Given e and T, L, U, the corresponding circuit is denoted C^{T,L,U}_e. This circuit satisfies:

• Each edge of C^{T,L,U}_e is an element of T ∪ {e}.
• If e ∈ L it is a forward edge of C^{T,L,U}_e, and otherwise it is a backward edge.
With finite capacities it is not so easy to come up with an initial feasible tree solution. One approach is to add a "super source" vertex r̄ and a "super sink" vertex s̄ to G. Edges from r̄ to each source vertex i get capacity equal to the absolute value of the demand of i, that is, u_r̄i = −b_i, and correspondingly we set u_is̄ = b_i for each sink vertex i. On the "extended" G we solve a maximum flow problem. The resulting flow constitutes a feasible solution and presents us with a feasible tree solution.
When augmenting along such a circuit, the flow must be increased in forward edges respecting the capacity constraints and decreased in backward edges respecting the nonnegativity constraints. We can now state the algorithm for the minimum cost flow problem.

Let us finish off the discussion of the network simplex by demonstrating it on an example. Figure 7.7 shows the iterations.
In this case it is easy to come up with a feasible tree solution. Both non-tree edges (3, 4) and (6, 5) are in L as their flow is equal to zero. Now we compute y = (1, 0, 3, 5, 9, 6) and for each of the two non-tree edges we get:

w̄_34 = y_3 + w_34 − y_4 = 3 + 1 − 5 = −1
w̄_65 = y_6 + w_65 − y_5 = 6 + 1 − 9 = −2
Both will improve the solution. Let us take (3, 4). It defines a unique cycle with forward edges (3, 4), (2, 1) and (1, 3), with (2, 4) being a backward edge. The bottleneck in this circuit is (1, 3), as the flow on this edge can only be increased by five units. We increase the flow on the forward edges by five and decrease it by five on (2, 4), which gets us to step 1. Note now that the non-tree edge (1, 3) belongs to U whereas (6, 5) remains in L. We recompute y to be
Figure 7.7: An example of the use of the network simplex algorithm. Labels on the edges are w_e, u_e in (a) and w_e, u_e, x_e in (b)–(d). Bold edges identify the tree solution.
Algorithm 16: The Network Simplex Algorithm for the Minimum Cost Flow Problem
  Find a tree T with a corresponding feasible tree solution x
  Select a vertex r ∈ V as root for T
  Compute y_i as the length of the r–i path in T, costs counted with sign (plus for forward edges, minus for backward edges)
  while (∃(i, j) ∈ L s.t. w̄_ij = w_ij + y_i − y_j < 0) ∨ (∃(i, j) ∈ U s.t. w̄_ij = w_ij + y_i − y_j > 0) do
    Select one such edge e
    if no edge in C^{T,L,U}_e is backward and no forward edge has limited capacity then
      STOP /* the problem is unbounded */
    find θ_1 ← min{x_j : j backward in C^{T,L,U}_e}, θ_2 ← min{u_j − x_j : j forward in C^{T,L,U}_e}, and an edge h giving rise to θ ← min{θ_1, θ_2}
    increase x by θ along C^{T,L,U}_e
    T ← (T ∪ {e}) \ {h}
    Update L, U by removing e and inserting h in the relevant one of L and U
    Update y
(1, 0, 4, 5, 9, 7). For the two non-tree edges we now get:
w̄_13 = y_1 + w_13 − y_3 = 1 + 2 − 4 = −1
w̄_65 = y_6 + w_65 − y_5 = 7 + 1 − 9 = −1
Even though w̄_13 is negative we can only choose (6, 5), because (1, 3) ∈ U (an edge in U is a candidate only if its reduced cost is positive). We choose (6, 5), and in the circuit defined by that edge the flow can be increased by at most three units. The edge that defines the bottleneck is (3, 6), which is moved to U, whereas (6, 5) enters T. This gets us to step 2 in the figure.
w̄_13 = y_1 + w_13 − y_3 = 1 + 2 − 4 = −1
w̄_36 = y_3 + w_36 − y_6 = 4 + 3 − 8 = −1

Because both non-tree edges are in U and have negative reduced cost, the tree solution is optimal.
7.4 Supplementary Notes
There is (naturally) a close relationship between the network simplex algorithm and the simplex algorithm. An in-depth discussion of these aspects of the network simplex algorithm can be found in [1].
7.5 Exercises
1. ([1]) For the minimum cost flow problem of Figure 7.8, determine whether the indicated vector x is optimal. Numbers on edges are, in order, w_e, u_e, x_e; numbers at vertices are demands. If it is optimal, find a vector y that certifies its optimality.
Figure 7.8: Given x, ﬁnd y
2. Bandwidth packing problem. Consider a communications network G = (V, E) with n vertices and m edges. Each edge has a bandwidth b_e and a unit cost w_e. A call i is defined by its starting vertex s_i, destination vertex d_i, bandwidth demand w_i and revenue r_i. (Bandwidth demands are additive: if calls 1 and 2 both use the same edge, the total bandwidth requirement is w_1 + w_2.) Let N denote the number of calls; we seek to maximize profit subject to logical flow requirements and bandwidth limits. Formulate this as a 0-1 Integer Linear Programming Problem.
3. Piecewise linear objective function. Suppose that the linear objective function for the minimum cost flow problem is replaced with a separable piecewise linear function. Show how to convert this problem to the standard minimum cost flow problem.
4. The Caterer Problem. ([3]) A caterer has booked his services for the next T days. He requires r_t fresh napkins on the t'th day, t = 1, 2, . . . , T. He sends his soiled napkins to the laundry, which has three speeds of service, f = 1, 2, or 3 days. The faster the service, the higher the cost c_f of laundering a napkin. He can also purchase new napkins at the cost c_o. He has an initial stock of s napkins. The caterer wishes to minimize the total outlay.
(a) Formulate the problem as a network problem.
(b) Solve the problem for the next 14 days. The daily requirement of napkins is given in Table 7.1. The prices of the three laundry services are: f = 1 costs 8 Danish kroner, f = 2 costs 4 Danish kroner and f = 3 costs 2 Danish kroner. The unit price of buying new napkins is 25 Danish kroner.
Day 1 2 3 4 5 6 7
Req. 500 600 300 800 2200 2600 1200
Day 8 9 10 11 12 13 14
Req. 300 1000 400 600 1400 3000 2000
Table 7.1: Daily requirement of napkins for the next 14 days.
5. Flow restrictions. ([3]) Suppose that in a minimum cost flow problem restrictions are placed on the total flow leaving a vertex k, i.e.

θ_k ≤ Σ_{(k,j)∈E} x_kj ≤ θ̄_k

Show how to modify these restrictions to convert the problem into a standard minimum cost flow problem.
Bibliography
[1] William J. Cook, William H. Cunningham, William R. Pulleyblank and
Alexander Schrijver. Combinatorial Optimization. Wiley Interscience, 1998.
[2] Laurence A. Wolsey. Integer Programming. Wiley Interscience, 1998.
[3] George B. Dantzig, Mukund N. Thapa. Linear Programming. Springer,
1997.
Part II
General Integer
Programming
Chapter 8
Dynamic Programming
8.1 The Floyd-Warshall Algorithm
Another interesting variant of the Shortest Path Problem is the case where we want to find the shortest path between all pairs of vertices i, j. This can be solved by repeated use of one of our "one source" algorithms from earlier, but there are faster and better ways to solve the "all pairs" problem.
This problem is relevant in many situations, most often as a subproblem of other problems. One case is the routing of vehicles: given a fleet of vehicles and a set of customers that need to be visited, we often want to build a set of routes for the vehicles that minimizes the total distance driven. In order to achieve this we need to know the shortest distance between every pair of customers before we can start to compute the best set of routes.
In the all pairs problem we no longer maintain just a vector of potentials and a vector of predecessors; instead we maintain two matrices, one for distances and one for predecessors.
One of the algorithms for the all pairs problem is the Floyd-Warshall algorithm. The algorithm is based on dynamic programming and considers "intermediate"
vertices of a shortest path. An intermediate vertex of a simple path is any of
the vertices in the path except the ﬁrst and the last.
Let us assume that the vertices of G are V = {1, 2, . . . , n}. The algorithm is based on the observation that, from shortest paths between i and j containing only intermediate vertices in the set {1, 2, . . . , k − 1}, we can construct shortest paths between i and j containing only intermediate vertices in the set {1, 2, . . . , k}. In this way we may gradually construct shortest paths based on more and more vertices until they are based on the entire vertex set V.
Algorithm 17: The Floyd-Warshall algorithm
Data: A distance matrix C for a digraph G = (V, E, w). If the edge (i, j) ∈ E, w_ij is the distance from i to j; otherwise w_ij = ∞. w_ii equals 0 for all i.
Result: Two n × n matrices, y and p, containing the length of the shortest path from i to j resp. the predecessor vertex for j on the shortest path, for all pairs of vertices in {1, ..., n} × {1, ..., n}.
  y_ij ← w_ij and p_ij ← i for all (i, j) with w_ij ≠ ∞; p_ij ← 0 otherwise
  for k ← 1 to n do
    for i ← 1 to n do
      for j ← 1 to n do
        if i ≠ k ∧ j ≠ k ∧ y_ij > y_ik + y_kj then
          y_ij ← y_ik + y_kj
          p_ij ← p_kj
The time complexity of the Floyd-Warshall algorithm is straightforward to derive. The initialisation sets up the matrices by initialising each entry in each matrix, which takes O(n^2). The algorithm has three nested loops, each of which is performed n times. The overall complexity is hence O(n^3).
Theorem 8.1 (Correctness of the Floyd-Warshall Algorithm) Given a connected, directed graph G = (V, E, w) with length function w : E → R, the Floyd-Warshall algorithm produces a "shortest path matrix" such that for any two given vertices i and j, y_ij is the length of the shortest path from i to j.

Proof: The proof is by induction. Our induction hypothesis is that prior to iteration k it holds for all i, j ∈ V that y_ij contains the length of the shortest path Q from i to j in G containing only intermediate vertices in the vertex set {1, ..., k − 1}, and that p_ij contains the immediate predecessor of j on Q.
This is obviously true after the initialisation. In iteration k, the length of Q is compared to the length of a path R composed of two subpaths, R1 and R2 (see Figure 8.1). R1 is a path from i to k, that is, R1 = ⟨i, v_1, v_2, . . . , v_s, k⟩ with "intermediate vertices" only in {1, ..., k − 1}, so v_a ∈ {1, ..., k − 1}, and R2 is a path from k to j with "intermediate vertices" only in {1, ..., k − 1}. The shorter of Q and R is chosen.
Figure 8.1: Proof of the Floyd-Warshall algorithm
The shortest path from i to j in G containing only vertices in the vertex set {1, ..., k} either

1. does not contain k, and hence is the one found in iteration k − 1, or
2. contains k, and can then be decomposed into an i–k path followed by a k–j path, each of which, by the induction hypothesis, has been found in iteration k − 1.

Hence the update ensures the correctness of the induction hypothesis after iteration k. △
Finally, we note that the Floyd-Warshall algorithm can detect negative length cycles: it computes the shortest path from i to i (the diagonal of the y matrix), and if any of these values is negative there is a negative cycle in the graph.
Chapter 9
Branch and Bound
A large number of real-world planning problems, called combinatorial optimization problems, share the following properties: they are optimization problems, are easy to state, and have a finite but usually very large number of feasible solutions. While some of these, e.g. the Shortest Path problem and the Minimum Spanning Tree problem, have polynomial algorithms, the majority of the problems in addition share the property that no polynomial method for their solution is known. Examples here are vehicle routing, crew scheduling, and production planning. All of these problems are NP-hard.
Branch and Bound is by far the most widely used tool for solving large scale NP-hard combinatorial optimization problems. Branch and Bound is, however, an algorithmic paradigm, which has to be filled out for each specific problem type, and numerous choices for each of the components exist. Even so, principles for the design of efficient Branch and Bound algorithms have emerged over the years.
In this chapter we review the main principles of Branch and Bound and illustrate the method and the different design issues through three examples: the symmetric Travelling Salesman Problem, the Graph Partitioning problem, and the Quadratic Assignment problem.
Solving NP-hard discrete optimization problems to optimality is often an immense job requiring very efficient algorithms, and the Branch and Bound paradigm is one of the main tools in the construction of these. A Branch and Bound algorithm searches the complete space of solutions for a given problem for the best solution. However, explicit enumeration is normally impossible due to the exponentially increasing number of potential solutions. The use of bounds for the
function to be optimized combined with the value of the current best solution
enables the algorithm to search parts of the solution space only implicitly.
At any point during the solution process, the status of the search is described by a pool of yet unexplored subsets of the solution space and the best solution found so far. Initially only one subset exists, namely the complete solution space, and the value of the best solution found so far is ∞.
The unexplored subspaces are represented as nodes in a dynamically generated
search tree, which initially only contains the root, and each iteration of a classical
Branch and Bound algorithm processes one such node. The iteration has three
main components: selection of the node to process, bound calculation, and
branching. In Figure 9.1, the initial situation and the ﬁrst step of the process
is illustrated.
The sequence of these may vary according to the strategy chosen for selecting the next node to process. If the selection of the next subproblem is based on the bound value of the subproblems, then the first operation of an iteration after choosing the node is branching, i.e. subdivision of the solution space of the node
into two or more subspaces to be investigated in a subsequent iteration. For each of these, it is checked whether the subspace consists of a single solution, in which case it is compared to the current best solution and the better of the two is kept. Otherwise the bounding function for the subspace is calculated and compared to the current best solution. If it can be established that the subspace cannot contain the optimal solution, the whole subspace is discarded; otherwise it is stored in the pool of live nodes together with its bound. This is in [2] called the eager strategy for node evaluation, since bounds are calculated as soon as nodes are available. The alternative is to start by calculating the bound of the selected
node and then branch on the node if necessary. The nodes created are then
stored together with the bound of the processed node. This strategy is called
lazy and is often used when the next node to be processed is chosen to be a live
node of maximal depth in the search tree.
The search terminates when there are no unexplored parts of the solution space left, and the optimal solution is then the one recorded as "current best".
The chapter is organized as follows: In Section 9.1, I go into detail with terminology and problem description and give the three examples to be used subsequently. Sections 9.1.1, 9.1.2, and 9.1.3 then treat in detail the algorithmic components selection, bounding and branching, and Section 9.1.4 briefly comments upon methods for generating a good feasible solution prior to the start of the search. I then describe personal experiences with solving two problems using parallel Branch and Bound in Sections 9.2.1 and 9.2.2, and Section 9.3 discusses the impact of design decisions on the efficiency of the complete algorithm.

Figure 9.1: Illustration of the search space of Branch and Bound.
9.1 Branch and Bound – terminology and general description
In the following I consider minimization problems; the case of maximization problems can be dealt with similarly. The problem is to minimize a function f(x) of variables (x_1, . . . , x_n) over a region of feasible solutions, S:

z = min{f(x) : x ∈ S}

The function f is called the objective function and may be of any type. The set of feasible solutions is usually determined by general conditions on the variables, e.g. that these must be nonnegative integers or binary, and special constraints determining the structure of the feasible set. In many cases, a set of potential solutions P containing S, for which f is still well defined, naturally comes to mind, and often a function g(x) defined on S (or P) with the property that g(x) ≤ f(x) for all x in S (resp. P) arises naturally. Both P and g are very useful in the Branch and Bound context. Figure 9.2 illustrates the situation where S and P are intervals of reals.
I will use the term subproblem to denote a problem derived from the originally given problem through the addition of new constraints. A subproblem hence corresponds to a subspace of the original solution space, and the two terms are used interchangeably, and in the context of a search tree interchangeably with the term node. In order to make the discussion more explicit I use three problems as examples. The first one is one of the most famous combinatorial optimization problems: the Travelling Salesman problem. The problem arises naturally in connection with routing of vehicles for delivery and pickup of goods or persons, but has numerous other applications. A famous and thorough reference is [11].
Example 1: The Symmetric Travelling Salesman problem. In Figure 9.3, a
map over the Danish island Bornholm is given together with a distance table
showing the distances between major cities/tourist attractions. The problem of
a biking tourist, who wants to visit all these major points, is to ﬁnd a tour of
Figure 9.2: The relation between the bounding function g and the objective function f on the sets S and P of feasible and potential solutions of a problem.
     A   B   C   D   E   F   G   H
A    0  11  24  25  30  29  15  15
B   11   0  13  20  32  37  17  17
C   24  13   0  16  30  39  29  22
D   25  20  16   0  15  23  18  12
E   30  32  30  15   0   9  23  15
F   29  37  39  23   9   0  14  21
G   15  17  29  18  23  14   0   7
H   15  17  22  12  15  21   7   0

Figure 9.3: The island Bornholm and the distances between interesting sites
minimum length starting and ending in the same city, and visiting each other
city exactly once. Such a tour is called a Hamilton cycle. The problem is called
the symmetric Travelling Salesman problem (TSP) since the table of distances
is symmetric.
In general a symmetric TSP is given by a symmetric n × n matrix D of nonnegative distances, and the goal is to find a Hamilton tour of minimum length.
In terms of graphs, we consider a complete undirected graph K_n with n vertices and nonnegative lengths assigned to the edges, and the goal is to determine a Hamilton tour of minimum length. The problem may also be stated mathematically by using decision variables to describe which edges are to be included in the tour. We introduce 0-1 variables x_ij, 1 ≤ i < j ≤ n, and interpret the value 0 (resp. 1) to mean "not in tour" (resp. "in tour"). The problem is then
min Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d_ij x_ij

such that

Σ_{k=1}^{i−1} x_ki + Σ_{k=i+1}^{n} x_ik = 2,   i ∈ {1, ..., n}

Σ_{i,j∈Z} x_ij < |Z|,   ∅ ⊂ Z ⊂ V

x_ij ∈ {0, 1},   i, j ∈ {1, ..., n}
The first set of constraints ensures that for each i exactly two variables corresponding to edges incident with i are chosen. Since each edge has two endpoints, this implies that exactly n variables are allowed to take the value 1. The second set of constraints consists of the subtour elimination constraints. Each of these states, for a specific subset Z of V, that the number of edges connecting vertices in Z has to be less than |Z|, thereby ruling out that these edges form a subtour. Unfortunately there are exponentially many of these constraints.
The given constraints determine the set of feasible solutions S. One obvious way of relaxing this to a set of potential solutions is to relax (i.e. discard) the subtour elimination constraints. The set of potential solutions P is then the family of all sets of subtours such that each i belongs to exactly one of the subtours in each set in the family, cf. Figure 9.4. In Section 9.1.1 another possibility is described, which in a Branch and Bound context turns out to be more appropriate.
A subproblem of a given symmetric TSP is constructed by deciding for a subset A of the edges of G that these must be included in the tour to be constructed, while for another subset B the edges are excluded from the tour. Exclusion of an edge (i, j) is usually modeled by setting c_{ij} to ∞, whereas the inclusion of an edge can be handled in various ways, e.g. by graph contraction. The number of feasible solutions to the problem is (n − 1)!/2, which for n = 50 is approximately 3 × 10^62. △

Figure 9.4: A potential, but not feasible solution to the biking tourist's problem.
The following descriptions follow [4, 5].
Example 2: The Graph Partitioning Problem. The Graph Partitioning problem arises in situations where it is necessary to minimize the number (or weight) of connections between two parts of a network of prescribed size. We consider a given weighted, undirected graph G with vertex set V and edge set E, and a cost function c : E → ℕ. The problem is to partition V into two disjoint subsets V_1 and V_2 of equal size such that the sum of costs of edges connecting vertices belonging to different subsets is as small as possible. Figure 9.5 shows an instance of the problem.
The graph partitioning problem can be formulated as a quadratic integer programming problem. Define for each vertex v of the given graph a variable x_v, which can attain only the values 0 and 1. A 1-1 correspondence between partitions and assignments of values to all variables now exists: x_v = 1 (respectively = 0) if and only if v ∈ V_1 (respectively v ∈ V_2). The cost of a partition is then
\[
\sum_{v \in V_1,\, u \in V_2} c_{uv}\, x_v (1 - x_u).
\]
Figure 9.5: A graph partitioning problem and a feasible solution.
A constraint on the number of variables attaining the value 1 is included to exclude infeasible partitions:
\[
\sum_{v \in V} x_v = |V|/2.
\]
The set of feasible solutions S here consists of the partitions of V into two equal-sized subsets. The natural set P of potential solutions consists of all partitions of V into two nonempty subsets.
Initially V_1 and V_2 are empty, corresponding to the situation where no variables have yet been assigned a value. When some of the vertices have been assigned to the sets (i.e. the corresponding variables have been assigned the values 1 or 0), a subproblem has been constructed.
The number of feasible solutions to a GPP with 2n vertices equals the binomial coefficient C(2n, n). For 2n = 120 the number of feasible solutions is approximately 9.6 × 10^34. △
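The quadratic cost expression and the balance constraint can be evaluated directly for a 0-1 assignment. In the sketch below (names and the tiny instance are ours) the symmetrized product x_v(1 − x_u) + x_u(1 − x_v) is used, which for an undirected edge counts each cut edge exactly once:

```python
def gpp_cost(edges, x):
    """Cost of a partition encoded by x[v] in {0,1}: an edge (u, v, c)
    contributes c exactly when its endpoints lie in different subsets."""
    # x[v] = 1 means v is in V1, x[v] = 0 means v is in V2.
    total = 0
    for u, v, c in edges:
        total += c * (x[v] * (1 - x[u]) + x[u] * (1 - x[v]))
    return total

def is_balanced(x):
    """The balance constraint: sum_v x_v = |V|/2."""
    return 2 * sum(x.values()) == len(x)

edges = [(0, 1, 10), (1, 2, 1), (2, 3, 8), (3, 0, 2)]
x = {0: 1, 1: 1, 2: 0, 3: 0}   # V1 = {0,1}, V2 = {2,3}
print(is_balanced(x))          # True
print(gpp_cost(edges, x))      # cut edges (1,2) and (3,0): 1 + 2 = 3
```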
Example 3: The Quadratic Assignment Problem. Here, I consider the Koopmans-Beckmann version of the problem, which can informally be stated with reference to the following practical situation: A company is to decide the assignment of n facilities to an equal number of locations and wants to minimize
Figure 9.6: A Quadratic Assignment problem of size 4.
the total transportation cost. For each pair of facilities (i, j) a flow of communication f_{i,j} is known, and for each pair of locations (l, k) the corresponding distance d_{l,k} is known. The transportation cost between facilities i and j, given that i is assigned to location l and j is assigned to location k, is f_{i,j} · d_{l,k}, and the objective of the company is to find an assignment minimizing the sum of all transportation costs. Figure 9.6 shows a small example with 4 facilities and 4 locations. The assignment of facilities A, B, C, and D to sites 1, 2, 3, and 4 respectively has a cost of 224.
Each feasible solution corresponds to a permutation of the facilities, and letting S denote the group of permutations of n elements, the problem can hence formally be stated as
\[
\min \Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} f_{i,j}\, d_{\pi(i),\pi(j)} : \pi \in S \Big\}.
\]
A set of potential solutions is e.g. obtained by allowing more than one facility
on each location.
Initially no facilities have been placed on a location, and subproblems of the
original problem arise when some but not all facilities have been assigned to
locations.
Again the number of feasible solutions grows exponentially: for a problem with n facilities to be located, the number of feasible solutions is n!, which for n = 20 is approximately 2.43 × 10^18. △
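For very small n the permutation formulation can be solved by plain enumeration. A sketch on a hypothetical 3 × 3 instance (not the instance of Figure 9.6; all names are ours):

```python
from itertools import permutations

def qap_cost(f, d, pi):
    """Total transportation cost of assignment pi: facility i is placed
    on location pi[i]; cost is the sum over all ordered pairs (i, j)
    of f[i][j] * d[pi[i]][pi[j]]."""
    n = len(pi)
    return sum(f[i][j] * d[pi[i]][pi[j]] for i in range(n) for j in range(n))

def qap_brute_force(f, d):
    """Enumerate all n! permutations -- only viable for very small n."""
    n = len(f)
    return min((qap_cost(f, d, pi), pi) for pi in permutations(range(n)))

# Hypothetical symmetric flows and distances.
f = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
d = [[0, 5, 2], [5, 0, 4], [2, 4, 0]]
best_cost, best_pi = qap_brute_force(f, d)
print(best_cost, best_pi)  # 38 (0, 2, 1)
```

The optimal assignment pairs the largest flow (between facilities 0 and 1) with the smallest distance (between locations 0 and 2), which is exactly the intuition behind the problem.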
The solution of a problem with a Branch and Bound algorithm is traditionally
described as a search through a search tree, in which the root node corresponds to the original problem to be solved, and each other node corresponds to a subproblem of the original problem. Given a node Q of the tree, the children of Q are subproblems derived from Q through imposing (usually) a single new constraint for each subproblem, and the descendants of Q are those subproblems which satisfy the same constraints as Q and additionally a number of others. The leaves correspond to feasible solutions, and for all NP-hard problems, instances exist with an exponential number of leaves in the search tree. To each node in the tree a bounding function g associates a real number called the bound for the node. For leaves the bound equals the value of the corresponding solution, whereas for internal nodes the value is a lower bound for the value of any solution in the subspace corresponding to the node. Usually g is required to satisfy the following three conditions:
1. g(P_i) ≤ f(P_i) for all nodes P_i in the tree

2. g(P_i) = f(P_i) for all leaves in the tree

3. g(P_i) ≥ g(P_j) if P_j is the father of P_i
These state that g is a bounding function, which for any leaf agrees with the
objective function, and which provides closer and closer (or rather not worse)
bounds when more information in terms of extra constraints for a subproblem
is added to the problem description.
The search tree is developed dynamically during the search and consists initially
of only the root node. For many problems, a feasible solution to the problem
is produced in advance using a heuristic, and the value hereof is used as the
current best solution (called the incumbent). In each iteration of a Branch
and Bound algorithm, a node is selected for exploration from the pool of live
nodes corresponding to unexplored feasible subproblems using some selection
strategy. If the eager strategy is used, a branching is performed: Two or more
children of the node are constructed through the addition of constraints to the
subproblem of the node. In this way the subspace is subdivided into smaller
subspaces. For each of these the bound for the node is calculated, possibly
with the result of ﬁnding the optimal solution to the subproblem, cf. below. In
case the node corresponds to a feasible solution or the bound is the value of an
optimal solution, the value hereof is compared to the incumbent, and the best
solution and its value are kept. If the bound is no better than the incumbent, the subproblem is discarded (or fathomed), since no feasible solution of the subproblem can be better than the incumbent. In case no feasible solutions to the subproblem exist, the subproblem is also fathomed. Otherwise the possibility
of a better solution in the subproblem cannot be ruled out, and the node (with
the bound as part of the information stored) is then joined to the pool of live
subproblems. If the lazy selection strategy is used, the order of bound calculation
and branching is reversed, and the live nodes are stored with the bound of their
father as part of the information. Below, the two algorithms are sketched.
Algorithm 18: Eager Branch and Bound
Data: an initial problem P_0, a bounding function g, and an objective function f
Result: an optimal solution OptimalSolution and its value OptimalValue
Incumbent ← ∞
LB(P_0) ← g(P_0)
Live ← {(P_0, LB(P_0))}
repeat
    Select the node P from Live to be processed
    Live ← Live \ {P}
    Branch on P generating P_1, ..., P_k
    for 1 ≤ i ≤ k do
        Bound P_i: LB(P_i) ← g(P_i)
        if LB(P_i) = f(X) for a feasible solution X and f(X) < Incumbent then
            Incumbent ← f(X)
            Solution ← X
            remove from Live all nodes Q with LB(Q) ≥ Incumbent
        if LB(P_i) ≥ Incumbent then
            fathom P_i
        else
            Live ← Live ∪ {(P_i, LB(P_i))}
until Live = ∅
OptimalSolution ← Solution
OptimalValue ← Incumbent
Algorithm 19: Lazy Branch and Bound
Data: an initial problem P_0, a bounding function g, and an objective function f
Result: an optimal solution OptimalSolution and its value OptimalValue
Incumbent ← ∞
Live ← {(P_0, LB(P_0))}
repeat
    Select the node P from Live to be processed
    Live ← Live \ {P}
    Bound P: LB(P) ← g(P)
    if LB(P) = f(X) for a feasible solution X and f(X) < Incumbent then
        Incumbent ← f(X)
        Solution ← X
        remove from Live all nodes Q with LB(Q) ≥ Incumbent
    if LB(P) ≥ Incumbent then
        fathom P
    else
        Branch on P generating P_1, ..., P_k
        for 1 ≤ i ≤ k do
            Live ← Live ∪ {(P_i, LB(P))}
until Live = ∅
OptimalSolution ← Solution
OptimalValue ← Incumbent
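The eager scheme can be sketched concretely. The toy problem, the bounding function and all names below are ours, not the text's: choose exactly k of n nonnegative costs with minimum sum, bounding a subproblem by its fixed cost plus the cheapest feasible completion, and selecting live nodes best-first from a heap:

```python
import heapq

def eager_branch_and_bound(costs, k):
    """Eager Branch and Bound for a toy problem: pick exactly k of the
    given nonnegative costs with minimum total sum. A subproblem fixes
    the decisions for the first `depth` items; its bound g adds the
    cheapest feasible completion, so g underestimates every descendant."""
    n = len(costs)

    def bound(fixed_sum, chosen, depth):
        need = k - chosen
        rest = sorted(costs[depth:])
        if need > len(rest):
            return float("inf")            # infeasible subproblem
        return fixed_sum + sum(rest[:need])

    incumbent, solution = float("inf"), None
    live = [(bound(0, 0, 0), 0, 0, ())]    # heap of (LB, fixed, depth, picks)
    while live:
        lb, fixed, depth, picks = heapq.heappop(live)
        if lb >= incumbent:
            continue                       # fathomed
        if len(picks) == k or depth == n:  # leaf: bound equals value
            if len(picks) == k and fixed < incumbent:
                incumbent, solution = fixed, picks
            continue
        # Branch: item `depth` is either taken or excluded (eager bounding).
        for take in (1, 0):
            f2 = fixed + take * costs[depth]
            p2 = picks + (depth,) if take else picks
            child_lb = bound(f2, len(p2), depth + 1)
            if child_lb < incumbent:
                heapq.heappush(live, (child_lb, f2, depth + 1, p2))
    return incumbent, solution

print(eager_branch_and_bound([5, 1, 4, 2, 8], 2))  # (3, (1, 3))
```

Replacing the heap with a stack would turn the selection strategy into depth-first search without touching the bounding or branching components.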
A Branch and Bound algorithm for a minimization problem hence consists of
three main components:
1. a bounding function providing for a given subspace of the solution space a
lower bound for the best solution value obtainable in the subspace,
2. a strategy for selecting the live solution subspace to be investigated in the
current iteration, and
3. a branching rule to be applied if a subspace after investigation cannot be
discarded, hereby subdividing the subspace considered into two or more
subspaces to be investigated in subsequent iterations.
In the following, I discuss each of these key components brieﬂy.
In addition to these, an initial good feasible solution is normally produced using
a heuristic whenever this is possible in order to facilitate fathoming of nodes as
early as possible. If no such heuristic exists, the initial value of the incumbent
is set to inﬁnity. It should be noted that other methods to fathom solution
subspaces exist, e.g. dominance tests, but these are normally rather problem
speciﬁc and will not be discussed further here. For further reference see [8].
9.1.1 Bounding function
The bounding function is the key component of any Branch and Bound algorithm, in the sense that a low quality bounding function cannot be compensated for through good choices of branching and selection strategies. Ideally the value of a bounding function for a given subproblem should equal the value of the best feasible solution to the problem, but since obtaining this value is usually in itself NP-hard, the goal is to come as close as possible using only a limited amount of computational effort (i.e. in polynomial time), cf. the succeeding discussion. A bounding function is called strong if it in general gives values close to the optimal value for the subproblem bounded, and weak if the values produced are far from the optimum. One often experiences a trade-off between quality and time when dealing with bounding functions: the more time spent on calculating the bound, the better the bound value usually is. It is normally considered beneficial to use as strong a bounding function as possible in order to keep the size of the search tree as small as possible.
Bounding functions naturally arise in connection with the set of potential solutions P and the function g mentioned in Section 2. Due to the fact that S ⊆ P, and that g(x) ≤ f(x) on P, the following is easily seen to hold:
\[
\min_{x \in P} g(x) \;\le\; \left\{ \begin{array}{c} \min_{x \in P} f(x) \\ \min_{x \in S} g(x) \end{array} \right\} \;\le\; \min_{x \in S} f(x)
\]
If both P and g exist, there is now a choice among three optimization problems, for each of which the optimal solution will provide a lower bound for the given objective function. The "skill" here is of course to choose P and/or g so that one of these is easy to solve and provides tight bounds.
Hence there are two standard ways of converting the NP-hard problem of solving a subproblem to optimality into a polynomially solvable problem of determining a lower bound for the objective function. The first is to use relaxation: leave out some of the constraints of the original problem, thereby enlarging the set of feasible solutions. The objective function of the problem is maintained. This corresponds
to minimizing f over P. If the optimal solution to the relaxed subproblem satisfies all constraints of the original subproblem, it is also optimal for this, and is hence a candidate for a new incumbent. Otherwise, the value is a lower bound because the minimization is performed over a larger set of values than the objective function values for feasible solutions to the original problem. For e.g. GPP, a relaxation is to drop the constraint that the sizes of V_1 and V_2 are to be equal.
The other way of obtaining an easy bound calculation problem is to minimize g over S, i.e. to maintain the feasible region of the problem but modify the objective function, at the same time ensuring that for all feasible solutions the modified function has values less than or equal to the original function. Again one can be sure that a lower bound results from solving the modified problem to optimality; however, it is generally not true that the optimal solution corresponding to the modified objective function is optimal for the original objective function too. The most trivial and very weak bounding function for a given minimization problem obtained by modification is the sum of the costs incurred by the variable bindings leading to the subproblem to be bounded. Hence all feasible solutions for the subproblem are assigned the same value by the modified objective function. In GPP this corresponds to the cost of edges connecting vertices assigned to V_1 in the partial solution with vertices assigned to V_2 in the partial solution, leaving out any evaluation of the possible costs between one assigned and one unassigned vertex, and of costs between two unassigned vertices. In QAP, an initial and very weak bound is the transportation cost between facilities already assigned to locations, leaving out the potential costs of transportation between one unassigned and one assigned, as well as between two unassigned facilities. Much better bounds can be obtained if these potential costs are included in the bound, cf. the Roucairol-Hansen bound for GPP and the Gilmore-Lawler bound for QAP as described e.g. in [4, 5].
Combining the two strategies for finding bounding functions means minimizing g over P, and at first glance this seems weaker than each of the two. However, a parameterized family of lower bounds may result, and finding the parameter giving the optimal lower bound may after all create very tight bounds. Bounds calculated by so-called Lagrangean relaxation are based on this observation: these bounds are usually very tight but computationally demanding. The TSP provides a good example hereof.
Example 4: The 1-tree bound for symmetric TSP problems. As mentioned, one way of relaxing the constraints of a symmetric TSP is to allow subtours. However, the bounds produced this way are rather weak. One alternative is the 1-tree relaxation.
Here one first identifies a special vertex, "#1", which may be any vertex of the graph. "#1" and all edges incident to it are removed from G, and a minimum spanning tree T_rest is found for the remaining graph. Then the two shortest edges e_1, e_2 incident to "#1" are added to T_rest, producing the 1-tree T_one of G with respect to "#1", cf. Figure 9.7.
The total cost of T_one is a lower bound on the value of an optimal tour. The argument for this is as follows: First note that a Hamilton tour in G consists of two edges e'_1, e'_2 incident to "#1" and a tree T'_rest in the rest of G. Hence the set of Hamilton tours of G is a subset of the set of 1-trees of G. Since e_1, e_2 are the two shortest edges incident to "#1" and T_rest is the minimum spanning tree in the rest of G, the cost of T_one is less than or equal to the cost of any Hamilton tour.
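The 1-tree bound is straightforward to compute from a distance matrix. A minimal sketch (names ours), using Prim's algorithm for the spanning tree of the remaining graph:

```python
def one_tree_bound(dist, special=0):
    """1-tree lower bound for a symmetric TSP given a full distance
    matrix: remove the special vertex, build a minimum spanning tree
    on the rest (Prim's algorithm), then add the two cheapest edges
    connecting the special vertex back to it."""
    n = len(dist)
    rest = [v for v in range(n) if v != special]
    # Prim's algorithm on the remaining vertices.
    in_tree = {rest[0]}
    mst_cost = 0
    while len(in_tree) < len(rest):
        c, v = min((dist[u][w], w) for u in in_tree
                   for w in rest if w not in in_tree)
        mst_cost += c
        in_tree.add(v)
    # Two shortest edges incident to the special vertex.
    incident = sorted(dist[special][v] for v in rest)
    return mst_cost + incident[0] + incident[1]

# A hypothetical 4-vertex instance: any tour costs at least the bound.
d = [[0, 2, 9, 3],
     [2, 0, 4, 7],
     [9, 4, 0, 5],
     [3, 7, 5, 0]]
print(one_tree_bound(d))  # 14
```

On this instance the bound is 14, which happens to be attained by the tour 0-1-2-3-0, i.e. the 1-tree is itself a tour and hence optimal.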
In case T_one is a tour, we have found the optimal solution to our subproblem; otherwise a vertex of degree at least 3 exists and we have to perform a branching.
The 1-tree bound can be strengthened using the idea of problem transformation: Generate a new symmetric TSP problem having the same optimal tour as the original, for which the 1-tree bound is tighter. The idea is that vertices of T_one with high degree are incident with too many attractive edges, whereas vertices of degree 1 have too many unattractive edges. Denote by π_i the degree of vertex i minus 2: π_i := deg(v_i) − 2. Note that the sum over V of the values π_i equals 0, since T_one has n edges and hence the sum of the deg(v_i) equals 2n. Now for each edge (i, j) we define the transformed cost c'_{ij} to be c_{ij} + π_i + π_j. Since each vertex in a Hamilton tour is incident to exactly two edges, the new cost of a Hamilton tour is equal to the current cost plus two times the sum over V of the values π_i. Since the latter is 0, the costs of all tours are unchanged, but the costs of 1-trees in general increase. Hence calculating the 1-tree bound for the transformed problem often gives a better bound, but not necessarily a 1-tree which is a tour.
The trick may be repeated as many times as one wants; however, for large instances a tour seldom results. Hence, there is a trade-off between time and strength of bound: should one branch, or should one try to get an even stronger bound than the current one by a problem transformation? Figure 9.7 (c) shows the first transformation for the problem of Figure 9.7 (b).
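One step of the transformation can be sketched as follows; the degree sequence below is hypothetical, standing in for the degrees of a current 1-tree, and all names are ours:

```python
def transform_costs(dist, degrees):
    """One step of the problem transformation: pi_i = deg(v_i) - 2 from
    the current 1-tree, and c'_ij = c_ij + pi_i + pi_j. Every tour's
    cost changes by 2 * sum(pi) = 0, while 1-trees that are not tours
    tend to become more expensive."""
    n = len(dist)
    pi = [degrees[i] - 2 for i in range(n)]
    return [[dist[i][j] + pi[i] + pi[j] if i != j else 0
             for j in range(n)] for i in range(n)]

d = [[0, 4, 6, 3],
     [4, 0, 5, 7],
     [6, 5, 0, 2],
     [3, 7, 2, 0]]
# Hypothetical 1-tree degrees: vertex 0 has degree 3, vertex 2 degree 1,
# the others degree 2, so pi = [1, 0, -1, 0].
d2 = transform_costs(d, [3, 2, 1, 2])
print(d2[0][1])  # 4 + 1 + 0 = 5: edges at the degree-3 vertex get dearer
print(d2[2][3])  # 2 - 1 + 0 = 1: edges at the degree-1 vertex get cheaper
```

Repeating this step with updated degrees is the basis of the Lagrangean-style bound strengthening described above.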
9.1.2 Strategy for selecting next subproblem
The strategy for selecting the next live subproblem to investigate usually reﬂects
a trade oﬀ between keeping the number of explored nodes in the search tree low,
and staying within the memory capacity of the computer used. If one always
Figure 9.7: A bound calculation of the Branch and Bound algorithm for the symmetric TSP using the 1-tree bound with "#1" equal to A and Lagrangean relaxation for bounding. (The figure shows the distance matrix and the 1-tree of cost 97, together with the modified distance matrix after the first transformation; the optimal tour has cost 100.)
selects among the live subproblems one of those with the lowest bound, called the best first search strategy, BeFS, no superfluous bound calculations take place after the optimal solution has been found. Figure 9.8 (a) shows a small search tree; the numbers in each node correspond to the sequence in which the nodes are processed when BeFS is used.
The explanation of the property regarding superfluous bound calculations lies in the concept of critical subproblems. A subproblem P is called critical if
the given bounding function when applied to P results in a value strictly less
than the optimal solution of the problem in question. Nodes in the search tree
corresponding to critical subproblems have to be partitioned by the Branch and
Bound algorithm no matter when the optimal solution is identiﬁed  they can
never be discarded by means of the bounding function. Since the lower bound
of any subspace containing an optimal solution must be less than or equal to the
optimum value, only nodes of the search tree with lower bound less than or equal
to this will be explored. After the optimal value has been discovered only critical
nodes will be processed in order to prove optimality. The preceding argument
for optimality of BeFS with respect to number of nodes processed is valid only
if eager node evaluation is used since the selection of nodes is otherwise based
on the bound value of the father of each node. BeFS may, however, also be used
in combination with lazy node evaluation.
Even though the choice of the subproblem with the current lowest lower bound
makes good sense also regarding the possibility of producing a good feasible
solution, memory problems arise if the number of critical subproblems of a
given problem becomes too large. The situation more or less corresponds to a breadth first search strategy, in which all nodes at one level of the search tree are processed before any node at a higher level. Figure 9.8 (b) shows the search tree with the numbers in each node corresponding to the BFS processing sequence.
The number of nodes at each level of the search tree grows exponentially with the level, making it infeasible to do breadth first search for larger problems. For GPP, sparse problems with 120 vertices often produce on the order of a few hundred critical subproblems when the Roucairol-Hansen bounding function is used [4], and hence BeFS seems feasible. For QAP the famous Nugent20 problem [13] produces 3.6 × 10^8 critical nodes using Gilmore-Lawler bounding combined with detection of symmetric solutions [5], and hence memory problems may be expected if BeFS is used.
The alternative used is depth ﬁrst search, DFS. Here a live node with largest
level in the search tree is chosen for exploration. Figure 9.8 (c) shows the
DFS processing sequence number of the nodes. The memory requirement in
terms of number of subproblems to store at the same time is now bounded
above by the number of levels in the search tree multiplied by the maximum
number of children of any node, which is usually a quite manageable number.
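The difference between the two strategies lies purely in how the pool of live nodes is maintained. A minimal sketch (names ours): BeFS keeps Live as a heap ordered by lower bound, while DFS keeps it as a stack:

```python
import heapq

def select_befs(live):
    """Best-first: always take a live node with the smallest bound.
    `live` is maintained as a heap of (lower bound, node) pairs."""
    return heapq.heappop(live)

def select_dfs(live):
    """Depth-first: take the most recently created live node.
    `live` is maintained as a plain stack."""
    return live.pop()

live_heap = [(40, "B"), (10, "C"), (25, "A")]
heapq.heapify(live_heap)
print(select_befs(live_heap))   # (10, 'C') -- smallest bound first

live_stack = [(40, "B"), (10, "C"), (25, "A")]
print(select_dfs(live_stack))   # (25, 'A') -- deepest (newest) node first
```

The bounded memory use of DFS follows directly from the stack discipline: it never holds more than one path of the tree (plus siblings), whereas the heap may hold all critical subproblems at once.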
Figure 9.8: Search strategies in Branch and Bound: (a) Best First Search, (b) Breadth First Search, and (c) Depth First Search. (Each node is labelled with its f- and g-values and its processing sequence number under the respective strategy.)
DFS can be used both with lazy and eager node evaluation. An advantage from the programming point of view is the use of recursion to search the tree: this enables one to store the information about the current subproblem in an incremental way, so only the constraints added in connection with the creation of each subproblem need to be stored. The drawback is that if the incumbent is far from the optimal solution, large amounts of unnecessary bounding computations may take place.
In order to avoid this, DFS is often combined with a selection strategy in which one of the branches of the selected node has a very small lower bound and the other a very large one. The idea is that exploring the node with the small lower bound first hopefully leads to a good feasible solution, which, when the procedure returns to the node with the large lower bound, can be used to fathom that node. The node selected for branching is chosen as the one for which the difference between the lower bounds of its children is as large as possible. Note however that this strategy requires the bound values for children to be known, which again may lead to superfluous calculations.
A combination of DFS as the overall principle and BeFS when a choice is to be made between nodes at the same level of the tree is also quite common.

In [2] an experimental comparison of BeFS and DFS combined with both eager and lazy node evaluation is performed for QAP. Surprisingly, DFS is superior to BeFS in all cases, both in terms of time and in terms of number of bound calculations. The reason turns out to be that in practice, the bounding and branching of the basic algorithm is extended with additional tests and calculations at each node in order to enhance the efficiency of the algorithm. Hence, the theoretical superiority of BeFS should be taken with a grain of salt.
9.1.3 Branching rule
All branching rules in the context of Branch and Bound can be seen as subdivision of a part of the search space through the addition of constraints, often in the form of assigning values to variables. If the subspace in question is subdivided into two, the term dichotomic branching is used; otherwise one talks
about polytomic branching. Convergence of Branch and Bound is ensured if the size of each generated subproblem is smaller than the original problem, and the number of feasible solutions to the original problem is finite. Normally, the subproblems generated are disjoint; in this way the problem of the same feasible solution appearing in different subspaces of the search tree is avoided.
For GPP branching is usually performed by choosing a vertex not yet assigned to either of V_1 and V_2, and assigning it to V_1 in one of the new subproblems (corresponding to the variable of the vertex receiving the value 1) and to V_2 in the other (variable value equal to 0). This branching scheme is dichotomic, and the subspaces generated are disjoint.
In case of QAP, an unassigned facility is chosen, and a new subproblem is created for each of the free locations by assigning the chosen facility to the location. The scheme is called branching on facilities and is polytomic, and also here the subspaces are disjoint. Branching on locations is also possible.
For TSP, branching may be performed based on the 1-tree generated during bounding. If all vertices have degree 2, the 1-tree is a tour and hence an optimal solution to the subproblem; then no further branching is required. If a node has degree 3 or more in the 1-tree, any such node may be chosen as the source of branching. For the chosen node, a number of subproblems equal to the degree is generated. In each of these, one of the edges of the 1-tree is excluded from the graph of the subproblem, ruling out the possibility that the bound calculation will result in the same 1-tree. Figure 9.9 shows the branching taking place after the bounding in Figure 9.7. The bound does, however, not necessarily change, and identical subproblems may arise after a number of branchings. The effect of the latter is not an incorrect algorithm, but a less efficient algorithm. The problem is further discussed as an exercise.
9.1.4 Producing an initial solution
Although often not explicitly mentioned, another key issue in the solution of
large combinatorial optimization problems by Branch and Bound is the con
struction of a good initial feasible solution. Any heuristic may be used, and
presently a number of very good general heuristics as well as a wealth of very
problem speciﬁc heuristics are available. Among the general ones (also called
metaheuristics or paradigms for heuristics), Simulated Annealing, Genetic Al
gorithms, and Tabu Search are the most popular.
As mentioned, the number of subproblems explored when the DFS strategy for selection is used depends on the quality of the initial solution: if the heuristic identifies the optimal solution, so that the Branch and Bound algorithm essentially verifies the optimality, then even DFS will only explore critical subproblems. If BeFS is used, the value of a good initial solution is less obvious.
Regarding the three examples, a good and fast heuristic for GPP is the Kernighan-Lin variable-depth local search heuristic. For QAP and TSP, very good results have been obtained with Simulated Annealing.
Figure 9.9: Branching from a 1-tree in a Branch and Bound algorithm for the symmetric TSP. (The three subproblems exclude the edges (A,H), (D,H), and (H,G) respectively; among the resulting 1-trees, of costs 98 and 104, one is the optimal tour.)
9.2 Personal Experiences with GPP and QAP
The following subsections briefly describe my personal experiences using Branch and Bound combined with parallel processing to solve GPP and QAP. Most of the material stems from [3, 4] and [5]. Even though parallelism is not an integral part of Branch and Bound, I have chosen to present the material, since the key components of the Branch and Bound algorithm are unchanged. A few concepts from parallel processing are, however, necessary.

Using parallel processing of the nodes in the search tree of a Branch and Bound algorithm is a natural idea, since the bound calculation and the branching in each node are independent. The aim of the parallel processing is to speed up the execution time of the algorithm. To measure the success in this aspect, the speedup of adding processors is measured. The relative speedup using p processors is defined to be the processing time T(1) using one processor divided by the processing time T(p) using p processors:

S(p) = T(1)/T(p)
The ideal value of S(p) is p: then the problem is solved p times faster with p processors than with 1 processor.

An important issue in parallel Branch and Bound is distribution of work: in order to obtain as short a running time as possible, no processor should be idle at any time during the computation. If a distributed system or a network of workstations is used, this issue becomes particularly crucial, since it is not possible to maintain a central pool of live subproblems. Various possibilities for load balancing schemes exist; two concrete examples are given in the following, and additional ones are described in [7].
9.2.1 Solving the Graph Partitioning Problem in Parallel
GPP was my first experience with parallel Branch and Bound, and we implemented two parallel algorithms for the problem in order to investigate the trade-off between bound quality and time to calculate the bound. One, called the CT-algorithm, uses an easily computable bounding function based on the principle of a modified objective function and produces bounds of acceptable quality, whereas the other, the RH-algorithm, is based on Lagrangean relaxation and has a bounding function giving tight, but computationally expensive bounds. The system used was a 32-processor iPSC1/d5 hypercube equipped with Intel 80286 processors and 80287 coprocessors, each with 512 KB memory. No dedicated communication processors were present, and the communication facilities were Ethernet connections, implying a large start-up cost on any communication.
Both algorithms were of the distributed type, where the pool of live subproblems is distributed over all processors, and as strategy for distributing the workload we used a combined "on demand"/"on overload" strategy. The "on overload" strategy is based on the idea that if a processor has more than a given threshold of live subproblems, a number of these are sent to neighbouring processors. However, care must be taken to ensure that the system is not flooded with communication and that flow of subproblems between processors takes place during the entire solution process. The scheme is illustrated in Figure 9.10.
Regarding termination, the algorithm of Dijkstra et al. [6] was used. The selection strategy for the next subproblem was BeFS for the RH-algorithm and DFS for the CT-algorithm. The first feasible solution was generated by the Kernighan-Lin heuristic, and its value was usually close to the optimal solution value.
Figure 9.10: Illustration of the on overload protocol. (a) is the situation when a processor, upon checking, finds that it is overloaded, and (b) shows the behaviour of a non-overloaded processor.

No. of proc.          4       8       16      32
CT time (sec)         1964    805     421     294
proc. util. (%)       97      96      93      93
no. of bound calc.    449123  360791  368923  522817
RH time (sec)         1533    1457    1252    1219
proc. util. (%)       89      76      61      42
no. of bound calc.    377     681     990     1498

Table 9.1: Comparison between the CT- and RH-algorithms on a 70-vertex problem with respect to running times, processor utilization, and number of subproblems solved.

For the CT-algorithm, results regarding processor utilization and relative speedup were promising. For large problems, a processor utilization near 100% was
observed, and linear speedup close to the ideal was observed for problems also solvable on a single processor. Finally, we observed that the best speedup was obtained for problems with long running times. The RH-algorithm behaved differently: for small to medium size problems, the algorithm was clearly inferior to the CT-algorithm, with respect to running time, relative speedup and processor utilization alike. Hence the tight bounds did not pay off for small problems: they resulted in idle processors and long running times.
We continued to larger problems, expecting the problem to disappear.

Figure 9.11: Relative speedup for the CT-algorithm and the RH-algorithm for
a 70-vertex problem.

Figure 9.11 and Table 9.1 show the results for a 70-vertex problem for the CT-
and RH-algorithms. We found that the situation did by no means improve. For the
RH-method it seemed impossible to use more than 4 processors. The explanation
was found in the number of critical subproblems generated, cf. Table 9.2. Here
it is obvious that using more processors for the RH-method just results in a
lot of superfluous subproblems being solved, which does not decrease the total
solution time.
9.2.2 Solving the QAP in Parallel
QAP is one of my latest parallel Branch and Bound experiences. The aim of the
research was in this case to solve the previously unsolved benchmark problem
Nugent20 to optimality using a combination of the most advanced bounding
functions and fast parallel hardware, as well as any other trick we could ﬁnd
and think of.
We used a MEIKO system consisting of 16 Intel i860 processors, each with 16
MB of internal memory. Each processor has a peak performance of 30 MIPS
when doing integer calculations, giving an overall peak performance of
approximately 500 MIPS for the complete system. The performance of each single
i860 processor almost matches the performance of the much more powerful Cray 2
on integer calculations, indicating that the system is very powerful.

                   CT-algorithm                  RH-algorithm
No. Vert.   Cr. subpr.    B. calc.    Sec.   Cr. subpr.   B. calc.   Sec.
 30              103          234       1           4          91      49
 40              441          803       2          11         150     114
 50             2215         3251       5          15         453     278
 60             6594        11759      18           8         419     531
 70            56714       171840     188          26        1757    1143
 80           526267       901139     931          19        2340    1315
100          2313868      5100293    8825          75        3664    3462
110          8469580     15203426   34754          44        3895    4311
120                –            –       –          47        4244    5756

Table 9.2: Number of critical subproblems and bound calculations as a function
of problem size.
The processors each have two Inmos T800 transputers as communication
processors. Each transputer has 4 communication channels, each with a
bandwidth of 1.4 Mb/second and a startup latency of 340 µs. The connections
are software-programmable, and the software supports point-to-point
communication between any pair of processors. Both synchronous and
asynchronous communication are possible, and both blocking and non-blocking
communication exist.
The basic framework for testing bounding functions was a distributed Branch
and Bound algorithm with the processors organized as a ring. Workload
distribution was kept simple and based on local synchronization. Each processor
in the ring communicates with each of its neighbours at certain intervals. At
each communication the processors exchange information on the respective sizes
of their subproblem pools, and based hereon, subproblems are sent between the
processors. The speedup obtained with this scheme was 13.7 for a moderately
sized problem with a sequential running time of 1482 seconds and a parallel
running time with 16 processors of 108 seconds.
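One pairwise exchange between ring neighbours might be sketched as follows (a simplification; the rule for how many subproblems to transfer is our assumption, not the original scheme):

```python
def balance_pair(pool_a, pool_b):
    """Local synchronization between two ring neighbours: compare
    pool sizes and move subproblems from the larger pool to the
    smaller one until the sizes differ by at most one."""
    while len(pool_a) > len(pool_b) + 1:
        pool_b.append(pool_a.pop())
    while len(pool_b) > len(pool_a) + 1:
        pool_a.append(pool_b.pop())
    return pool_a, pool_b
```

Because every processor performs such exchanges with both of its ring neighbours at intervals, work gradually diffuses around the ring without any global coordination.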
The selection strategy used was a kind of breadth-first search. The feasibility
hereof is intimately related to the use of a very good heuristic to generate the
incumbent. We used simulated annealing, and as reported in [5], spending less
than one percent of the total running time in the heuristic enabled us to start
the parallel solution with the optimal solution as the incumbent. Hence only
critical subproblems were solved. For termination detection, a tailored
algorithm was used.
                      Mautor & Roucairol      Fac. br. w. symmetry
Problem               No. nodes   Time (s)    No. nodes    Time (s)
Nugent 15                 97286        121       105773         10
Nugent 16.2              735353        969       320556         34
Nugent 17                     –          –     24763611       2936
Nugent 18                     –          –    114948381      14777
Nugent 20                     –          –    360148026      57963
Elshafei 19                 575        1.4          471        0.5
Armour & Buffa 20        531997       1189       504452        111

Table 9.3: Results obtained by the present authors in solving large standard
benchmark QAPs. Results obtained by Mautor and Roucairol are included for
comparison.
Problem       Dynamic dist.   Init. subpr. per proc.   Static dist.
Nugent 8              0.040                        1          0.026
Nugent 10             0.079                        1          0.060
Nugent 12             0.328                        6          0.381
Nugent 14            12.792                       24         13.112
Nugent 15            10.510                       41         11.746
Nugent 16            35.293                       66         38.925

Table 9.4: Results obtained when solving standard benchmark QAPs using static
workload distribution. Results obtained with dynamic distribution are included
for comparison.
The main results of the research are indicated in Table 9.3. We managed to solve
previously unsolved problems, and for problems solved by other researchers, the
results clearly indicated the value of choosing an appropriate parallel system for
the algorithm in question.
To get an indication of the efficiency of so-called static workload distribution
in our application, an algorithm with static workload distribution was also
tested. The results appear in Table 9.4. The subproblems distributed to each
processor were generated using BeFS sequential Branch and Bound until the pool
of live subproblems was sufficiently large that each processor could get the
required number of subproblems. Hence all processors receive equally promising
subproblems. The optimal number of subproblems per processor was determined
experimentally and equals roughly (p − 8)^4/100, where p is the number of
processors.
9.3 Ideas and Pitfalls for Branch and Bound users.
Rather than giving a conclusion, I will in the following try to equip new users
of Branch and Bound – both sequential and parallel – with a checklist
corresponding to my own experiences. Some of the points of the list have
already been mentioned in the preceding sections, while some are new.
• The importance of finding a good initial incumbent cannot be overestimated,
and the time used for finding one is often only a few percent of the total
running time of the algorithm.
• In case an initial solution very close to the optimum is expected to be
known, the choice of node selection strategy and processing strategy makes
little difference.
• With a difference of more than a few percent between the value of the initial
solution and the optimum, the theoretically superior BeFS Branch and Bound
shows inferior performance compared to both lazy and eager DFS Branch and
Bound. This is in particular true if the pure Branch and Bound scheme is
supplemented with problem-specific efficiency-enhancing tests, e.g. for
supplementary exclusion of subspaces, and if the branching performed depends
on the value of the current best solution.
9.4 Supplementary Notes
9.5 Exercises
1. Finish the solution of the biking tourist's problem on Bornholm.
2. Give an example showing that the branching rule illustrated in Figure
9.9 may produce nodes in the search tree with non-disjoint sets of feasible
solutions. Devise a branching rule which ensures that all subspaces generated
are disjoint.
3. Solve the symmetric TSP instance with n = 5 and distance matrix

       (c_e) =  ( –  10   2   4   6 )
                ( –   –   9   3   1 )
                ( –   –   –   5   6 )
                ( –   –   –   –   2 )

   by Branch and Bound using the 1-tree relaxation to obtain lower bounds.
4. Solve the asymmetric TSP instance with n = 4 and distance matrix

       (c_ij) =  (  –   6   4  17 )
                 ( 12   –   7  20 )
                 ( 19   9   –  14 )
                 ( 19  19   5   – )

   by Branch and Bound using an assignment relaxation to obtain bounds.
5. Consider the GPP as described in Example 1. By including the term

       λ ( ∑_{v∈V} x_v − |V|/2 )

   in the objective function, a relaxed unconstrained problem with modified
   objective function results for any λ. Prove that the new objective is less
   than or equal to the original on the set of feasible solutions for any λ.
   Formulate the problem of finding the optimal value of λ as an optimization
   problem.
6. A node in a Branch and Bound search tree is called semicritical if the
corresponding bound value is less than or equal to the optimal solution value
of the problem. Prove that if the number of semicritical nodes in the search
tree corresponding to a Branch and Bound algorithm for a given problem is
polynomially bounded, then the problem belongs to P.
Prove that this holds also with the weaker condition that the number of
critical nodes is polynomially bounded.
7. Consider again the QAP as described in Example 3. The simplest bound
calculation scheme is described in Section 2.1. A more advanced, though still
simple, scheme is the following:
Consider now a partial solution in which m of the facilities have been assigned
to m of the locations. The total cost of any feasible solution in the subspace
determined by a partial solution consists of three terms: costs for pairs of
assigned facilities, costs for pairs consisting of one assigned and one
unassigned facility, and costs for pairs of two unassigned facilities. The
first term can be calculated exactly. Bounds for each of the two other terms
can be found based on the fact that a lower bound for a scalar product
(a_1, ..., a_p) · (b_π(1), ..., b_π(p)), where a and b are given vectors of
dimension p and π is a permutation of {1, ..., p}, is obtained by multiplying
the largest element in a with the smallest element in b, the next-largest in a
with the next-smallest in b, etc.
For each assigned facility, the flows to the unassigned facilities are ordered
decreasingly, and the distances from the location of the facility to the
remaining free locations are ordered increasingly. The scalar product of these
two vectors is then a lower bound for the communication cost from the facility
to the remaining unassigned facilities.
The total transportation cost between unassigned facilities can be bounded in
a similar fashion.
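In code, the ordering argument behind this bound can be sketched as follows (a small illustration only; the function name is ours):

```python
def scalar_product_lower_bound(a, b):
    """Lower bound on sum(a[i] * b[pi(i)]) over all permutations pi:
    pair the largest element of a with the smallest of b, the
    next-largest with the next-smallest, and so on (rearrangement
    inequality)."""
    a_sorted = sorted(a, reverse=True)   # decreasing
    b_sorted = sorted(b)                 # increasing
    return sum(x * y for x, y in zip(a_sorted, b_sorted))
```

In the QAP setting, a would be the flows from an assigned facility to the unassigned facilities and b the distances from its location to the free locations; the returned value bounds the corresponding communication cost from below.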
(a) Consider the instance given in Figure 6. Find the optimal solution to the
instance using the bounding method described above.
(b) Consider now the QAP where the distances between locations are given as
the rectangular distances in the following grid:

       1 2 3
       4 5 6

The flows between pairs of facilities are given by
F =  (  0  20   0  15   0   1 )
     ( 20   0  20   0  30   2 )
     (  0  20   0   2   0  10 )
     ( 15   0   2   0  15   2 )
     (  0  30   0  15   0  30 )
     (  1   2  10   2  30   0 )
Solve the problem using Branch and Bound with the bounding function described
above, the branching strategy described in the text, and DFS as search
strategy.
To generate a first incumbent, any feasible solution can be used. Try, prior
to the Branch and Bound execution, to identify a good feasible solution. A
solution with value 314 exists.
8. Consider the 0-1 knapsack problem:

       max  ∑_{j=1}^{n} c_j x_j
       s.t. ∑_{j=1}^{n} a_j x_j ≤ b
            x ∈ B^n

   with a_j, c_j > 0 for j = 1, 2, . . . , n.
(a) Show that if the items are sorted in decreasing order of their cost/weight
ratios c_j/a_j, then an optimal solution of the LP relaxation is given by
x_j = 1 for j = 1, 2, . . . , r − 1, x_r = (b − ∑_{j=1}^{r−1} a_j)/a_r, and
x_j = 0 for j > r, where r is defined such that ∑_{j=1}^{r−1} a_j ≤ b and
∑_{j=1}^{r} a_j > b.
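The greedy computation of this LP solution can be sketched as follows (function name and representation are ours), assuming the items are already sorted by decreasing ratio c_j/a_j:

```python
def knapsack_lp_relaxation(c, a, b):
    """Optimal solution of the LP relaxation of the 0-1 knapsack
    problem, assuming items are sorted by decreasing ratio c[j]/a[j].
    Returns (x, value), where at most one x[j] is fractional."""
    n = len(c)
    x = [0.0] * n
    remaining = b
    for j in range(n):
        if a[j] <= remaining:        # item fits entirely: x_j = 1
            x[j] = 1.0
            remaining -= a[j]
        else:                        # critical item r: fill the residual capacity
            x[j] = remaining / a[j]
            break
    value = sum(cj * xj for cj, xj in zip(c, x))
    return x, value
```

On the instance of part (b) below (whose items happen to be ratio-sorted already), this gives x = (1, 1, 0.5, 0) with bound value 39.5, which is the root upper bound for the Branch and Bound.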
(b) Solve the instance

       max  17x_1 + 10x_2 + 25x_3 + 17x_4
       s.t.  5x_1 +  3x_2 +  8x_3 +  7x_4 ≤ 12
             x ∈ B^4

    by Branch and Bound.
Bibliography
[1] A. de Bruin, A. H. G. Rinnooy Kan and H. Trienekens, "A Simulation Tool
for the Performance of Parallel Branch and Bound Algorithms", Math. Prog. 42
(1988), pp. 245–271.
[2] J. Clausen and M. Perregaard, "On the Best Search Strategy in Parallel
Branch-and-Bound – Best-First-Search vs. Lazy Depth-First-Search", Proceedings
of POC96 (1996), also DIKU Report 96/14, 11 p.
[3] J. Clausen and J. L. Träff, "Implementation of parallel Branch-and-Bound
algorithms – experiences with the graph partitioning problem", Annals of
Oper. Res. 33 (1991), pp. 331–349.
[4] J. Clausen and J. L. Träff, "Do Inherently Sequential Branch-and-Bound
Algorithms Exist?", Parallel Processing Letters 4, 1–2 (1994), pp. 3–13.
[5] J. Clausen and M. Perregaard, "Solving Large Quadratic Assignment
Problems in Parallel", DIKU Report 1994/22, 14 p., to appear in Computational
Optimization and Applications.
[6] E. W. Dijkstra, W. H. J. Feijen and A. J. M. van Gasteren, "Derivation
of a termination detection algorithm for distributed computations", Inf.
Proc. Lett. 16 (1983), pp. 217–219.
[7] B. Gendron and T. G. Crainic, "Parallel Branch-and-Bound Algorithms:
Survey and Synthesis", Operations Research 42 (6) (1994), pp. 1042–1066.
[8] T. Ibaraki, "Enumerative Approaches to Combinatorial Optimization",
Annals of Operations Research vol. 10–11, J. C. Baltzer 1987.
[9] P. S. Laursen, "Simple approaches to parallel Branch and Bound",
Parallel Computing 19 (1993), pp. 143–152.
[10] P. S. Laursen, "Parallel Optimization Algorithms – Efficiency vs.
simplicity", Ph.D. thesis, DIKU Report 94/31 (1994), Dept. of Computer
Science, Univ. of Copenhagen.
[11] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan and D. B. Shmoys
(eds.), "The Travelling Salesman Problem: A Guided Tour of Combinatorial
Optimization", John Wiley 1985.
[12] T. Mautor and C. Roucairol, "A new exact algorithm for the solution of
quadratic assignment problems", Discrete Applied Mathematics 55 (1994),
pp. 281–293.
[13] C. Nugent, T. Vollmann and J. Ruml, "An experimental comparison of
techniques for the assignment of facilities to locations", Oper. Res. 16
(1968), pp. 150–173.
[14] M. Perregaard and J. Clausen, "Solving Large Job Shop Scheduling
Problems in Parallel", DIKU Report 94/35, to appear in Annals of OR.
[15] C. Schütt and J. Clausen, "Parallel Algorithms for the Assignment
Problem – Experimental Evaluation of Three Distributed Algorithms", AMS
DIMACS Series in Discrete Mathematics and Theoretical Computer Science 22
(1995), pp. 337–351.
Appendix A
A quick guide to GAMS – IMP
Logging in at IMM from the GBAR is easy:
1. Log in at your favorite X terminal at the GBAR.
2. Start an xterm (click on the middle mouse button, select terminals and
then terminal).
3. In that xterm write:
• ssh -l nh?? serv1.imm.dtu.dk
where nh?? corresponds to your course login name. You will be prompted
for a password.
Alternatives to serv1.imm.dtu.dk are
• serv2.imm.dtu.dk
• serv3.imm.dtu.dk
• sunfire.imm.dtu.dk
You now have a shell window working at IMM. Remember to change your password
at IMM with the passwd command.
To write a ﬁle at IMM start an emacs editor at IMM by writing (in the xterm
working at IMM):
emacs &
To start cplex write (in the xterm working at IMM):
cplex64
You can start more xterms to work at IMM in the same way.
CPLEX can be run in interactive mode or used as a library callable from e.g.
C, C++ or Java programs. In this course we will only use CPLEX in interactive
mode.
To start a CPLEX session type cplex and press the enter key. You will then
get an introductory text something like:
ILOG CPLEX 9.000, licensed to "universitylyngby", options: e m b q
Welcome to CPLEX Interactive Optimizer 9.0.0
with Simplex, Mixed Integer & Barrier Optimizers
Copyright (c) ILOG 19972003
CPLEX is a registered trademark of ILOG
Type ’help’ for a list of available commands.
Type ’help’ followed by a command name for more
information on commands.
CPLEX>
To end the session type quit and press the enter key. Please do not quit CPLEX
by closing the xterm window before having exited CPLEX. At IMM we have 20
licenses, so at most 20 persons can run CPLEX independently of each other. If
you quit by closing the xterm window before leaving CPLEX in a proper manner,
it will take the license manager some time to recover the CPLEX license.
CPLEX solves linear and integer programming problems. These must be entered
either through the built-in editor of CPLEX or by entering the problem in
"your favorite editor", saving it to a file, and then reading the problem into
CPLEX by typing
read <filename.lp>.
The file has to have the extension .lp and the content has to be in the LP
format. A small example is shown below.
\Problem
name: soejle.lp
Minimize
obj: x1 + x2 + x3 + x4
Subject To
c1: x1 + 2 x3 + 4 x4 >= 6
c2: x2 + x3 >= 3
End
The first line is a remark, and it stretches over the entire line. Subject To
can be replaced with st. The text written at the start of each line containing
the objective function or a constraint is an optional label, so obj: can be
omitted.
After having entered a problem, it can be solved by giving the command optimize
at the CPLEX> prompt and pressing enter. To see the result, the command
display solution variables - is used, "-" indicating that the values of all
variables are to be displayed.
CPLEX writes a logﬁle, which records the events of the session. An example of
the logﬁle corresponding to the solution of the example above is shown below.
The events of the session have been:
cplex <enter>
read soejle.lp <enter>
optimize <enter>
display solution variables - <enter>
quit <enter>
and the resulting logﬁle looks like:
Log started (V6.5.1) Tue Feb 15 10:24:58 2000
Problem ’soejle.lp’ read.
Read time = 0.00 sec.
Tried aggregator 1 time.
LP Presolve eliminated 0 rows and 1 columns.
Reduced LP has 2 rows, 3 columns, and 4 nonzeros.
Presolve time = 0.00 sec.
Iteration log . . .
Iteration: 1 Infeasibility = 3.000000
Switched to devex.
Iteration: 3 Objective = 3.000000
Primal  Optimal: Objective = 3.0000000000e+00
Solution time = 0.00 sec. Iterations = 3 (2)
Variable Name Solution Value
x3 3.000000
All other variables in the range 1-4 are zero.
Now let us take our initial problem and assume that we want x1 and x2 to be
integer variables between 0 and 10. That the variables are non-negative is
implicitly assumed by CPLEX, but we need to state the upper bounds and the
integrality condition. In this case our program will look like:
\Problem
name: soejle.lp
Minimize
obj: x1 + x2 + x3 + x4
Subject To
c1: x1 + 2 x3 + 4 x4 >= 6
c2: x2 + x3 >= 3
Bounds
x1<=10
x2<=10
Integer
x1
x2
End
Bounds is used to declare bounds on variables, and the section afterwards,
Integer, states that x1 and x2 must be integer. The Bounds section must be
placed before the section declaring the integer variables. It may not seem
intuitive, but if you do not state a Bounds part, CPLEX will assume the
integer variables to be binary. If you want an integer variable to have no
upper bound, you can write x2 <= INF in the Bounds section.
The command help shows the possible commands in the current situation. Also,
CPLEX provides help if the current command is not sufficient to uniquely
determine an action. As an example, if one types display, CPLEX will respond
by listing the options and the question "Display what ?". CPLEX also offers
possibilities to change parameters in a problem already entered; these
possibilities may be investigated by entering help as the first command after
having entered CPLEX.
Appendix B
Extra Assignments
Question 1
You have just been employed as a junior OR consultant in the consulting firm
Optimal Solutions. The first task that lands on your desk is a preliminary
study for a big manufacturing company, Rubber Duck. The company has three
factories F1, F2 and F3 and four warehouses W1, W2, W3 and W4. In the current
setup of their supply chain, products are transported from a factory to a
warehouse via a cross-dock. A cross-dock is a special form of warehouse where
goods are only stored for a very short time. Goods arrive at a cross-dock, are
then possibly repacked, and are then put on trucks for their destination. It
is used in order to consolidate the transportation of goods. Rubber Duck uses
two cross-docks, CD1 and CD2, in their operations.
The availability of trucks defines an upper limit on how many truck loads we
can transport from a factory to a cross-dock and from a cross-dock to a
warehouse. Table B.1 shows the transportation capacities measured in truck
loads from factory to cross-dock, and from cross-dock to warehouse.
         F1   F2   F3
CD1      34   28   27
CD2      40   40   22

         W1   W2   W3   W4
CD1      30   24   22    –
CD2       –   26   12   58

Table B.1: Capacities in the Rubber Duck supply chain measured in truck loads.

Note that it is not possible to supply warehouse W4 from cross-dock CD1 or
warehouse W1 from cross-dock CD2.
Question 1.1
We want to determine how much Rubber Duck at most can get through their supply
chain from the factories to the warehouses. Give a maximum flow formulation of
this problem. Draw a graph showing the problem as a maximum flow problem. Your
network should have 11 vertices: a source (named 0), a vertex for each factory,
cross-dock and warehouse, and a sink (named 11). The graph must contain all
information necessary for formulating a maximum flow problem.
Question 1.2
Solve the maximum ﬂow problem using the augmenting path algorithm. Show
the ﬂow graphically on a ﬁgure of the network. Write the value of the maximum
ﬂow and list each of the augmenting paths. Find a minimum cut and list the
vertices in the minimum cut.
Question 1.3
Just as you are about to return a small report on your findings to your
manager, a phone call from a planner in Rubber Duck's logistics department
reveals some new information. It turns out that there is a limit on how much
each of the cross-docks can handle. This can be viewed as an extension of the
maximum flow model. Let c_v be the maximum number of truck loads which can
pass through an internal vertex v. Given a flow x_uv from vertex u to vertex
v, the flow capacity constraint of vertex v can be expressed as

       ∑_{(u,v)∈E} x_uv ≤ c_v.
Describe how to modify a network so that this new extension can be handled
as a maximum ﬂow problem in a modiﬁed network.
Given that the capacity of cross-dock CD1 is 72 truck loads and that of CD2 is
61 truck loads, introduce the modifications and solve the new maximum flow
problem. Show the flow graphically on a figure of the network. Write the value
of the maximum flow and list each of the augmenting paths. Find a minimum cut
and list the vertices in the minimum cut.
Question 2
Your success on the first assignment quickly lands you another case. A
medium-sized company with branches geographically distributed all over Denmark
has approached Optimal Solutions. The company you are going to work for has
decided to build an intranet based on communication capacity leased from a
large telecommunication vendor, Yellow. This operator has a backbone network
connecting a number of locations, and you can lease communication capacity
between individual pairs of points in the backbone network and in that way
build your own network. Fortunately, all branches are located at one of the
locations of the backbone network; however, not all locations house a branch
of the company.
The pricing policy of Yellow is as follows: Each customer individually buys
in on a set of connections chosen by the customer. The price is calculated as
the sum of the prices for the individual connections. Each customer is offered
a "customized" price for each connection, and these prices (although usually
positive) may be negative to reflect a discount given by Yellow.
Figure B.1 shows a small example, where 3 branches have to be connected in a
network with 6 locations.
Your task is to analyze the situation of the company and determine which
connections to lease in order to establish the cheapest intranet. Yellow
guarantees that the connections will always be operative, so you do not have
to consider failure protection.
Question 2.1
Formulate a general mathematical programming model for the optimization
problem of establishing the cheapest intranet, given that all costs are non
negative. The model should have a decision variable for each connection, and a
set of constraints expressing that for each partition of the set of locations
into two non-empty subsets with a branch in each, at least one connection
joining
the two subsets must be chosen.

Figure B.1: An example of a telecommunications network – the branches to be
connected are the black vertices.
Can the model be used also when prices may be negative? Would you recommend
the use of the model for large companies with hundreds of branches?
Question 2.2
Use the model to solve the example problem of Figure B.1 using CPLEX with
0-1 variables. Solve the model also when the integrality constraints on the
variables are removed. Comment on the two solutions obtained and their
objective function values.
Question 2.3
An alternative to solving the problem to optimality is to use a method, which
gives a good (but not necessarily optimal) solution fast. Such a method is called
a heuristic.
As you realize the proximity of the problem to the minimum spanning tree
problem, you hit upon the following idea inspired by Prim's algorithm: Start
by choosing (randomly) a branch; let us call it b_0. Now for each other branch
compute the shortest distance to b_0. The branch with the shortest distance to
b_0 will be denoted b_1. Then add the connections on the path from b_1 to b_0
to the set of selected connections. In each of the subsequent iterations we
continue this approach: determine the unconnected branch b_i with the shortest
distance to the tree already generated, that is, find an unconnected branch
with the shortest distance to any location that has already been selected.
The process is terminated when all branches are added to the tree, that is, all
branches are connected.
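Assuming the transformation to non-negative edge lengths discussed next has already been carried out, the heuristic might be sketched as follows (graph representation, names, and the use of Dijkstra from the whole current tree are our choices, not part of the assignment):

```python
import heapq

def connect_branches(adj, branches):
    """Prim-like heuristic: grow a connected subnetwork containing all
    branches.  adj[u] is a list of (v, length) pairs with non-negative
    lengths; returns the set of selected edges (as frozensets)."""
    tree = {branches[0]}                 # start from branch b_0
    selected = set()
    remaining = set(branches[1:])
    while remaining:
        # Multi-source Dijkstra from the whole tree built so far.
        dist = {u: 0.0 for u in tree}
        pred = {}
        heap = [(0.0, u) for u in tree]
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                 # stale heap entry
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    pred[v] = u
                    heapq.heappush(heap, (d + w, v))
        # Nearest unconnected branch, and the path leading to it.
        b = min(remaining, key=lambda v: dist.get(v, float("inf")))
        v = b
        while v not in tree:             # add the path edges to the tree
            u = pred[v]
            selected.add(frozenset((u, v)))
            tree.add(v)
            v = u
        remaining.discard(b)
    return selected
```

Note that intermediate non-branch locations on a shortest path are added to the tree as well, so later iterations may connect to them, exactly as described above.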
Obviously, the "killer" part of the described algorithm is the shortest path
calculations. However, there are two problems: 1) the network is not directed,
and 2) there may be edges of negative length. How would you cope with the
undirectedness of the network, and which shortest path algorithm would you
suggest to use given that negative edge lengths are present?
However, you realize that it may be easier to handle the problem regarding
negative edge lengths by changing the problem to reflect that it is always
beneficial to choose an edge of negative cost, even if you do not need the
connection. Describe the corresponding problem transformation, whereby you end
up with a network with only non-negative edge lengths.
You are now in the position that you may choose any of the shortest path
algorithms touched upon in the course. In Figure B.2, the network of your
client's company is shown. Which connections should be leased if these are
determined with the method just described, with b_0 chosen to be A? Show the
sequence in which the locations are considered, and the connections added in
each iteration.
Question 3
As a result of your analysis, Rubber Duck has decided to implement a supply
chain optimization IT system. They have specifically asked for your
involvement in the project. You have been put in charge of the rather large
project of implementing the IT system and the corresponding organizational
changes.
You immediately remember something about project management from your
engineering studies. To get into the tools and the way of thinking, you decide
to solve a small test case using PERT/CPM.
The available data is presented in Table B.2.
Activity   Imm. pred.   Normal time   Crash time   Unit cost when
                            (D_i)        (d_i)       shortening
e_0             –             2            1             7
e_1            e_0            3            2             7
e_2            e_1            2            1             5
e_3            e_1            4            2             2
e_4         e_2, e_3          4            1             6
e_5            e_0            9            2             2
e_6         e_4, e_5          6            2             9

Table B.2: Data for the activities of the test case.
Question 3.1
Find the duration of the project if all activities are completed according to their
normal times.
In Hillier and Lieberman Section 22.5, a general LP model is formulated allowing
one to ﬁnd the cheapest way to shorten a project.
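In outline, and in our own notation (a sketch, not necessarily the exact formulation of Hillier and Lieberman): let x_i be the amount by which activity i is shortened, c_i its unit crashing cost, t_i the start time of activity i, D_i and d_i its normal and crash times (cf. Table B.2), and T the required project duration. The model then reads

```latex
\begin{aligned}
\min\;& \sum_{i} c_i x_i \\
\text{s.t.}\;& t_j \ge t_i + (D_i - x_i) && \text{for every precedence relation } i \to j,\\
& t_{\mathrm{end}} \ge t_i + (D_i - x_i) && \text{for every activity } i \text{ without successors},\\
& 0 \le x_i \le D_i - d_i, \qquad t_{\mathrm{end}} \le T, \qquad t_i \ge 0 .
\end{aligned}
```

The questions below refer to the variables x_i and the bound constraints x_i ≤ D_i − d_i of such a model.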
Question 3.2
Formulate the model for the test project and state the dual model.
Question 3.3
For each activity i there is a variable x_i. Furthermore there are also two
constraints, "x_i ≥ 0" and "x_i ≤ D_i − d_i". Denote the dual variable of the
latter g_i. Argue that if d_i < D_i, then either g_i is equal to 0 or
x_i = D_i − d_i in any optimal solution.
Question 3.4
What is the additional cost of shortening the project time for the test case to
16?
Background for the first project.
For convenience, the background from the first project is repeated below.
You are employed by the consulting firm Optimal Solutions, and one of your
customers is a medium-sized company with branches geographically distributed
all over Denmark. The company has decided to build an intranet based on
communication capacity leased from a large telecommunication vendor, Yellow.
This operator has a backbone network connecting a number of locations, and
you can lease communication capacity between individual pairs of points in the
backbone network and in that way build your own network. Fortunately, all
branches are located at one of the locations of the backbone network; however,
not all locations house a branch of the company.
The pricing policy of Yellow is as follows: Each customer individually buys in
on a set of connections chosen by the customer. The price is the sum of the
prices for the individual connections. Each customer is offered a "customized"
price for each connection, and these prices (although usually positive) may be
negative to reflect a discount given by Yellow.
You are now hired to analyze the situation of the company and determine which
connections to lease in order to establish the cheapest intranet. Yellow
guarantees that the connections will always be operative, so failure protection
is not an issue.
Background for the second project.
After having reviewed your solution to the first project, the company comes
back with the following comments: 1) As you have pointed out, the pricing
strategy of Yellow seems odd in that connections with negative prices are "too
attractive". 2) Though good solutions found quickly are valuable, it is
necessary to be able also to find the optimal solution to the problem, even if
this may be quite time consuming.
Question 4.1
In order to prepare for possible changes in the pricing strategy, a model must
be built in which the intranet solution proposed must be a connected network.
Formulate such a mathematical model.
Question 4.2
It now turns out that Yellow has also discovered the flaw in their original
pricing strategy. They have informed you that a new price update will arrive
soon. Until then, you decide to continue your experiments on some other data
you got from your client. The network is shown in Figure B.3 on page 193.
You decide to try to solve the problem using Branch & Cut based on your
mathematical model from Question 2.1 of the first project. You initially set
up an LP with the objective function and the bounds "x_ij ≤ 1" in CPLEX and
add only one constraint, namely that the number of connections chosen must be
at least 4. You then solve the problem using CPLEX. What is the result?
Identify a constraint which must be fulfilled by an optimal integer solution
but which is currently not fulfilled. Add this constraint and solve the
revised problem. Comment on the result.
Question 4.3
The constraints in your mathematical model express that no cut in the network
with capacity less than one (where "capacity" is measured by the sum of the
x-values on edges across the cut) separating two branch nodes may exist.
In order to apply Branch & Cut, you need a separation routine, i.e. an
algorithm which, given a potential solution, returns a violated inequality if
such one exists. Explain how a maximum flow algorithm can be used as
separation routine and estimate the complexity of this approach.
Question 4.4
You suddenly come to think about Gomory cuts. Maybe these are applicable? You
realize that using CPLEX in interactive mode, it is not possible to extract
the updated coefficients in a tableau row, but you can extract the updated
coefficients in the objective function (the reduced costs). Is it in theory
possible to generate a Gomory cut from the updated coefficients in the
objective function? If so, what is the result for the resulting LP tableau
from Question 1.2?
Question 5
Your boss suddenly runs into your room and cries: "Drop everything – I have an
urgent job for you. We need to win this new client. Find the best assignment
of persons to jobs in the following situation:
I have 5 persons and 5 jobs, a profit matrix N showing in entry (i, j) the profit
earned from letting person i do job j, and a cost matrix O showing the cost of
educating person i for job j (the matrices are shown below). The educational
budget is 18, and exactly one person has to be assigned to each job."
N =
  [  4 10  6  5  4 ]
  [  0  8  2  8  1 ]
  [  3 10  5  6  6 ]
  [  2  9  4  8  4 ]
  [  1  8  4  9  3 ]

O =
  [  4  5  6  5  3 ]
  [  5  8 24  3  6 ]
  [  4 10  8  3  5 ]
  [  2  9  2 10  8 ]
  [  2  5  3  1  4 ]
You decide to solve the problem to optimality using Branch-and-Bound.
You first convert the problem to a cost minimization problem, which you then
solve. The lower bound computation should be done using Lagrangian relaxation
as described in the next section, i.e. you have to use the objective value of the
assignment problem resulting from the relaxation of the budget constraint with
a nonnegative multiplier. In order to explain the method to your boss, you
have to explain why any nonnegative multiplier gives rise to a lower bound for
your given problem.
Suggested line of work for each question
Question 4.1
Consider an edge {i, j}. You have to ensure that "if x_ij = 1 then i must not be
separated from any set containing all branches". How to model constraints of
the type "x_j = 0 implies x_i = 0" is described in Hillier and Lieberman – figure
out how to express "x_j = 1 implies x_i = 1" and use the idea in connection with
the usual connectedness constraints known from the first project.
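For binary variables, an implication of this kind reduces to a single linear inequality: "x_j = 1 implies x_i = 1" can be written as x_i ≥ x_j. The following quick check (illustrative only, and not necessarily the formulation given in Hillier and Lieberman) enumerates all 0/1 combinations and confirms that exactly the forbidden one is cut off:

```python
from itertools import product

# the implication "x_j = 1 implies x_i = 1" modelled as the linear
# constraint x_i >= x_j over binary variables
feasible = {(xi, xj) for xi, xj in product((0, 1), repeat=2) if xi >= xj}

# the only combination cut off is x_i = 0, x_j = 1 - exactly the forbidden one
assert feasible == {(0, 0), (1, 0), (1, 1)}
```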
Question 4.2
In Branch-and-Cut you first solve an LP relaxation, since, as for the TSP, the
full model has exponentially many constraints. You should find one violated
constraint by inspection, add it to the system, resolve, and describe the effect.
Question 4.3
As for the TSP, Max Flow can be used, based on interpreting the x-values as
capacities. But you have to take care that only cuts separating two branch
vertices are taken into consideration.
Question 4.4
The basic calculation leading to the classical Gomory cut from a chosen
row i with non-integral right-hand side starts with the equation corresponding
to the i'th row of the Simplex tableau. Start out assuming that the objective
function value is non-integral, write out the "objective function equation" and
check if all steps in the derivation procedure are valid.
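The row-based construction referred to above can be sketched as follows; the routine and its names are illustrative, not CPLEX functionality. Given a tableau row x_B + Σ_j a_j x_j = b with fractional b, the classical Gomory fractional cut is Σ_j frac(a_j) x_j ≥ frac(b). Whether the same derivation steps remain valid when the row is the objective-function equation is precisely what the question asks you to check.

```python
import math

def gomory_cut(coeffs, rhs, eps=1e-9):
    """Classical Gomory fractional cut from one Simplex tableau row.

    The row reads  x_B + sum_j coeffs[j] * x_j = rhs  with x_B basic and
    the x_j non-basic.  Returns (cut coefficients, cut right-hand side) for
    sum_j f(coeffs[j]) * x_j >= f(rhs),  where f(a) = a - floor(a), or
    None if rhs is integral so that no cut can be derived from this row."""
    frac = lambda a: a - math.floor(a)
    f0 = frac(rhs)
    if f0 < eps or f0 > 1 - eps:
        return None
    return [frac(a) for a in coeffs], f0

# hypothetical tableau row: x_B + 1.25 x_1 - 0.5 x_2 + 2 x_3 = 3.75
cut = gomory_cut([1.25, -0.5, 2.0], 3.75)
# resulting cut: 0.25 x_1 + 0.5 x_2 + 0 x_3 >= 0.75
```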
Question 5
Converting the profit maximization problem to a cost minimization problem
with nonnegative costs is done by multiplying all values in the profit matrix
by −1 and adding the largest element of the original profit matrix to each
element. Formulation of the Lagrangian relaxation can be done by subtracting
the right-hand side of the budget constraint from the left-hand side, multiplying
the difference by λ and adding the term to the objective function. Collecting
terms to get the objective function coefficient for each x_ij, you get an assignment
problem for each fixed value of λ. Choose a value of λ, e.g. 1, and solve the
assignment problem. If the budget constraint is not fulfilled, then try to increase
λ and resolve. If an optimal solution cannot be found through adjustments of
λ (what are the conditions under which an optimal solution to the relaxed
problem is also optimal for the original problem?), choose a variable to branch
on and create two new subproblems, each of which is treated as the original
problem.
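The recipe above can be tried out directly on the matrices from Question 5. The sketch below is illustrative only: it brute-forces the 5 × 5 assignment problems over all 120 permutations instead of using a proper assignment algorithm, and it simply evaluates the bound for a few hand-picked values of λ.

```python
from itertools import permutations

N = [[4, 10, 6, 5, 4], [0, 8, 2, 8, 1], [3, 10, 5, 6, 6],
     [2, 9, 4, 8, 4], [1, 8, 4, 9, 3]]          # profits
O = [[4, 5, 6, 5, 3], [5, 8, 24, 3, 6], [4, 10, 8, 3, 5],
     [2, 9, 2, 10, 8], [2, 5, 3, 1, 4]]         # education costs
BUDGET = 18

# profit maximization -> cost minimization with nonnegative costs:
# multiply by -1 and add the largest profit
top = max(max(row) for row in N)
C = [[top - N[i][j] for j in range(5)] for i in range(5)]

def solve_assignment(weights):
    """Brute-force min-cost assignment (fine for 5 x 5: 120 permutations)."""
    return min((sum(weights[i][p[i]] for i in range(5)), p)
               for p in permutations(range(5)))

def lagrangian_bound(lam):
    """Relax the budget constraint with multiplier lam >= 0:
    L(lam) = min over assignments of sum (C_ij + lam * O_ij) - lam * BUDGET."""
    w = [[C[i][j] + lam * O[i][j] for j in range(5)] for i in range(5)]
    value, p = solve_assignment(w)
    return value - lam * BUDGET, p

# optimum of the original budget-constrained problem, by full enumeration
opt = min(sum(C[i][p[i]] for i in range(5)) for p in permutations(range(5))
          if sum(O[i][p[i]] for i in range(5)) <= BUDGET)

# every nonnegative multiplier yields a lower bound on the optimum
for lam in (0.0, 0.5, 1.0, 2.0):
    assert lagrangian_bound(lam)[0] <= opt
```

If for some λ the relaxed optimum happens to respect the budget and λ times the budget slack is zero, that solution is also optimal for the original problem – the condition hinted at in the parenthesis above.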
Figure B.2 (network diagram with edge costs not reproduced): The intranetwork
problem of your company. The black locations correspond to your branches to
be connected.
Figure B.3 (network diagram with edge costs not reproduced): An example
intranetwork problem from your client. The black locations correspond to your
branches to be connected.
. x ≤ bi For a feasible x. denotes the i’th row of the matrix A): Ai. which may be nonnegative. no sign constraints should be imposed on yi . i. xj ≤ 0 In order not to allow unboundedness of the relaxed problem (c − yA)j must be less than or equal to 0. xj f ree In order not to allow unboundedness of the relaxed problem (c−yA)j must be equal to 0 since no sign constraints on xj are present. x) should be nonpositive. bi − Ai. and yi (bi −Ai. x is less than or equal to 0. x is larger than or equal to 0. nonpositive or free. Again we have three cases: xj ≥ 0 To avoid unboundedness of the relaxed problem (c − yA)j must be greater than or equal to 0. x is equal to 0. Hence. Let P ′ be a minimization problem in standard form. Ai. i. the j’th dual constraint will be (yA)j ≥ cj . i.
2. the j’th dual constraint will be (yA)j = cj . Suppose P is any given minimization problem in variables. Hence. yi should be nonnegative. bi − Ai. Ai. x ≥ bi For a feasible x. This is true in general. Regarding the types of the dual constraints. Hence.
The subtraction of slack variables are dealt with similarly. These give rise to two dual constraints. thereby reintroducing the sign constraint for yi . The sign change of the column in combination with the change in preferred sign of the objective function coeﬃcient leaves the dual constraint unchanged.
′
.2: −yi ≤ 0 ⇔ yi ≥ 0. cf. whereas a negative sign of the coeﬃcient of a nonnegative variable is preferred. subtraction of surplus variables from ≥constraints. Section 2. If a slack variable is subtracted from the lefthand side of the inequality constraint to obtain an equation ai1 x1 + · · · + ain xn − si = bi ⇔ (bi − ai1 x1 − · · · − ain xn ) + si = 0 the multiplier must now be allowed to vary over R. We have commented upon the addition of slack variables to ≤constraints in the preceding section. a positive sign of the coeﬃcient of a nonpositive variable is beneﬁcial. P ′ also has DP as its dual problem. is introduced by the column of the slack variable. which must be nonnegative in order for the relaxed objective function to provide a lower bound for the original one on the feasible set. A constraint ai1 x1 + · · · + ain xn ≥ bi ⇔ (bi − ai1 x1 − · · · − ain xn ) ≤ 0 gives rise to a multiplier. If a nonpositive variable xj is substituted by xj of opposite sign. Since the Duality Theorem holds for P ′ and DP as shown previously and P ′ is equivalent to P .3 General formulation of dual LP problems
17
problem in nonnegative variables with equality constraints. For minimization purposes however. A new constraint in the dual problem. the theorem also holds for P and DP . constructed from P by means of addition of slack variables to ≤constraints. all signs in the corresponding column of the Simplex tableau change. The proof of the Duality Theorem for all types of dual pairs P and DP of LP problems may hence be given as follows: Transform P into a standard problem P ′ in the well known fashion.2. which when taken together result in the same dual equality constraint as obtained directly. and change of variables. Finally. however. if a free variable xj is substituted by the diﬀerence between two non′ ′′ negative variables xj and xj two equal columns of opposite sign are introduced. Then the dual problems of P and P ′ are equal.
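The bounding argument behind the dual construction can be checked numerically. The following sketch (plain Python, no LP solver; the helper names and the small data set are this snippet's own, not from the text) verifies weak duality for the pair max{cx : Ax ≤ b, x ≥ 0} and min{yb : yA ≥ c, y ≥ 0}: every dual-feasible multiplier vector y yields an upper bound y·b on c·x for every primal-feasible x.

```python
# Weak duality by multipliers: for primal-feasible x (Ax <= b, x >= 0) and
# dual-feasible y (yA >= c, y >= 0) we always have c.x <= c.x + y(b - Ax) <= y.b.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    # A applied to x, row by row
    return [dot(row, x) for row in A]

def vec_mat(y, A):
    # y^T A, column by column
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def weak_duality_gap(c, A, b, x, y):
    """Check feasibility of x and y, then return y.b - c.x (always >= 0)."""
    assert all(xi >= 0 for xi in x)
    assert all(lhs <= bi for lhs, bi in zip(mat_vec(A, x), b))
    assert all(yi >= 0 for yi in y)
    assert all(lhs >= cj for lhs, cj in zip(vec_mat(y, A), c))
    return dot(y, b) - dot(c, x)
```

For the problem max 3x1 + 2x2 s.t. x1 + x2 ≤ 4, x1 ≤ 2, x ≥ 0, the point x = (2, 2) and the multipliers y = (2, 1) both attain the value 10, so the gap closes — illustrating that the optimum values of a primal/dual pair coincide.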
2.4 Discussion: Pros and Cons of the Approach

The main advantages of teaching duality based on multipliers are in my opinion:

• the independence of the introduction of the dual problem from the problem modelled by the primal model, which can be used to avoid the situation of frustrated students searching for an intuitive interpretation of the dual problem in cases where such an interpretation is not natural, i.e. where no "story" goes with the dual problem;

• the clarification of the interplay between the sign of the variables and the type of the corresponding constraints in the dual pair of problems;

• the possibility of avoiding problem transformation and "duality by definition" in the introduction of general duality in linear programming;

• the early introduction of the idea of getting information about the optimum of an optimization problem through bounding, using the solution of an easier problem;

• the possibility of introducing partial dualization by including only some constraint violations in the objective function; and

• the resemblance to duality in nonlinear programming, cf. [3].

The only disadvantage in my view is one listed also as an advantage:

• the independence of the introduction of the dual problem from the problem modelled by the primal model, since this may make the initial motivation weaker.

I do not advocate that duality should be taught based solely on the multiplier approach, but rather that it is used as a supplement to the traditional presentation (or vice versa). The decision on whether to give the traditional presentation of duality or the multiplier approach first of course depends on the particular audience. In my experience it offers a valuable supplement.
Bibliography

[1] R. Bellman and S. Dreyfus. Applied Dynamic Programming. Princeton University Press, 1962.

[2] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, 1963.

[3] A. M. Geoffrion. Duality in Non-Linear Programming: A Simplified Applications-Oriented Approach. SIAM Review 13, 1–37, 1971.

[4] A. M. Geoffrion. Lagrangean Relaxation for Integer Programming. Math. Prog. Study 2, 82–114, 1974.

[5] M. X. Goemans. Linear Programming. Course Notes, available from http://theory.lcs.mit.edu/~goemans, 1994.

[6] J. Krarup and C. Vanderhoeft. Belgium's Best Beer and Other Stories. Model Solving in Mathematical Programming, VUB Press (to appear).

[7] K. Murty. Linear and Combinatorial Programming. John Wiley, 1976.

[8] H. P. Williams. Model Solving in Mathematical Programming. John Wiley, 1993.

[9] W. L. Winston. Operations Research – Applications and Algorithms. Thomson Publishing, 1994.
Part I
Network Optimization
Chapter 3

The Minimum Spanning Tree Problem

The minimum spanning tree problem is one of the oldest and most basic graph problems in computer science and Operations Research. Its history dates back to Boruvka's algorithm, developed in 1926 when Boruvka, a Czech mathematician, was investigating techniques to secure an efficient electrical coverage of Bohemia, and still today it is an actively researched problem.

Let an undirected graph G = (V, E) be given. Each of the edges is assigned a cost (or weight or length) w; that is, we have a cost function w : E → R. The network will be denoted G = (V, E, w). We will generally assume the cost to be nonnegative, but as we will see this is more a matter of convenience.

Let us recall that a tree is a connected graph with no cycles, and that a spanning tree is a subgraph of G that is a tree and includes all vertices of G, i.e. V. The minimum spanning tree (MST) problem calls for finding a spanning tree whose total cost is minimum, where the total cost is measured as the sum of the costs of all the edges in the tree. Figure 3.1 shows an MST in a graph G. Note that a network can have many MSTs.
Figure 3.1: Examples illustrating the concept of spanning and minimum spanning tree: (a) a network; (b) not a spanning tree, as the subgraph is not connected; (c) not a spanning tree, as the subgraph contains a cycle; (d) a spanning tree with a value of 18; (e) a minimum spanning tree with a value of 12.
3.1 Optimality conditions

We will denote edges in the spanning tree as tree edges and those edges not part of the spanning tree as nontree edges. On a few occasions you might meet the maximum spanning tree problem (or, as Wolsey calls it, the "maximum weight problem"), where we wish to maximize the total cost of the constituent edges. The maximum spanning tree problem can be solved by multiplying all costs by −1 and then solving the minimum spanning tree problem.

The minimum spanning tree problem arises in a wide number of applications, both as a "stand alone" problem and as a subproblem in more complex problems. As an example, consider a telecommunication company planning to lay cables to a new neighborhood. In its operations it is constrained to lay cables only along certain connections, represented by the edges. For each edge the cost of digging down the cables has been computed. At the same time a network that reaches all houses must be built (i.e. a spanning tree). The MST gives us the cheapest solution that connects all houses.

A central concept in understanding and proving the validity of the algorithms for the MST is the concept of a cut in an undirected graph. A cut C = {X, X′} is a partition of the set of vertices into two subsets. Edges that have one endpoint in the set X and the other endpoint in X′ are said to cross the cut. For X ⊂ V, the set of edges crossing the cut is denoted δ(X) = {(u, v) ∈ E : u ∈ X, v ∈ V \ X}. We say that a cut respects a set A of edges if no edge in A crosses the cut. The notion of a cut is shown in Figure 3.2.

Let us use the definition of a cut to establish a mathematical model for the problem. We define a binary variable xij for each edge (i, j):

xij = 1 if (i, j) is in the subgraph, and 0 otherwise.

For a spanning tree of n vertices we need n − 1 edges. Furthermore, for each possible cut at least one of the edges crossing the cut must be in the solution.
Otherwise the subgraph defined by the solution is not connected. Therefore we get the following model.

The cutset formulation of the MST:

min   Σ_{e∈E} w_e x_e
s.t.  Σ_{e∈E} x_e = n − 1
      Σ_{e∈δ(S)} x_e ≥ 1,   ∅ ⊂ S ⊂ V
      x_e ∈ {0, 1}

where δ(S) = {(i, j) : i ∈ S, j ∉ S}.

The subtour elimination formulation of the MST:

min   Σ_{e∈E} w_e x_e
s.t.  Σ_{e∈E} x_e = n − 1
      Σ_{e∈γ(S)} x_e ≤ |S| − 1,   ∅ ⊂ S ⊂ V
      x_e ∈ {0, 1}

where γ(S) = {(i, j) : i ∈ S, j ∈ S}.

Figure 3.2: The cut C defined by the partitioning X ⊂ V and X′ = V \ X is shown for a graph G. The edges crossing the cut {X, X′} are shown in bold.

Note that, with the definition of a cut, if we delete any tree edge (i, j) from a spanning tree, it will partition the vertex set V into two subsets. Furthermore, if we insert an edge (i, j) into a spanning tree we will create exactly one cycle, since for any pair of vertices i and j the path from i to j in the spanning tree is uniquely defined (see Figure 3.3).

In this chapter we will look at two algorithms for solving the minimum spanning tree problem. Although different, they both use the greedy principle in building an optimal solution. The greedy principle advocates making the choice that is best at the moment, and although it in general is not guaranteed to provide optimal solutions, it does in this case. In many textbooks on algorithms you will also see the algorithms for the minimum spanning tree problem used as a classical application of the greedy principle.

The greedy strategy shared by both our algorithms can be described by the following "generic" algorithm. The algorithm maintains a set A with the following condition: prior to each iteration, A is a subset of edges of some minimum spanning tree. At each iteration we determine an edge (i, j) that can be added to A without breaking the condition stated above, so that if A is a subset of edges of some minimum spanning tree before the iteration, then A ∪ {(i, j)} is a subset of edges of some minimum spanning tree before the next iteration. An edge with this property will be denoted a safe edge.
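The two requirements behind the formulations above — exactly n − 1 edges and at least one edge crossing every nonempty proper cut (equivalently, connectivity) — can be checked directly for a candidate edge set. A small sketch (the function name and the convention that vertices are numbered 0, …, n−1 are this snippet's own assumptions):

```python
def is_spanning_tree(n, edges):
    """Check the two defining properties used in the cutset formulation:
    exactly n - 1 edges, and the subgraph reaches all n vertices
    (i.e. every nonempty proper cut is crossed at least once)."""
    if len(edges) != n - 1:
        return False
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # depth-first search from vertex 0; connectivity + n-1 edges => tree
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n
```

A path on four vertices passes both checks, whereas a triangle plus an isolated vertex has n − 1 edges but leaves a cut uncrossed, so it is rejected.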
Figure 3.3: Adding the edge (1, 4) to the spanning tree forms a unique cycle.
So we get the following selection rule for a "generic" algorithm.

Algorithm 1: The selection rule
1  select a cut in G that does not contain any selected edges
2  determine a safe edge of the cutset and select it
3  if there is more than one safe edge, select an arbitrary one of these

So how do we use the selection rule?

• Initially none of the edges in the graph are selected.
• The selection rule is used n − 1 times, as there are n − 1 edges in a tree with n vertices. Each iteration will select exactly one edge.
• During the process one edge at a time will be selected, and when the method stops the selected edges form an MST for the given graph G.

Now the interesting question is: how do we find these safe edges? First, a safe edge must of course exist: the condition defines that there exists a spanning tree T such that A ⊆ T, and as long as A is a proper subset of T, some edge of T can still be added. In order to come up with a rule for recognizing safe edges we need to define a light edge. A light edge crossing a cut is an edge with the minimum weight of any edge crossing the cut. Note that there can be more than one light edge crossing a cut. Now we can state our rule for recognizing safe edges.

Theorem 3.1 Let G = (V, E, w) be a connected, undirected graph with a weight function w : E → R. Let A be a subset of E that is part of some minimum spanning tree for G, let {S, S̄} be any cut of G that respects A, and let (i, j) be a light edge crossing the cut {S, S̄}. Then (i, j) is safe for A.

Proof: Let T be a minimum spanning tree that includes A. If T contains the light edge (i, j) we are done, so let us assume that this is not the case. The general idea of the proof is to construct another minimum spanning tree T′ that includes A ∪ {(i, j)} by a cut-and-paste technique; this will then prove that (i, j) is a safe edge.

If we look at the graph T ∪ {(i, j)} it contains a cycle (see Figure 3.4); let us call this cycle p. Since i and j are on opposite sides of the cut {S, S̄}, there is at least one edge of T on the cycle p that also crosses the cut. Let (k, l) be such an edge. As the cut respects A, (k, l) cannot be in A. Furthermore, as (k, l) is an edge on the unique path from i to j in T, T would decompose into two components if we removed (k, l). Adding (i, j) reconnects the two components, thus forming a new spanning tree T′ = T \ {(k, l)} ∪ {(i, j)}. Since (i, j) is a light edge (that was one of the initial reasons for picking it) crossing the cut {S, S̄} and (k, l) also crosses the cut,

wij ≤ wkl,   and therefore   wT′ = wT − wkl + wij ≤ wT,

and as T is a minimum spanning tree we also have wT ≤ wT′. So T′ must also be a minimum spanning tree. What remains is to prove that (i, j) is actually a safe edge for A: as A ⊆ T and (k, l) ∉ A we have A ⊆ T′, so A ∪ {(i, j)} ⊆ T′, and since T′ is a minimum spanning tree, (i, j) is safe for A. △

Figure 3.4: The edges in the minimum spanning tree T are shown, but not the other edges of G. The edges already in A are drawn extra fat. The edge (k, l) is an edge on the unique path p from i to j in T, and (i, j) is a light edge crossing the cut {S, S̄}. A minimum spanning tree T′ that contains (i, j) is formed by removing the edge (k, l) from T and adding (i, j).

Now we can look at how an execution of the generic algorithm would look. The selection rule tells us to pick a cut respecting A and then choose an edge of minimum weight crossing this cut; we can consider any cut we would like. Figure 3.5 illustrates the execution of the generic algorithm. Initially no edges are selected. Let us use the cut X = {6} and X′ = V \ X (a); the cheapest edge crossing the cut is (4, 6), which is added to the solution. We then consider the cut X = {1, 4, 6} (b); the cheapest edge crossing the cut is (4, 5), which is added to the solution. Next we consider the cut X = {2} (c); here the cheapest edge crossing the cut is (2, 3), and it is added to the solution. Now let us consider the cut X = {2, 3} (d); the edge (3, 5) is now identified as the cheapest and is added to the solution. The next cut we consider is X = {1} (e); this adds (1, 4) to the solution, which has become a spanning tree. Finally we cannot find any cut that does not contain tree edges crossing it (f), and the algorithm stops.

More conceptually, A defines a graph GA = (V, A). GA is a forest, and each of the connected components of GA is a tree. When the algorithm begins, A is empty and the forest contains n trees, one for each vertex. Consequently, any safe edge (i, j) for A connects distinct components of GA, since A ∪ {(i, j)} must be acyclic. The two algorithms in sections 3.2 and 3.3 use the following corollary to our theorem.
Figure 3.5: The execution of the generic algorithm on our sample graph.

Corollary 3.2 Let G = (V, E, w) be a connected, undirected graph with weight function w : E → R. Let A be a subset of E that is included in some minimum spanning tree for G, and let C = (VC, EC) be a component in the forest GA = (V, A). If (i, j) is a light edge connecting C to some other component in GA, then (i, j) is safe for A.

Proof: The cut {VC, V̄C} respects A, and (i, j) is a light edge for this cut. Therefore (i, j) is safe for A. △
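Corollary 3.2 suggests a simple primitive: given the vertex set of the current component, find the light edges connecting it to the rest of the graph. A sketch (the function name and the (w, u, v) edge-triple convention are this snippet's own):

```python
def light_edges(S, edges):
    """Return all light edges crossing the cut (S, V \\ S): the minimum-weight
    edges with exactly one endpoint in S. There may be more than one, in which
    case the generic algorithm may pick any of them."""
    crossing = [(w, u, v) for (w, u, v) in edges if (u in S) != (v in S)]
    if not crossing:
        return []          # the cut is empty: S is V, or disconnected from V \ S
    wmin = min(w for (w, _, _) in crossing)
    return [(u, v) for (w, u, v) in crossing if w == wmin]
```

Both Kruskal's and Prim's algorithm can be read as strategies for choosing which component S to apply this primitive to next.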
3.2 Kruskal's Algorithm

The condition of Theorem 3.1 is a natural basis for the first of the algorithms for the MST that will be presented, known as Kruskal's algorithm. We build the minimum spanning tree from scratch by adding one edge at a time. First we sort all edges in nondecreasing order of their cost; they are then stored in a list called L. Furthermore, we define a set F that will contain the edges that have been chosen to be part of the minimum spanning tree being constructed; initially F is empty. We now examine the edges one at a time in the order they appear in L and check whether adding each edge to the current set F would create a cycle. If it creates a cycle, then one of the edges on the cycle cannot be part of the minimum spanning tree, and as we are checking the edges in nondecreasing order, the currently considered edge is the one with the largest cost; we therefore discard it. Otherwise the edge is added to F. We terminate when |F| = n − 1; at termination the edges in F constitute a minimum spanning tree T.

The argument for correctness follows from the fact that if an edge is added, it is the first edge added across some cut, and due to the nondecreasing weight order it is a minimum weight edge across that cut. That is, it is a light edge — but then it is also a safe edge. Pseudocode for the algorithm is presented in Algorithm 2.

Algorithm 2: Kruskal's algorithm
Data: A graph G = (V, E, w)
Result: The set F containing the edges in the MST
1  F ← ∅
2  L ← edges sorted by nondecreasing weight
3  for each (u, v) ∈ L do
4      if a path from u to v in F does not exist then
5          F ← F ∪ {(u, v)}

Running Kruskal's algorithm on the example of Figure 3.1 results in the steps depicted in Figure 3.6. The edges are examined in nondecreasing order of weight, and each edge is added unless it would close a cycle; for instance, the edge (2, 5) is discarded, as it would create the cycle 2 → 3 → 5 → 2. Ties, such as the one between (1, 4) and (1, 2), are broken arbitrarily — had the order between (1, 4) and (1, 2) been different, we would have chosen differently. As 5 edges are selected for a graph with 6 vertices, we are done.
Figure 3.6: The execution of Kruskal's algorithm on our sample graph.
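The steps above can be sketched compactly in code if the cycle check is done with a union-find structure instead of the linked lists described in the text (a deliberate deviation, noted here as an implementation choice; the function name and (w, u, v) edge convention are this snippet's own):

```python
def kruskal(n, edges):
    """Kruskal's algorithm. `edges` is a list of (w, u, v) triples; vertices
    are numbered 0..n-1. Returns (total_weight, chosen_edges)."""
    parent = list(range(n))

    def find(v):
        # root of v's component, with path halving for near-constant time
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst, total = [], 0
    for w, u, v in sorted(edges):          # nondecreasing weight, ties arbitrary
        ru, rv = find(u), find(v)
        if ru != rv:                        # no u-v path in F yet: (u, v) is safe
            parent[ru] = rv                 # merge the two components
            mst.append((u, v))
            total += w
    return total, mst
```

On a 4-cycle with weights 1, 2, 3, 4 the three cheapest edges are chosen and the heaviest one is discarded, for a total weight of 6.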
The time complexity of Kruskal's algorithm consists of the time complexity of first sorting the edges in nondecreasing order and then checking whether we produce a cycle or not. Sorting can be done in O(m log m) = O(m log n²) = O(m log n), which is achieved by e.g. Merge Sort (see e.g. [1]). The time to detect a cycle depends on the method used for this step.

The set F is at any stage of the execution of the algorithm a forest. The subsets defining the trees in the forest will be denoted F1, F2, …, Ff. In step 3 of the execution of our example, F contains three trees, namely {1}, {2, 3, 5} and {4, 6}. These can be stored as singly linked lists. When we want to examine whether inserting (k, l) will create a cycle or not, we scan through these linked lists and check whether both the vertices k and l belong to the same linked list. If that is the case, adding (k, l) would produce a cycle; otherwise we add the edge (k, l) and thereby merge the two lists containing k and l into one list. This data structure requires O(n) time for each edge we examine, and we may examine all m edges, so the overall time complexity will be O(mn). A better performance can be achieved using more advanced data structures; in this way we can get a time complexity of O(m + n log n) plus the time it takes to sort the edges, all in all O(m log n).

3.3 Prim's algorithm

The idea of Prim's algorithm is to grow the minimum spanning tree from a single vertex. We arbitrarily select a vertex as the starting point and then iteratively add one edge at a time: we maintain a tree spanning a subset of vertices S and add a nearest neighbour to S. This is done by finding an edge (i, j) of minimum cost with i ∈ S and j ∈ S̄, that is, an edge of minimum cost crossing the cut {S, S̄}. The algorithm terminates when S = V.

An initial look at the time complexity before we state the algorithm in pseudocode: we must select a minimum cost edge n − 1 times, and in a naive implementation we search through all m edges in each iteration to find it, giving in total the cost O(mn). The bottleneck is thus the identification of the minimum cost edge. If we for each vertex add two pieces of information, we can improve the time complexity:

1. a label li representing the minimum cost of an edge connecting i and some vertex in S, and
2. a label pi identifying the other endpoint of such a minimum cost edge incident with vertex i; that is, pi identifies the edge (pi, i) as a minimum cost edge over the cut {S, S̄}.

If we maintain these values, we can easily find the minimum cost edge in the cut: we simply compute arg min{li : i ∈ S̄}. This reduces the time complexity of each identification from O(m) to O(n). In order to maintain updated information in the labels, we must update the values as a vertex is moved from S̄ to S. Thus the total time complexity of Prim's algorithm will be O(n²) (see Algorithm 3).

Algorithm 3: Prim's algorithm
Data: A graph G = (V, E) with n vertices and m edges. For each edge (i, j) ∈ E an edge weight wij is given. A source vertex is denoted r.
Result: An n-vector p giving the predecessor vertex for each i in the minimum spanning tree.
1   Q ← V
2   pr ← 0, lr ← 0
3   pi ← nil, li ← ∞ for i ∈ V \ {r}
4   while Q ≠ ∅ do
5       i ← arg min_{j∈Q} {lj}
6       Q ← Q \ {i}
7       for (i, j) ∈ E where j ∈ Q do
8           if wij < lj then
9               pj ← i
10              lj ← wij

The time complexity depends on the exact data structures: the basic implementation with a list representation results in a running time of O(n²), while a more advanced implementation makes use of the heap data structure. The "classical" binary heap would give us a time complexity of O(m log n). Further information can be found in Section 3.4. Figure 3.7 shows the execution of Prim's algorithm on our example graph.
Figure 3.7: The execution of Prim's algorithm on our sample graph. A vertex without a "distance" symbolizes a distance of +∞.

We have chosen vertex 1 to be the initial vertex in the tree. Each vertex gets a label of +∞ except vertex 1, which gets the value 0. Initially vertex 1 is included in S; the label of vertex 4 is updated to 4 (due to the edge (1, 4)) and the label of vertex 2 is also updated to 4 (due to the edge (1, 2)) (a). Now vertex 4 is selected for inclusion into S; the choice between 2 and 4 is actually arbitrary, as both vertices have a label of value 4, the lowest value of all labels (b). The selection of vertex 4 updates the labels of vertices 5 and 6 to 3 resp. 2. In the next step we select vertex 6 and enter it into the set S, thereby adding the edge (4, 6) to our solution (c). Next we select vertex 5, which leads to an update of the label of vertex 3 but also of vertex 2 (the value is lowered from 4 to 3) (d). We then select vertex 3 and update the label of vertex 2 (e) before finally selecting vertex 2. Now S = V and the algorithm terminates (f).
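The label-based idea above maps naturally onto a binary heap, matching the O(m log n) bound mentioned earlier. A sketch using adjacency lists and lazy deletion instead of an explicit decrease-key operation (an implementation choice of this snippet, not the list version stated in Algorithm 3):

```python
import heapq

def prim(graph, r=0):
    """Prim's algorithm with a binary heap.
    `graph` maps each vertex to a list of (neighbour, weight) pairs.
    Returns (total_weight, tree_edges); assumes the graph is connected."""
    in_tree, total, tree_edges = {r}, 0, []
    # heap entries (w, i, j): candidate edge (i, j) of weight w with i in S
    heap = [(w, r, j) for j, w in graph[r]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, i, j = heapq.heappop(heap)
        if j in in_tree:
            continue                 # stale entry: edge no longer crosses the cut
        in_tree.add(j)               # j moves from the complement into S
        tree_edges.append((i, j))
        total += w
        for k, wk in graph[j]:
            if k not in in_tree:
                heapq.heappush(heap, (wk, j, k))
    return total, tree_edges
```

On the 4-cycle with weights 1, 2, 3, 4 this yields the same total weight, 6, as Kruskal's algorithm, as any correct MST algorithm must.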
3.4
Supplementary Notes
For more information on sorting, and on data structures for an efficient implementation of the algorithms described in this chapter, see [1]. For a description of how the better time complexity can be established for Kruskal's algorithm we refer to [2]. There exist different versions of heap data structures with different theoretical and practical properties; Table 3.1 shows some worst case running times for Prim's algorithm depending on the selected data structure. An initial discussion of the subject in relation to the MST can be found in [2].

    heap type                   running time
    Binary heap                 O(m log n)
    d-heap                      O(m log_d n) with d = max{2, m/n}
    Fibonacci heap              O(m + n log n)
    Johnson's data structure    O(m log log C)

Table 3.1: Running time of Prim's algorithm depending on the implemented data structure.

Further information on the heap data structure can be found in [1].
3.5
Exercises
1. Construct the minimum spanning tree for the graph in Figure 3.8. Try to use both Kruskal's algorithm and Prim's algorithm.

Figure 3.8: Find the minimum spanning tree for the graph using both Kruskal's and Prim's algorithm.

2. An alternative algorithm: Consider the following algorithm for finding an MST H in a connected graph G = (V, E, w). At each step, consider for each edge e whether the graph remains connected if it is removed or not. Out of the edges that can be removed without disconnecting the graph, choose the one e with the maximum edge cost and delete e from H. Stop if no edge can be removed without disconnecting the graph. Show that the algorithm finds an MST of G. Apply the algorithm to the example of exercise 1.

3. The MiniMax Spanning Tree problem – MMST: Suppose that rather than identifying an MST wrt. the sum of the edge costs, we want to minimize the maximum cost of any edge in the tree. The problem is called the MiniMax Spanning Tree problem. Prove that every MST is also an MMST. Does the converse hold?

4. The Second-best Minimum Spanning Tree – SBMST: Let G = (V, E, w) be an undirected, connected graph with cost function w. Suppose that all edge costs are distinct, and assume that |E| ≥ |V|. A second-best minimum spanning tree is defined as follows: Let T be the set of all spanning trees of graph G, and let T′ be an MST of G. Then a second-best minimum spanning tree is a spanning tree T′′ ∈ T such that w(T′′) = min_{T∈T−{T′}} {w(T)}.

(a) Let T be an MST of G. Prove that there exist edges (i, j) ∈ T and (k, l) ∉ T such that T′′ = T − {(i, j)} ∪ {(k, l)} is a second-best minimum spanning tree of G.
Hints: (1) Assume that the second-best minimum spanning tree differs from T by more than two edges, and derive a contradiction. (2) If we take an edge (k, l) ∉ T and insert it in T we create a cycle. What can we say about the cost of the edge (k, l) in relation to the other edges on the cycle?

(b) Give an algorithm to compute the second-best minimum spanning tree of G. What is the time complexity of the algorithm?

5. Cycle detection for Kruskal's algorithm: Given an undirected graph, describe an O(m + n) time algorithm that detects whether there exists a cycle in the graph.

6. The end-node problem ([3]): Suppose that you are given a graph G = (V, E), weights w on the edges, and a particular vertex v ∈ V. You are asked to find a minimum spanning tree such that v is not an end-node. Explain briefly how to modify a minimum spanning tree algorithm to solve this problem efficiently.
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press and McGraw-Hill, 2001.

[2] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows. Prentice Hall, 1993.

[3] George B. Dantzig and Mukund N. Thapa. Linear Programming. Springer, 1997.
Chapter 4

The Shortest Path Problem

One of the classical problems in Computer Science and Operations Research is the problem of finding the shortest path between two points. There are many applications of the shortest path problem. It is used in route planning services at different webpages (in Denmark examples are www.dgs.dk and www.krak.dk) and in GPS units. It is also used in www.rejseplanen.dk, which generates itineraries based on public transportation in Denmark (here shortest is with respect to time).

We are given a directed graph G = (V, E). Each of the edges is assigned a weight (or length or cost) w, which can be positive as well as negative. The overall network is then denoted G = (V, E, w). Let us for convenience assume that all lengths are integer. Finally a starting point, denoted the root or source r, is given. We now want to find the shortest paths from r to all other vertices in V \ {r}. The problem can in a natural way be extended to the problem of finding the shortest path from one point to all other points, and we will later in section 8.1 look at the situation where we want to calculate the shortest path between any given pair of vertices.

Note that if we have a negative length cycle in our graph then it does not make sense to ask for the shortest path in the graph (at least not for all vertices). Consider Figure 4.1. This graph contains a negative length cycle 2, 5, 3, 2.
Although the shortest paths from r = 1 to 7, 8 and 9 are well defined, the shortest paths for the vertices 2, 3, 4, 5 and 6 can be made arbitrarily short by using the negative length cycle several times. Therefore the shortest paths from 1 to each of these vertices are undefined.

Figure 4.1: An example of a graph G = (V, E, w) with root node 1 and a negative length cycle 2, 5, 3, 2.

Like with the minimum spanning tree problem, efficient algorithms exist for the problem, but before we indulge in these methods let us try to establish a mathematical model for the problem. Let vertex r be the root, and let xe denote the number of paths using each edge e ∈ E. For vertex r, n − 1 paths have to leave r. For any other vertex v, the number of paths entering the vertex must be exactly 1 larger than the number of paths leaving the vertex, as the path from r to v ends here. This gives the following mathematical model:

    min  Σ_{e∈E} we xe                                    (4.1)
    s.t. Σ_{e∈V−(r)} xe − Σ_{e∈V+(r)} xe = −(n − 1)        (4.2)
         Σ_{e∈V−(i)} xe − Σ_{e∈V+(i)} xe = 1,  i ∈ V \ {r} (4.3)
         xe ∈ Z+,  e ∈ E                                   (4.4)
In order to come up with an algorithm for the shortest path problem we define potentials and feasible potentials. Consider a vector of size n, one entry for each vertex in the network, y = y1, ..., yn. This is called a potential. Furthermore, if y satisfies that

1. yr = 0, and
2. yi + wij ≥ yj for all (i, j) ∈ E,

then y is called a feasible potential. Here yj can be interpreted as the length of a path from r to j. Furthermore, yi + wij is also the length of a path from r to j: it is a path from r to i that finally uses the arc (i, j), and therefore it makes sense to compare the two values. So if yj is equal to the length of a shortest path from r to j then yj ≤ yi + wij.

Let P = v1, v2, ..., vk be a path from v1 to vk. A subpath Pij of P is then defined as Pij = vi, vi+1, ..., vj. Now if P is a shortest path from v1 to vk then any subpath Pij of P will be a shortest path from vi to vj; if not, it would contradict the assumption that P is a shortest path. If the potential is feasible for a shortest path problem then we have an optimal solution. Furthermore the subgraph defined by the shortest paths from r to all other vertices is a tree rooted at r, and if G is assumed to be strongly connected the tree is a spanning tree. This means that the predecessor of each vertex is uniquely defined. Therefore the problem of finding the shortest paths from r to all other nodes is sometimes referred to as the Shortest Path Tree Problem.

4.1 The Bellman-Ford Algorithm

One way to use the definition of potentials in an algorithm is to come up with an initial setting of the potentials and then check if there exists an edge (i, j) where yi + wij < yj. If no such edge exists the potential is feasible and the solution is therefore optimal. On the other hand, if there is an edge (i, j) where yi + wij < yj, the potential is not feasible and the solution is therefore not optimal – there exists a path from r to j via i that is shorter than the one we currently have. This can be repaired by setting yj equal to yi + wij, after which we can again check if the potential is feasible. We keep doing this until the potential is feasible. As the potentials always have to be an upper bound on the length of the shortest path, we can use yr = 0 and yi = +∞ for i ∈ V \ {r} as initial values. This is Ford's algorithm for the shortest path problem, shown more formally in Algorithm 4. The operation of checking and updating yj is in many text books known as "correcting" or "relaxing" the edge (i, j).

Figure 4.2: As the potential in node i reflects the possible length of a path from r to i, and likewise for node j, the sum yi + wij also represents the length of a path from node r to j, and therefore it makes sense to compare the two values. The dashed lines represent paths composed of at least one arc.

Algorithm 4: Ford's algorithm
Data: A distance matrix W for a digraph G = (V, E). If the edge (i, j) ∈ E then wij equals the distance from i to j, otherwise wij = +∞.
Result: Two n-vectors, y and p, containing the length of the shortest path from r to i resp. the predecessor vertex for i on the path, for each vertex in {1, ..., n}.
1 yr ← 0, yi ← ∞ for all other i
2 pr ← 0, pi ← Nil for all other i
3 while an edge (i, j) ∈ E exists such that yj > yi + wij do
4     yj ← yi + wij
5     pj ← i

Note that no particular sequence of the corrections is required. This is a problem if the instance contains a negative length circuit: it will not be possible to detect the negative length cycle and therefore the algorithm will never terminate. So for Ford's algorithm we need to be aware of negative length circuits, as these may lead to an infinite computation. A simple solution that quickly resolves the deficiency of Ford's algorithm is to use the same sequence for the edges in each iteration. The extension from Ford's algorithm to the Bellman-Ford algorithm is thus to go through all edges and relax the necessary ones before we go through the edges again.

Algorithm 5: Bellman-Ford algorithm
Data: A distance matrix W for a digraph G = (V, E). If the edge (i, j) ∈ E then wij equals the distance from i to j, otherwise wij = +∞.
Result: Two n-vectors, y and p, containing the length of the shortest path from r to i resp. the predecessor vertex for i.
1 yr ← 0, yi ← ∞ for i ∈ V \ {r}
2 pr ← 0, pi ← Nil for i ∈ V \ {r}
3 k ← 0
4 while k < n and ¬(y feasible) do
5     k ← k + 1
6     foreach (i, j) ∈ E do
7         if yj > yi + wij then
8             yj ← yi + wij    /* correct (i, j) */
9             pj ← i

Note that in step 6 the algorithm runs through all edges in each iteration. The time complexity can be calculated fairly easily by looking at the worst case performance of the individual steps of the algorithm. The initialization (steps 1-2) is O(n). The outer loop in step 4 is executed n − 1 times, each time forcing the inner loop in step 6 to consider each edge once, and each individual inspection and prospective update can be executed in constant time. So one pass over the edges takes O(m), and this is done n − 1 times, so all in all we get O(nm).

Proposition 4.1 (Correctness of the Bellman-Ford algorithm) Given a connected, directed graph G = (V, E, w) with a cost function w : E → R and a root r, the Bellman-Ford algorithm produces a shortest path tree T.

Proof: The proof is based on induction. The induction hypothesis is: after iteration k, if a shortest path P from r to i has at most k edges, then yi is the length of P. For the base case k = 0 the induction hypothesis is trivially fulfilled, as yr = 0 is the length of the shortest path from r to r. For the inductive step, we assume that for any vertex whose shortest path from r has fewer than k edges, its y value is the length of that path. In the k'th iteration we perform a correct operation on all edges. Given that the shortest path from r to l consists of k edges, Pl = r, v1, v2, ..., vk−1, l, due to our induction hypothesis we know the shortest path from r to vk−1, and as we specifically also correct the edge (vk−1, l), this will establish the shortest path from r to l; therefore yl equals the length of the shortest path from r to l. As the graph contains n vertices, a simple path can at most contain n − 1 edges, so a shortest path containing at most n − 1 edges exists for each i ∈ V, and therefore no changes should be made in the n'th iteration. △

If negative edge lengths are present, the algorithm still works. If a negative length circuit exists, this can be discovered by an extra iteration in the main loop: if any y values are changed in the n'th iteration it indicates that at least one "shorter" path has been established, which is only possible in the presence of a negative length circuit.
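The Bellman-Ford scheme above can be sketched compactly in Python. This is our own edge-list formulation (the text states the algorithm over a distance matrix), with the extra n'th pass used for negative-cycle detection:

```python
import math

def bellman_ford(n, edges, r):
    """Bellman-Ford shortest paths.

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (i, j, w) arcs with length w
    r     -- root (source) vertex
    Returns (y, p): distances from r and predecessors, or raises
    ValueError if a negative length circuit is reachable from r."""
    y = [math.inf] * n
    p = [None] * n
    y[r] = 0
    for _ in range(n):                  # n passes; the n-th detects cycles
        changed = False
        for i, j, w in edges:
            if y[i] + w < y[j]:         # correct (relax) edge (i, j)
                y[j] = y[i] + w
                p[j] = i
                changed = True
        if not changed:                 # y is a feasible potential: optimal
            break
    else:
        raise ValueError("negative length circuit detected")
    return y, p
```

The early break corresponds to the feasibility test in step 4 of Algorithm 5; the worst case remains O(nm).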
4.2 Shortest Path in Acyclic graphs

Before we continue with the general case let us make a small detour and look at the special case of an acyclic graph, in which we can solve the Shortest Path problem faster. Recall that an acyclic graph is a directed graph without (directed) cycles. In order to be able to reduce the running time we need to process the vertices in a specific order. A topological sorting of the vertices is a numbering N : V → {1, 2, ..., n} such that for any edge (v, w) ∈ E: N(v) < N(w).
Figure 4.3: An acyclic graph.

The topological sorting of the vertices of the graph in Figure 4.3 is 1, 2, 3, 4, 5, 6.
As vertex 1 is the only vertex with only outgoing edges, it will consequently be the first vertex in the sorted sequence. If we look at vertex 5, it is placed after vertices 1 and 2 as it has incoming edges from those two vertices.

Let us assume we have done the topological sorting.

Algorithm 6: Shortest path for an acyclic graph
Data: A topological sorting of the vertices v1, ..., vn
1 yv1 ← 0, yv ← ∞ for all other v
2 pv1 ← 0, pv ← Nil for all other v
3 for i ← 1 to n − 1 do
4     foreach j ∈ V+(vi) with yj > yvi + wvi,j do
5         yj ← yvi + wvi,j
6         pj ← vi

In Algorithm 6 each edge is considered only once in the main loop due to the topological sorting. Hence, the time complexity is O(m).
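Algorithm 6 can be sketched as follows in Python, assuming the vertices are already given in topological order (the adjacency representation and names are ours):

```python
import math

def dag_shortest_path(order, adj):
    """Shortest paths in an acyclic digraph.

    order -- the vertices v1..vn in topological order; order[0] is the source
    adj   -- adj[v] is a list of (j, w) out-edges of v with length w
    Returns dicts (y, p) of distances and predecessors."""
    y = {v: math.inf for v in order}
    p = {v: None for v in order}
    y[order[0]] = 0
    for v in order:                 # scan vertices in topological order
        for j, w in adj[v]:
            if y[v] + w < y[j]:     # each edge is relaxed exactly once
                y[j] = y[v] + w
                p[j] = v
    return y, p
```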
Figure 4.4: The execution of the acyclic algorithm on our sample graph: (a) Init, (b) Step 1, (c) Step 2, (d) Step 3, (e) Step 4, (f) Step 5, (g) Step 6.
A topological sorting of the vertices in an acyclic graph can be achieved by the algorithm below.

Algorithm 7: Topological sorting
Data: An acyclic digraph G = (V, E)
Result: A numbering N of the vertices in V so that for each edge (v, w) ∈ E it holds that Nv < Nw
1 start with all edges being unmarked
2 Nv ← 0 for all v ∈ V
3 i ← 1
4 while i ≤ n do
5     find v with all incoming edges marked
6     if no such v exists then STOP
7     Nv ← i
8     i ← i + 1
9     mark all (v, w) ∈ E

Note that if the source vertex is not equal to the first vertex in the topological sorting, it means that some vertices cannot be reached from the source vertex and will remain labeled with distance +∞. Now, with the time complexity of the topological sorting being O(m), the overall time complexity of finding the shortest path in an acyclic graph becomes O(m).

4.3 Dijkstra's Algorithm

Now let us return to a general directed graph with edge lengths. We can improve the time complexity of the Bellman-Ford algorithm if we assume that all edge lengths are nonnegative. This assumption makes sense in many situations, for example if the lengths are road distances, travel times, or travelling costs. Even though we could still use the Bellman-Ford algorithm, Dijkstra's algorithm – named after its inventor, the Dutch computer scientist Edsger Dijkstra – utilizes the fact that the edges have nonnegative lengths to get a better running time.
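Algorithm 7 is essentially what is known as Kahn's method; a sketch in Python using in-degree counters instead of explicit edge marks (the representation and names are ours):

```python
from collections import deque

def topological_sort(n, edges):
    """Return a topological order of an acyclic digraph with vertices
    0..n-1, or None if the graph contains a directed cycle (the STOP
    case of Algorithm 7)."""
    indeg = [0] * n
    adj = [[] for _ in range(n)]
    for v, w in edges:
        adj[v].append(w)
        indeg[w] += 1
    # vertices whose incoming edges are all "marked"
    ready = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in adj[v]:            # mark all (v, w)
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order if len(order) == n else None
```

With the in-degree bookkeeping, every vertex and edge is handled a constant number of times, giving O(n + m).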
Algorithm 8: Dijkstra's algorithm
Data: A digraph G = (V, E) with n vertices and m edges. For each edge (i, j) ∈ E an edge weight wij is given. Note: wij ≥ 0. A source vertex is denoted r.
Result: Two n-vectors, y and p, containing the length of the shortest path from r to i resp. the predecessor vertex for i on the path, for each vertex in {1, ..., n}.
1 S ← {r}, U ← ∅
2 pr ← 0, yr ← 0
3 pi ← Nil, yi ← ∞ for i ∈ V \ {r}
4 while S ≠ ∅ do
5     i ← arg min_{j∈S} {yj}
6     for j ∈ V \ U with (i, j) ∈ E do
7         if yj > yi + wij then
8             yj ← yi + wij
9             pj ← i
10            S ← S ∪ {j}
11    S ← S \ {i}, U ← U ∪ {i}

An example of the execution of Dijkstra's algorithm is shown in Figure 4.5. Initially all potentials are set to +∞ except the potential of the source vertex, which is set to 0. First the source vertex is selected and we relax all outgoing edges of vertex 1, that is, the edges (1, 2) and (1, 3). This results in updating y2 and y3 to 1 resp. 2. In the second step vertex 2 is selected, thereby determining edge (1, 2) as part of the shortest path solution. All outgoing edges are inspected and we relax the necessary ones, but not (2, 3), as it is more expensive to get to vertex 3 via vertex 2. In step 3 vertex 3 is selected, making edge (1, 3) part of the solution. In step 4 vertex 4 is chosen; this adds (3, 4) to the shortest path tree, and finally in step 5 we add vertex 5 and thereby (4, 5) to the solution.
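A heap-based sketch of Dijkstra's algorithm in Python, which attains the O(m log n) bound obtainable with a heap data structure (the representation and names are ours; all edge lengths are assumed nonnegative):

```python
import heapq
import math

def dijkstra(n, adj, r):
    """Dijkstra's algorithm for nonnegative edge lengths.

    n   -- number of vertices 0..n-1
    adj -- adj[i] is a list of (j, w) out-edges with w >= 0
    r   -- source vertex
    Returns (y, p): distances from r and predecessors."""
    y = [math.inf] * n
    p = [None] * n
    y[r] = 0
    heap = [(0, r)]                 # plays the role of the set S
    done = [False] * n              # membership in the finished set U
    while heap:
        d, i = heapq.heappop(heap)  # i = arg min over S of y
        if done[i]:
            continue                # skip stale heap entries
        done[i] = True
        for j, w in adj[i]:
            if not done[j] and d + w < y[j]:
                y[j] = d + w
                p[j] = i
                heapq.heappush(heap, (y[j], j))
    return y, p
```

Instead of deleting the old entry for j when yj improves, a new entry is pushed and outdated ones are skipped on popping; this lazy-deletion trick keeps the code short without changing the asymptotic bound.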
Figure 4.5: The execution of Dijkstra's algorithm on a small example directed graph: (a) Starting graph, (b)-(f) Steps 1-5. Vertex 1 is the source vertex. A vertex without a "distance" symbolizes a distance of ∞.
Note that this algorithm is very similar to Prim's algorithm for the MST (see section 3.3). The only difference to Prim's algorithm is in fact the update step in the inner loop, and this step takes – like in the MST algorithm – O(1) time. Hence the complexity of the algorithm is O(n²) if a list representation of the y vector is used, and a complexity of O(m log n) can be obtained if the heap data structure is used instead.

Theorem 4.2 (Correctness of Dijkstra's Algorithm) Given a connected, directed graph G = (V, E, w) with length function w : E → R+ and a root node r, the Dijkstra algorithm will produce a shortest path tree T rooted at r.

Figure 4.6: Proof of the correctness of the Dijkstra algorithm.

Proof: The proof is made by induction. The induction hypothesis is that before an iteration it holds that for each vertex i in U, the shortest path from the source vertex r to i has been found and is of length yi, and for each vertex i not in U, yi is the length of a shortest path from r to i with all vertices except i belonging to U. As U = ∅ this is obviously true initially, and after the first iteration, where U = {r}, the assumption also holds (check for yourself).

Let v be the element with the smallest y value selected in the inner loop of iteration k, that is, yv ≤ yu for all u ∈ S. Now yv is the length of a path Q from r to v passing only through vertices in U (see Figure 4.6). Suppose that this is not the shortest path from r to v – then another path R from r to v is shorter. R starts in r, which is in U. Since v is not in U, R has at least one edge from a vertex in U to a vertex not in U. Let (u, w) be the first
edge of this type. The vertex w is a candidate for the vertex choice in the inner loop in the current iteration, but was discarded as v was picked. Hence yw ≥ yv. All edge lengths are assumed to be nonnegative, therefore the length of the part of R from w to v is nonnegative, and hence the total length of R is at least the length of Q. This is a contradiction – hence Q is a shortest path from r to v. Furthermore, the update step in the inner loop ensures that after the current iteration it again holds for each u not in U (which is now the "old" U augmented with v) that yu is the length of a shortest path from r to u with all vertices except u belonging to U. △

4.4 Relation to Duality Theory

From our approach solving the Shortest Path Problem using potentials it is not evident that there is a connection to our mathematical model (4.1)-(4.4). It would seem like the model and our solution approaches are independent ways of looking at the problem. In fact, they have a great deal in common, and in this section we will elaborate more on that.

Now our shortest path problem is actually an LP problem. A classical result (dating back to the mid 50's) is that the vertex-edge incidence matrix of a directed graph is totally unimodular. This means that the determinant of each square submatrix is equal to 0, 1 or −1. Since the right-hand sides are integers, this implies that all positive variables in any basic feasible solution to such problems automatically take integer values. This means that the integer requirement in our model (4.4) can be relaxed to

    xe ≥ 0,  e ∈ E.    (4.5)

Let us now construct the dual. Let yi be the dual variable for constraint i. The dual to the shortest path problem then becomes:

    max  −(n − 1)yr + Σ_{i≠r} yi           (4.6)
    s.t. yj − yi ≤ wij  for all (i, j) ∈ E  (4.7)
         yi free                            (4.8)
Let us look at the primal and dual problem of the instance in Figure 4.7 with r = 1. The dual variables y1, y2, ..., y5 are associated with the five vertices in the graph. The primal and dual problems are conveniently set up in Table 4.1.

Figure 4.7: An example.

Table 4.1: Derivation of the primal and dual LP problem. Each row corresponds to a vertex (dual variable y1, ..., y5) and each column to an edge of the instance – (1, 2), (1, 4), (2, 5), (4, 2), (4, 3), (5, 3) and (5, 4) – with primal variables x12, x14, x25, x42, x43, x53 and x54. A column has −1 in the row of the edge's tail and 1 in the row of its head, and carries the edge cost wij; all relations are "=" and the RHS column is (−4, 1, 1, 1, 1).

For this instance the dual objective can be read from the RHS column, and the columns of the primal (one for each edge) constitute the constraints, so we get:

    max  −4y1 + y2 + y3 + y4 + y5    (4.9)
    s.t. y2 − y1 ≤ 4                 (4.10)
         y4 − y1 ≤ 7                 (4.11)
         ...                         (4.12)
         y4 − y5 ≤ 1                 (4.13)
         y1, y2, ..., y5 free        (4.14)
It can easily be shown that the dual problem is determined up to an arbitrary constant. We can therefore fix one of the dual variables, and here we naturally choose to fix yr = 0.

Central in LP theory is that all simplex-type algorithms for LP can be viewed as an interplay between primal feasibility, dual feasibility and the set of complementary slackness conditions (CSC). CSC must be fulfilled by a pair of optimal solutions for the primal resp. dual problem. In our case CSC will be:

    (yi + wij − yj) xij = 0,  (i, j) ∈ E    (4.15)

The predecessor variables p can be seen as representing the primal variables x in our model: if pj = i this implies that xij > 0 in the model. Initially our algorithm starts by setting pi equal to Nil, basically stating that all xij's are zero, so the CSC holds initially. Now in an iteration of any one of our methods for solving the shortest path problem we find a violated constraint in our dual model. Then we set yj equal to yi + wij and change the predecessor, which corresponds to setting xij to a nonnegative value. As CSC was fulfilled before the operation, it still holds afterwards. So in essence, what our algorithms do is to maintain primal feasibility and CSC while trying to obtain dual feasibility. When dual feasibility is obtained we have primal feasibility, dual feasibility and CSC, and fundamental LP theory then tells us that these solutions are optimal.

4.5 Applications of the Shortest Path Problem

Now we will look at some interesting applications of the shortest path problem. However, note that we may not be able to use Dijkstra's algorithm in solutions to some of these applications, as the graphs involved may have negative edge lengths.
The currency conversion problem

We want to model the possibilities of exchanging currency in order to establish the best way to go from one currency to another. Consider a directed graph G = (V, E) where each vertex denotes a currency. If 1 unit of a currency u can buy ruv units of another currency v, we model it by an edge of weight −log ruv from u to v (see Figure 4.8). If there is a path P = c1, c2, ..., cl from vertex c1 to cl, the weight of the path is given by

    w(P) = Σ_{i=1}^{l−1} w_{ci,ci+1} = Σ_{i=1}^{l−1} −log r_{ci,ci+1} = −log Π_{i=1}^{l−1} r_{ci,ci+1}

So if we convert currencies along the path P, we can convert n units of currency x to n e^{−w(P)} units of currency y. Thus, the best way to convert x to y is to do it along the shortest path in this graph. Also, if there is a negative cycle in the graph, we have found a way to increase our fortune by simply making a sequence of currency exchanges.

Figure 4.8: A graph to solve the currency conversion problem; among the vertices are Danish Kroner, Swedish Kroner and Islandic Kronur. Currencies are taken from www.xe.com at 11:17 on February 16th 2007.
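The construction can be sketched directly in code: exchange rates become −log weights, Bellman-Ford-style relaxation finds the best conversion, and a negative cycle signals a profitable exchange loop. The rates below are made-up illustrative numbers, not the ones from the text's figure, and the function names are ours:

```python
import math

def best_conversion(rates, src, dst):
    """rates[(u, v)] = units of v bought by 1 unit of u.
    Returns the best achievable rate from src to dst, found as the
    shortest path under the weights w(u, v) = -log(rate)."""
    vertices = {u for u, _ in rates} | {v for _, v in rates}
    y = {v: math.inf for v in vertices}
    y[src] = 0.0
    for _ in range(len(vertices) - 1):          # Bellman-Ford relaxations
        for (u, v), r in rates.items():
            w = -math.log(r)
            if y[u] + w < y[v]:
                y[v] = y[u] + w
    return math.exp(-y[dst])                    # e^{-w(P)} = product of rates

def has_arbitrage(rates):
    """True if some exchange cycle multiplies money, i.e. the -log graph
    contains a negative length cycle."""
    vertices = {u for u, _ in rates} | {v for _, v in rates}
    y = {v: 0.0 for v in vertices}              # as if from a virtual source
    for _ in range(len(vertices) - 1):
        for (u, v), r in rates.items():
            y[v] = min(y[v], y[u] - math.log(r))
    # one extra pass: any further improvement implies a negative cycle
    return any(y[u] - math.log(r) < y[v] - 1e-12 for (u, v), r in rates.items())
```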
Difference constraints

While we in general rely on the simplex method or an interior point method to solve a general linear programming problem, there are special cases that can be treated more efficiently in another way. In this section, we investigate a special case of linear programming that can be reduced to finding shortest paths from a single source. Sometimes we don't really care about the objective function; we just wish to find a feasible solution, that is, any vector x that satisfies the constraints Ax ≤ b.

In a system of difference constraints each row of A contains exactly one 1, one −1 and the rest 0's. Thus, each of the constraints can be written on the form xj − xi ≤ bk. Systems of difference constraints occur in many different applications. For example, the variables xi may be times at which events are to occur, and each constraint can be viewed as stating that there must be at least/at most a certain amount of time (bk) between two events. An example could be:

    x1 − x2 ≤ −3
    x1 − x3 ≤ −2
    x2 − x3 ≤ −1
    x3 − x4 ≤ −2
    x2 − x4 ≤ −4

This system of constraints has a feasible solution, namely x1 = 0, x2 = 3, x3 = 5, x4 = 7. Another feasible solution to the system of difference constraints above is x1 = 5, x2 = 8, x3 = 10, x4 = 12. In fact, we have:

Theorem 4.3 Let x = (x1, ..., xn) be a feasible solution to a system of difference constraints, and let d be any constant. Then x + d = (x1 + d, x2 + d, ..., xn + d) is a feasible solution to the system of difference constraints as well.

Proof: Look at the general form of a constraint: xj − xi ≤ bk. If we insert xj + d and xi + d instead of xj and xi we get (xj + d) − (xi + d) = xj − xi ≤ bk. So if x satisfies the constraints, so does x + d. △

The system of constraints can be viewed from a graph theoretic angle. If A is an m × n matrix, then we represent it by a graph with n vertices and m directed edges. Vertex vi will correspond to variable xi, and each edge corresponds to one of the m inequalities involving two unknowns. Finally, we add a vertex v0 and edges from v0 to every other vertex to guarantee that every other vertex is reachable. More formally, we get a set of vertices V = {v0, v1, ..., vn} and edges

    E = {(vi, vj) : xj − xi ≤ bk is a constraint} ∪ {(v0, v1), (v0, v2), ..., (v0, vn)}

The edge (vi, vj) gets a weight of bk, while the edges (v0, vi) get the weight 0. The graph of our example is shown in Figure 4.9.
Figure 4.9: The graph representing our system of difference constraints.

Theorem 4.4 Given a system Ax ≤ b of difference constraints, let G be the corresponding constraint graph constructed as described above, vertex v0 being the root. If G contains no negative weight cycles, then x1 = y1, x2 = y2, ..., xn = yn is a feasible solution to the system, where yi is the length of the shortest path from v0 to vi.
Proof: Consider any edge (i, j). By the triangle inequality, yj ≤ yi + wij, or equivalently yj − yi ≤ wij. Thus letting xi = yi and xj = yj satisfies the difference constraint xj − xi ≤ wij that corresponds to edge (i, j). △

Using the Bellman-Ford algorithm we can now calculate the following solution to our example: x1 = −7, x2 = −4, x3 = −2, x4 = 0, where yi is the length of the shortest path from v0 to vi. By adding 7 to all variables we get the first solution we saw.

A system of difference constraints with m constraints and n unknowns produces a graph with n + 1 nodes and n + m edges. Therefore Bellman-Ford will run in O((n + 1)(n + m)) = O(n² + nm). In [1] it is established that you can achieve a time complexity of O(nm).

Theorem 4.5 Given a system Ax ≤ b of difference constraints, let G be the corresponding constraint graph constructed as described above, vertex v0 being the root. If G contains a negative weight cycle, then there is no feasible solution for the system.

Proof: Suppose the negative weight cycle is 1, 2, ..., k, 1. Then

    x2 − x1 ≤ w12
    x3 − x2 ≤ w23
    ...
    xk − xk−1 ≤ wk−1,k
    x1 − xk ≤ wk1

Adding left hand sides together and right hand sides together results in 0 ≤ weight of cycle. As the cycle weight is negative this implies 0 < 0, which is obviously a contradiction. △
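The construction behind Theorems 4.4 and 4.5 can be sketched directly in code: build the constraint graph with the artificial root v0 and run Bellman-Ford (the function name and triple-based input format are ours):

```python
import math

def solve_difference_constraints(n, constraints):
    """Solve a system of difference constraints x_j - x_i <= b.

    n           -- number of variables x_1..x_n
    constraints -- list of (j, i, b) triples meaning x_j - x_i <= b
    Returns a feasible assignment [x_1, ..., x_n], or None if a
    negative weight cycle makes the system infeasible."""
    # vertex 0 is the artificial root v0 with 0-weight edges to everyone
    edges = [(0, v, 0) for v in range(1, n + 1)]
    # the constraint x_j - x_i <= b becomes the edge v_i -> v_j of weight b
    edges += [(i, j, b) for (j, i, b) in constraints]
    y = [0.0] + [math.inf] * n
    for _ in range(n):                              # n+1 vertices => n passes
        for u, v, w in edges:
            if y[u] + w < y[v]:
                y[v] = y[u] + w
    if any(y[u] + w < y[v] for u, v, w in edges):   # still improvable: cycle
        return None
    return y[1:]
```

On the example system of this section it reproduces the solution x = (−7, −4, −2, 0) found above.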
Swapping applications in a daily airline fleet assignment

(Draft note, translated from Danish: Use Talluri's article [6], which makes use of shortest paths, as yet another example of the use of shortest paths.)
4.6 Supplementary Notes

A nice exposition of the relationship between the shortest path problem and LP theory can be found in [3]; the section on the subject in these notes is based on that text. More information about totally unimodular (TU) matrices can be found in [4]. Besides TU there exist other classes of matrices that also have the same integer property, Balanced and Perfect; more information on those can be found in [5].

4.7 Exercises

1. Use Dijkstra's algorithm to find the shortest paths from vertex 1 to all other vertices in the graph in Figure 4.10. Afterwards verify the result by constructing the shortest path integer programming model for the graph, and solve it in your MIP solver. What happens if we relax the integrality constraint and solve it using an LP solver? Also verify the potentials by establishing the dual problem and solving it in your LP solver.

2. Given two vertices i and j: is the path between i and j in a minimum spanning tree necessarily a shortest path between the two vertices? Give a proof or counterexample.

3. You are employed in a consultancy company dealing mainly with routing problems. The company has been contacted by a customer, who is planning to produce a traffic information system. The goal of the system is to be able to recommend the best path between two destinations in a major city, taking into account both travel distance and time. The queries posed to the system will be of the type "what is the shortest path (in kilometers) from Herlev to Kastrup, for which the transportation time does not exceed 25 minutes?" You have been assigned the responsibility of the optimization component of the system. As a soft start on the project, you find different books in which the usual single source shortest path problem is considered.

(a) Apply Dijkstra's algorithm to the digraph in Figure 4.11 and show the resulting values of y1, ..., y4 and the corresponding shortest path tree.
(b) The Shortest Path Problem can be formulated as a transshipment problem. State the problem corresponding to the digraph in Figure 4.11, in which the source has capacity 4 and each other vertex is a demand vertex with demand 1.

Figure 4.10: Find the shortest paths from the source vertex 1 to all other vertices.

(c) For the sake of completeness: using the potentials, show that the distances determined in question (a) are not the correct shortest distances. Then find the correct distances (and correct shortest path tree) using i) the Bellman-Ford algorithm for shortest paths, ii) the transshipment algorithm as described in chapter 7. Show the first iteration of the algorithm for the graph of Figure 4.11 and compute the vertex potentials with the shortest path tree found in question (a) as basis tree.

Figure 4.11: Digraph with source vertex s.

(d) Being more ambitious, you decide also to look at all-to-all shortest path algorithms, such as the algorithm of Floyd-Warshall. As indicated, one can also use Dijkstra's algorithm repeatedly to solve the all-pairs shortest path problem – given that the graph under consideration has nonnegative edge costs. In case G does not satisfy this requirement, it is, however, possible to change the cost of the edges so all costs become nonnegative and the shortest paths with respect to the new edge costs are the same as those for the old edge costs (although the actual lengths of the paths change), such that Dijkstra's algorithm may now be applied to G with the modified edge costs. The method is the following: An artificial vertex K is added to G, and for each vertex v in G, an edge of cost 0 from K to v is added. The shortest path from K to each vertex i ∈ V is now calculated – this is denoted πi. The new edge lengths w'ij are now defined by w'ij = wij + πi − πj. Which shortest path algorithm would you recommend for finding the shortest paths from K to the other vertices, and why? Show the application of this algorithm on the digraph of Figure 4.12, which illustrates the construction for the digraph of Figure 4.11.

(e) Explain why it holds that w'ij is nonnegative for all i, j ∈ V. Present an argument showing that for all s, t ∈ V, the shortest s-t path with respect to the original edge costs wij is also the shortest s-t path with respect to the modified edge costs w'ij.

(f) You are now in possession of two all-to-all shortest path algorithms: the Floyd-Warshall algorithm, and the repeated application of Dijkstra's algorithm once per vertex of the given graph. What is the computational complexity of each of the methods? For which graphs would you recommend Floyd-Warshall's algorithm, and for which would you use |V| applications of Dijkstra's algorithm?
Figure 4.12: Digraph with additional vertex K and with 0-length edges added.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press and McGraw-Hill, 2001.

[2] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows. Prentice Hall, 1993.

[3] Jakob Krarup and Malene N. Rørbech. LP formulations of the shortest path tree problem. 4OR 2:259-274, 2004.

[4] Laurence A. Wolsey. Integer Programming. Wiley Interscience, 1998.

[5] George L. Nemhauser and Laurence A. Wolsey. Integer and Combinatorial Optimization. Wiley Interscience, 1998.

[6] Kalyan T. Talluri. Swapping Applications in a Daily Airline Fleet Assignment. Transportation Science 30(3):237-248, 1996.
Chapter 5

Project Planning

The challenging task of managing large-scale projects can be supported by two Operations Research techniques called PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). These techniques can assist the project manager in breaking down the overall project into smaller parts (called activities or tasks), coordinate and plan the different activities, develop a realistic schedule, and finally monitor the progress of the project. The methods were developed independently of each other in the late 50's, and the original versions of PERT and CPM had some important differences, but they are now often used interchangeably and are combined into one acronym: PERT/CPM. In fact, PERT was developed in 1958 to help measure and control the progress of the Polaris Fleet Ballistic Missile program for the US Navy. Around the same time the American chemical company DuPont was using CPM to manage its construction and repair of factories. Today project management software based on these methods is widely available in many software packages, such as e.g. Microsoft Project.
Movie productions. Research and development of a new product. 3. The tasks that needs to be carried out during the project is broken down into a set of individual “atomic” tasks called activities. This relates to ﬁnancial planning.1
The Project Network
As project managers we ﬁrst and foremost would like to get a visualization of the project that reveal independencies and the overall “ﬂow” of the project. As project manager at for GoodStuﬀ Enterprises we have the responsibility for the development of a new series of advanced intelligent toys for kids called MasterBlaster. 4. No project ever goes 100% as planned. A project manager must have technical skills.70 2. Construction of ITsystems. 3. For each activity we need to know the duration of the activity and its immediate predecessors. 4. and managing innovation and problem solving within the project. For the MasterBlaster project the activities and their data can be seen in Table 5. As the toy should be ready before the Christmas sale we have been asked to investigate if we can ﬁnish the project within 30 weeks. contract management. Some of the roles are: 1. Based on a preliminary idea top management has given us green light to a more thorough feasibility study. redeﬁning activities etc. It is the responsibility of the project manager to adapt to changes such as reallocation of resources. obtain appropriate and necessary resources. In the job one needs to direct and supervise the project from the beginning to the end.
5.
. 2. Ship building.1. reduce the project to a set of manageable tasks. As the saying goes one picture is worth a thousand words.
Project Planning
The role of the project manager in project management is one of great responsibility. In order to introduce the techniques we will look at an example. 5. The project manager must deﬁne the ﬁnal goal for the project and must motivate the workers to complete the project on time. The project manager must deﬁne the project.
5.1 The Project Network

As project managers we first and foremost would like to get a visualization of the project that reveals the interdependencies and the overall "flow" of the project. As the saying goes, one picture is worth a thousand words.

Table 5.1: Activities in project MasterBlaster.

Activity  Description          Immediate predecessor  Duration (weeks)
A         Product design       –                      10
B         Market research      –                      4
C         Production analysis  A                      8
D         Product model        A                      6
E         Marketing material   A                      6
F         Cost analysis        C                      5
G         Product testing      D                      4
H         Sales training       B, E                   7
I         Pricing              F, H                   2
J         Project report       F, G, I                3

The first issue we will look at is how to visualize the project flow, that is, how do we visualize the connection of the different tasks in the project and their relative position in the overall plan. Project networks consist of a number of nodes and a number of arcs. Since the first inventions of the project management methods PERT and CPM there have been two alternatives for presenting project networks:

AOA (Activity-on-arc): Each activity is represented as an arc. A node is used to separate an activity from each of its immediate predecessors.

AON (Activity-on-node): Each activity is represented by a node. The arcs are used to show the precedence relationships.

AON has generally been regarded as considerably easier to construct than AOA. Furthermore AON is easier to understand than AOA for inexperienced users, and AON is also seen as easier to revise than AOA when there are changes in the network. Let us look at the differences on a small example. In Table 5.2 we have a breakdown of a tiny project into activities (as duration is not important for visualizing the project it is omitted here).

Table 5.2: A breakdown of activities for a tiny project.

Activity  Immediate predecessor
A         –
B         –
C         A, B
D         B

First we build the network as an AON network. In an AON network nodes are equal to activities and arcs represent precedence relations. In order to start the process two "artificial" activities "St" (for start) and "Fi" (for finish) are added. St is the first activity of the project and Fi is the final activity of the project. Note that this will result in an acyclic network (why?). We start off by drawing the nodes St, A, B, C, D and Fi. Now for each predecessor relation we draw an arc. As A and B do not have any predecessors we introduce an (St, A) arc and an (St, B) arc. Then we draw an (A, C) arc, a (B, C) arc and a (B, D) arc. Finally we introduce a (C, Fi) arc and a (D, Fi) arc, as C and D do not occur as predecessors to any activity. We have now constructed an AON network representation of our small example, shown in Figure 5.1.

Figure 5.1: An AON network for our little example.

Where the activities are represented by nodes in the AON network, they are represented by arcs in an AOA network. The nodes merely describe a "state change" or a checkpoint. In the small example we need to be able to show that whereas activity C is preceded by A and B, D is only preceded by B. The only way to do this is by creating a "dummy" activity d. This is shown in Figure 5.2.

Figure 5.2: An AOA network for our little example.

Although the AOA network is more compact than the AON network, the problem with the AOA network is the use of dummy activities. The larger and more complex the project becomes, the more dummy activities will have to be introduced to construct the AOA network, and these dummy activities will blur the picture, making it harder to understand. We will therefore focus on AON networks in the following as they are more intuitive and easy to construct and comprehend.

Let us construct the project network for our MasterBlaster project. We begin by making the start node (denoted St). This node represents an activity of zero duration that starts the project. All nodes in the project that do not have an immediate predecessor will have the start node as their immediate predecessor. In this case it is nodes A and B, so after the St node the nodes A and B, an arc from St to A, and one from St to B are generated. This is shown in Figure 5.3. Then we make the nodes for activities C, D and E and make an arc from A to each of these nodes. This way we continue to build the AON network. After having made node J and connected it to its immediate predecessors F, G and I, we conclude the construction of the project network by making the finish node, Fi. All activities that are not immediate predecessors to any activity are connected to Fi. Just as the start node, the finish node represents an imaginary activity of duration zero. We then get the network shown in Figure 5.4, where the durations of the activities are indicated next to the nodes.

Figure 5.3: Start building the project network by making the start node and connecting nodes that do not have an immediate predecessor to the start node.

Figure 5.4: An AON network of our MasterBlaster project. The durations of the individual activities are given next to each of the nodes.

In relation to our project a number of questions are quite natural:

• What is the total time required to complete the project if no delays occur?
• When can the individual activities start and finish (at the earliest) if no delays occur?
• When do the individual activities need to start and finish (at the latest) to meet this project completion time?
• Which activities are critical in the sense that any delay must be avoided to prevent delaying project completion?
• For other activities, how much delay can be tolerated without delaying the project completion?
5.2 The Critical Path

A path through a project network starts at the St node, visits a number of activities, and ends at the Fi node. Note that such a path cannot contain a cycle. To each of the paths we assign a length. The length of a path is calculated by adding the durations of the activities on the path together, which is the total duration of all activities on the path if they were executed with no breaks in between. For the path St, A, C, F, J, Fi, for example, we get 10 + 8 + 5 + 3 = 26. The MasterBlaster project contains a total of 5 paths, which are enumerated below together with their lengths:

St, A, C, F, I, J, Fi    28
St, A, C, F, J, Fi       26
St, A, D, G, J, Fi       23
St, A, E, H, I, J, Fi    28
St, B, H, I, J, Fi       16

The length of a path identifies a minimum time required to complete the project: we cannot complete the project faster than the length of the longest path. The estimated project duration therefore equals the length of the longest path through the project network. This longest path is also called the critical path. Note that there can be more than one critical path if several paths have the same maximum length. In the case of our MasterBlaster project there are in fact two critical paths, namely St, A, C, F, I, J, Fi and St, A, E, H, I, J, Fi, both with a length of 28 weeks. Thus the project can be completed in 28 weeks, two weeks less than the limit set by top management.

If we run a project we must be especially careful about keeping the time on the activities on a critical path. For the other activities there will be some slack, making it possible to catch up on delays without forcing a delay of the entire project.
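The path enumeration above can be reproduced mechanically with a small depth-first search. A sketch, assuming the successor lists and durations derived from Table 5.1 (the function name is invented for this illustration):

```python
# Successor lists for the MasterBlaster AON network (from Table 5.1,
# with the artificial activities St and Fi of duration 0).
successors = {
    "St": ["A", "B"], "A": ["C", "D", "E"], "B": ["H"],
    "C": ["F"], "D": ["G"], "E": ["H"], "F": ["I", "J"],
    "G": ["J"], "H": ["I"], "I": ["J"], "J": ["Fi"], "Fi": [],
}
duration = {"St": 0, "A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
            "F": 5, "G": 4, "H": 7, "I": 2, "J": 3, "Fi": 0}

def all_paths(node, path):
    """Enumerate every St-Fi path by depth-first search."""
    path = path + [node]
    if node == "Fi":
        return [path]
    return [p for nxt in successors[node] for p in all_paths(nxt, path)]

paths = all_paths("St", [])
lengths = {tuple(p): sum(duration[a] for a in p) for p in paths}
critical_length = max(lengths.values())
```

On the MasterBlaster data this finds the five paths listed above and a critical path length of 28 weeks.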
5.3 Finding earliest start and finish times

As good as it is to know the project duration, in order to control and manage a project it is necessary to know the scheduling of the individual activities. The PERT/CPM scheduling procedure begins by determining start and finish times for the activities if no delays occur. No delays means that 1) the actual duration of each activity equals its estimated duration, and 2) each activity begins as soon as all its immediate predecessors are finished.

For each activity i in our project we define:

• ESi as the earliest start time for activity i, and
• EFi as the earliest finish time for activity i.

Let us denote the duration of activity i as di. Clearly EFi can be computed as ESi + di. ES and EF are computed in a top-down manner for the project graph. Initially we assign 0 to ESSt and EFSt. Now we can set ESA and ESB to 0 as they follow immediately after the start activity. EFA must then be 10, and EFB 4. In the same way earliest start and finish can easily be computed for C, D and E based on an earliest start of 10 (as this is the earliest finish of their immediate predecessor, activity A).

When we need to compute ES and EF for activity H we need to apply the Earliest Start Time Rule. It states that the earliest start time of an activity is equal to the largest of the earliest finish times of its immediate predecessors, or

ESi = max{EFj : j is an immediate predecessor of i}

Applying this rule for activity H means that ESH = max{EFB, EFE} = max{4, 16} = 16 and then EFH = 23. Activities F and G can be computed now that we know the earliest finish of their immediate predecessors. We continue this way from top to bottom, computing ES and EF for the activities. The values for the project can be seen in Figure 5.5.
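The top-down computation is exactly a forward recursion over the network. A minimal sketch (data transcribed from Table 5.1; the helper name is made up):

```python
# Earliest start/finish by a forward (top-down) pass over the network.
predecessors = {
    "St": [], "A": ["St"], "B": ["St"], "C": ["A"], "D": ["A"],
    "E": ["A"], "F": ["C"], "G": ["D"], "H": ["B", "E"],
    "I": ["F", "H"], "J": ["F", "G", "I"], "Fi": ["J"],
}
duration = {"St": 0, "A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
            "F": 5, "G": 4, "H": 7, "I": 2, "J": 3, "Fi": 0}

ES, EF = {}, {}
def earliest(i):
    """Earliest Start Time Rule: ES_i = max EF_j over predecessors j."""
    if i not in ES:
        ES[i] = max((earliest(j) for j in predecessors[i]), default=0)
        EF[i] = ES[i] + duration[i]
    return EF[i]

project_duration = earliest("Fi")
```

For activity H this reproduces ESH = 16 and EFH = 23, and the project duration of 28 weeks.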
Figure 5.5: An AON network of our MasterBlaster project with earliest start and finish times.

5.4 Finding latest start and finish times

The latest start time for an activity i is the latest possible time that the activity can start without delaying the completion of the project. We define:

• LSi to be the latest start time for activity i, and
• LFi to be the latest finish time for activity i.

Obviously LSi = LFi − di.
We want to compute LS and LF for each of the activities in our problem. To do so we use a bottom-up approach, given that we have already established the completion time of our project. Initially LFFi and LSFi are set to 28; the completion time of the project automatically becomes the latest finish time of the finish node. Now LS and LF for activity J are computed. The latest finish time of J must be equal to the latest start time of its successor, which is the finish activity. So we get LFJ = 28 and LSJ = 25. Again, LS and LF for activity I can easily be calculated as it has a unique successor: we get LFI = 25 and LSI = 23.

Figure 5.6: An AON network of our MasterBlaster project with earliest start and finish times.

For activity F we need to take both activity I and J into consideration. In order to calculate the latest finish time we use the Latest Finish Time Rule, which states that the latest finish time of an activity is equal to the smallest of the latest start times of its immediate successors, or

LFi = min{LSj : j is an immediate successor of i}

because activity i has to finish before any of its successors can start. With this in mind we can now compute LFF = min{LSI, LSJ} = min{23, 25} = 23, and LSF will therefore be 18. In the same way the remaining latest start and finish times can be calculated. For the MasterBlaster project we get the numbers in Figure 5.7.

Figure 5.7: An AON network of the MasterBlaster project with latest start and finish times.
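The bottom-up computation mirrors the forward pass. A sketch (successor lists derived from Table 5.1; the completion time of 28 weeks is taken from the forward pass):

```python
# Latest start/finish by a backward (bottom-up) pass; LF_Fi is fixed
# at the project completion time of 28 weeks.
successors = {
    "St": ["A", "B"], "A": ["C", "D", "E"], "B": ["H"], "C": ["F"],
    "D": ["G"], "E": ["H"], "F": ["I", "J"], "G": ["J"], "H": ["I"],
    "I": ["J"], "J": ["Fi"], "Fi": [],
}
duration = {"St": 0, "A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
            "F": 5, "G": 4, "H": 7, "I": 2, "J": 3, "Fi": 0}
COMPLETION = 28

LS, LF = {}, {}
def latest(i):
    """Latest Finish Time Rule: LF_i = min LS_j over successors j."""
    if i not in LF:
        LF[i] = min((latest(j) for j in successors[i]), default=COMPLETION)
        LS[i] = LF[i] - duration[i]
    return LS[i]

for activity in duration:
    latest(activity)
```

For activity F this reproduces LFF = 23 and LSF = 18 as computed above.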
In Figure 5.8 the latest and earliest start and finish times are presented in one figure.

Figure 5.8: An AON network of the MasterBlaster project with earliest and latest start and finish times.

We can in fact for each of the activities compute the slack. The slack for an activity is defined as the difference between its latest finish time and its earliest finish time. For our project the slacks are given in Table 5.3. An activity with a positive slack has some flexibility: it can be delayed (up to a certain point) without causing a delay of the entire project. When we e.g. look at activity G we can see that the activity can start as early as week 16, but even if we start it as late as week 21 we can still finish the project on time. On the contrary, activity E has to be managed well as there is no room for delays: if activity E gets delayed it will cause a knock-on effect and thereby delay the entire project. These quite simple measures are very important in handling a project.
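The slack computation is a one-liner once EF and LF are known. A sketch, with the EF and LF values produced by the forward and backward passes above:

```python
# Slack = LF - EF; zero-slack activities lie on a critical path.
EF = {"A": 10, "B": 4, "C": 18, "D": 16, "E": 16, "F": 23,
      "G": 20, "H": 23, "I": 25, "J": 28}
LF = {"A": 10, "B": 16, "C": 18, "D": 21, "E": 16, "F": 23,
      "G": 25, "H": 23, "I": 25, "J": 28}

slack = {a: LF[a] - EF[a] for a in EF}
critical_activities = sorted(a for a, s in slack.items() if s == 0)
```

This reproduces the zero/positive split of Table 5.3.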
Table 5.3: The slacks for the MasterBlaster project.

Activity             Slack
A, C, E, F, H, I, J  0
B, D, G              positive

The main issue is to know whether the slack is zero or positive. Note that each activity with zero slack is on a critical path through the project network, and any delay for an activity along such a path will delay project completion. Therefore, through the calculation of the slacks, the project manager now has a clear identification of the activities that need his utmost attention.

5.5 Considering Time-Cost tradeoffs

Top management reviews our project plan and comes up with an offer. They think 28 weeks will make the product available very late in comparison with the main competitors. They offer an incentive of 40000 Euros if the project can be finished in 25 weeks or earlier. Such incentives are often used in order to finish within a certain deadline; in many building projects there are incentives made by the developer to encourage the builder to finish before time.

Once we have identified the critical path and the timing of the activities, the next question is whether we can shorten the project. Crashing an activity refers to taking (costly) measures to reduce the duration of an activity below its normal time. This could be to use extra manpower or more expensive but also more powerful equipment. Crashing a project refers to crashing a number of activities in order to reduce the duration of the project below its normal time. Doing so we are targeting a specific time limit, and we would like to reach this time limit in the least cost way. In the most basic approach we assume that the crashing cost is linear, that is, for each activity the cost for each unit of reduction in time is constant. The situation is shown in Figure 5.9.

Figure 5.9: Crashing an activity.

Quickly we evaluate every activity in our project to estimate the unit cost of crashing it and by how much it can be crashed. The numbers for our project are shown in Table 5.4. Based on these figures we can now look at the problem of finding the least cost way of reducing the project duration from the current 28 weeks to 25 weeks. One way is the Marginal Cost Approach, which was developed in conjunction with the early attempts of project management. Here we use the unit crash cost to determine a way of reducing the project duration by 1 week at a time: we repeatedly ask the question "What is the cheapest activity to crash in a critical path?", look at the critical path(s), and find the cheapest activity that can be crashed.
Table 5.4: Crashing activities in project MasterBlaster.

Activity  Duration (normal)  Duration (crashed)  Unit crashing cost
A         10                 7                   8000
B         4                  3                   4000
C         8                  7                   9000
D         6                  4                   16000
E         6                  3                   11000
F         5                  3                   6000
G         4                  3                   7000
H         7                  3                   5000
I         2                  2                   –
J         3                  2                   9000

The easiest way to keep track of the procedure is to construct a table like the one shown in Table 5.5, with one column per path. As the table clearly demonstrates, a weakness of the method is that even with a few paths it can be quite difficult to keep an overview (and to fit it on one piece of paper!), and many projects will have significantly more paths than 5.

Table 5.5: The initial table for starting the marginal cost approach.

Activity to crash  Crash cost  ACFIJ  ADGJ  AEHIJ  ACFJ  BHIJ
–                  –           28     23    28     26    16

Initially there are two critical paths in the MasterBlaster project, namely St, A, C, F, I, J, Fi and St, A, E, H, I, J, Fi. We run through Table 5.4 and find the cheapest activity to crash on a critical path, which is activity H. We now crash activity H by one week, and our updated table will look like Table 5.6. As a consequence of crashing activity H by one week, all paths that contain activity H are reduced in length by one.

Table 5.6: Crashing activity H by one week and updating the paths accordingly.

Activity to crash  Crash cost  ACFIJ  ADGJ  AEHIJ  ACFJ  BHIJ
–                  –           28     23    28     26    16
H                  5000        28     23    27     26    15

The length of the critical path is still 28 weeks, but there is now only one critical path in the project, namely St, A, C, F, I, J, Fi. As we have still not reached the required 25 weeks we continue.
We find the cheapest activity to crash on the critical path, which is activity F, and crash it by one week; we then get the situation in Table 5.7.

Table 5.7: Crashing activity F by one week and updating the paths accordingly.

Activity to crash  Crash cost  ACFIJ  ADGJ  AEHIJ  ACFJ  BHIJ
–                  –           28     23    28     26    16
H                  5000        28     23    27     26    15
F                  6000        27     23    27     25    15

We have still not reached the 25 weeks and therefore continue in the same manner. The final result can be seen in Table 5.8.

Table 5.8: After crashing activities H (three times), F (twice) and A (once).

Activity to crash  Crash cost  ACFIJ  ADGJ  AEHIJ  ACFJ  BHIJ
–                  –           28     23    28     26    16
H                  5000        28     23    27     26    15
F                  6000        27     23    27     25    15
H                  5000        27     23    26     25    14
F                  6000        26     23    26     24    14
H                  5000        26     23    25     24    13
A                  8000        25     22    24     23    13

So overall it costs us 35000 Euros to get the project length down to 25 weeks. Given that the overall reward is 40000 Euros, the "surplus" is 5000 Euros. As the project must be considered more unstable after the crashing, we should definitely think twice before going for the bonus. The wise project manager will probably resist the temptation, as the risk is too high and the payoff too low.
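The whole procedure can be written as a short loop. A sketch of the Marginal Cost Approach on the MasterBlaster data (paths, durations, crash limits and unit costs transcribed from the tables above; the names are made up for this illustration):

```python
# The Marginal Cost Approach as a loop: while the project is longer than
# the target, crash by one week the cheapest crashable activity that
# lies on a currently critical (longest) path.
paths = {
    "ACFIJ": ["A", "C", "F", "I", "J"], "ADGJ": ["A", "D", "G", "J"],
    "AEHIJ": ["A", "E", "H", "I", "J"], "ACFJ": ["A", "C", "F", "J"],
    "BHIJ": ["B", "H", "I", "J"],
}
duration = {"A": 10, "B": 4, "C": 8, "D": 6, "E": 6,
            "F": 5, "G": 4, "H": 7, "I": 2, "J": 3}
max_crash = {"A": 3, "B": 1, "C": 1, "D": 2, "E": 3,
             "F": 2, "G": 1, "H": 4, "I": 0, "J": 1}
unit_cost = {"A": 8000, "B": 4000, "C": 9000, "D": 16000, "E": 11000,
             "F": 6000, "G": 7000, "H": 5000, "I": float("inf"), "J": 9000}
TARGET = 25

def length(path):
    return sum(duration[a] for a in path)

total_cost, crashes = 0, []
while max(length(p) for p in paths.values()) > TARGET:
    longest = max(length(p) for p in paths.values())
    critical = [p for p in paths.values() if length(p) == longest]
    candidates = sorted({a for p in critical for a in p if max_crash[a] > 0})
    cheapest = min(candidates, key=lambda a: unit_cost[a])
    duration[cheapest] -= 1
    max_crash[cheapest] -= 1
    total_cost += unit_cost[cheapest]
    crashes.append(cheapest)
```

Running the loop reproduces the crash sequence H, F, H, F, H, A of Table 5.8 and the total cost of 35000 Euros.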
Another way of determining the minimal crashing cost is to state the problem as an LP model and solve it using an LP solver. We need to present the problem as an LP problem with objective function, constraints and variable definitions. The decisions that we need to take are by how much we crash each activity and when each activity should start. So for each activity we define two variables xi and yi, where xi is the reduction in the duration of the activity and yi is the starting time of the activity. How will our objective function look? The objective is to minimize the total cost of crashing activities:

min 8000xA + 4000xB + . . . + 9000xJ

With respect to the constraints, we first have to impose that the project must be finished within the agreed number of weeks. A variable yFi is introduced, and the project duration constraint yFi ≤ 25 is added to the model. Furthermore we need to add constraints expressing that the predecessors of an activity need to be finished before the activity can start. Since the start time of each activity is directly related to the start time and duration of each of its immediate predecessors we get, for each immediate predecessor:

start time of this activity ≥ (start time + remaining duration) of this immediate predecessor

In other words, if we look at the relationship between activities A and C we get yC ≥ yA + (10 − xA). For each arc in the AON project network we get exactly one of these constraints. Finally we just need to make sure that we do not crash more than actually permitted, so we get a series of upper bounds on the x-variables, e.g. xA ≤ 3. For our project planning problem we arrive at the following model:

min  8000xA + 4000xB + 9000xC + 16000xD + 11000xE + 6000xF + 7000xG + 5000xH + 9000xJ
s.t. yC ≥ yA + (10 − xA)
     yD ≥ yA + (10 − xA)
     yE ≥ yA + (10 − xA)
     . . .
     yFi ≥ yJ + (3 − xJ)
     yFi ≤ 25
     xA ≤ 3, . . . , xJ ≤ 1

This model can now be entered into CPLEX or another LP solver. If we do that we get a surprising result. The LP optimum is 24000 Euros (simply crashing activity A three times), which is different from the Marginal Cost Approach. How can that be? The Marginal Cost Approach said that the total crashing cost would be 35000 Euros, and now we have found a solution that is 11000 Euros better. In fact the Marginal Cost Approach is only a heuristic.

Before we investigate why we only got a heuristic solution for our MasterBlaster project, let us look at Figure 5.10. Assume we have two critical paths from A to F, one going via B and the other via C. There are two activities with a unit crashing cost of 30000 Euros (activities B and C) and one (activity D) with a unit crashing cost of 40000 Euros. In the first step of the marginal cost analysis we will choose either B or C to crash. This will cost us 30000 Euros. If we crash B we will crash C in the next step, or vice versa, again costing us 30000 Euros. In total both paths have been reduced by 1 unit each, at a cost of 60000 Euros.
This could also have been achieved by crashing activity D by one week, and that would only cost us 40000 Euros. But because we very narrowly look at the cheapest activity on a critical path to crash, we miss this opportunity.

Figure 5.10: An example of where the Marginal Cost Approach makes a suboptimal choice.

The same phenomenon appears in the MasterBlaster project: in the solution found by the Marginal Cost Approach we crash activities H and F once each to reduce the critical paths by one, and this costs us 11000 Euros. Crashing activity A just once also reduces the length of the critical paths by one, but this is only at a cost of 8000 Euros. So now we have to reconsider our refusal to crash the project: the potential surplus is now 16000 Euros.
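The LP model can be checked with any LP solver. The text uses CPLEX; purely as an illustrative sketch, the same model is stated below with SciPy's `linprog` instead, with the arc list transcribed from the reconstructed project network (each arc (i, j) yields the constraint yj ≥ yi + di − xi):

```python
from scipy.optimize import linprog

acts = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
dur = dict(zip(acts, [10, 4, 8, 6, 6, 5, 4, 7, 2, 3]))
cost = dict(zip(acts, [8000, 4000, 9000, 16000, 11000,
                       6000, 7000, 5000, 0, 9000]))
crash = dict(zip(acts, [3, 1, 1, 2, 3, 2, 1, 4, 0, 1]))
arcs = [("St", "A"), ("St", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
        ("C", "F"), ("D", "G"), ("B", "H"), ("E", "H"), ("F", "I"),
        ("H", "I"), ("F", "J"), ("G", "J"), ("I", "J"), ("J", "Fi")]

nodes = ["St"] + acts + ["Fi"]
nx = len(acts)                        # x-variables first, then y-variables
idx_x = {a: k for k, a in enumerate(acts)}
idx_y = {n: nx + k for k, n in enumerate(nodes)}

c = [cost[a] for a in acts] + [0.0] * len(nodes)
A_ub, b_ub = [], []
for i, j in arcs:                     # y_i - y_j - x_i <= -d_i
    row = [0.0] * (nx + len(nodes))
    row[idx_y[i]], row[idx_y[j]] = 1.0, -1.0
    if i in idx_x:
        row[idx_x[i]] = -1.0
    A_ub.append(row)
    b_ub.append(-float(dur.get(i, 0)))
row = [0.0] * (nx + len(nodes))       # project deadline: y_Fi <= 25
row[idx_y["Fi"]] = 1.0
A_ub.append(row)
b_ub.append(25.0)

bounds = [(0, crash[a]) for a in acts] + [(0, None)] * len(nodes)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
```

The solver reproduces the optimum of 24000 Euros discussed above (crashing activity A three times).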
5.6 Supplementary Notes

In the material we have discussed here, all duration times were static. Of course, in a real-world project one thing is to estimate duration times, another is to achieve them: an estimate could be wrong from the beginning, or due to other reasons it could become wrong as the project is moving on. Therefore research has been conducted into looking at stochastic elements in relation to project planning. Further readings on project planning where we also consider stochasticity can be found in [1], which introduces the first simple approaches. This very simple model of project planning can also be extended in various ways, taking on additional constraints on common resources etc. For further reading see [2].
5.7 Exercises

1. You are in charge of organizing a training seminar for the OR department in your company. Remembering the project management tools in OR, you have come up with the list of activities in Table 5.9.

Table 5.9: Activities in planning the training seminar.

Activity  Description                            Immediate predecessor  Duration (weeks)
A         Select location                        –                      1
B         Obtain keynote speaker                 –                      1
C         Obtain other speakers                  B                      3
D         Coordinate travel for keynote speaker  A, B                   2
E         Coordinate travel for other speakers   A, C                   3
F         Arrange dinners                        A                      2
G         Negotiate hotel rates                  A                      1
H         Prepare training booklet               C, G                   3
I         Distribute training booklet            H                      1
J         Take reservations                      I                      3
K         Prepare handouts from speaker          C, F                   4

(a) Find all paths and path lengths through the project network. Determine the critical path(s).
(b) Find earliest time, latest time, and slack for each of the activities. Use this information to determine which of the paths is a critical path.
(c) As the proposed date for the training seminar is being moved, the seminar needs to be prepared in less time. Activities with a duration of 1 week cannot be crashed, but those activities that take 2 or 3 weeks can be crashed by one week; activity K can be crashed by 2 weeks. The unit crashing cost is 3000 Euros for the activities D and H, 5000 Euros for the activities J and K, and 6000 Euros for activities E and F. Finally, crashing activity C costs 7000 Euros. What does it cost to shorten the project by one week? And how much to shorten it by two weeks?

2. You are given the following information about a project consisting of seven activities (see Table 5.10).

Table 5.10: Activities for the design of a project network.

Activity  Immediate predecessor  Duration (weeks)
A         –                      5
B         –                      2
C         B                      2
D         A, C                   4
E         A                      6
F         D, E                   3
G         D, F                   5

(a) Construct the project network for this project. Find the duration of the project if all activities are completed according to their normal times.
(b) Find earliest time, latest time, and slack for each of the activities. Use this information to determine which of the paths is a critical path.
(c) If all other activities take the estimated amount of time, what is the maximum duration of activity D without delaying the completion of the project?

3. Subsequently you are put in charge of the rather large project of implementing the intranet and the corresponding organizational changes in the company. You immediately remember something about project management from your engineering studies. To get into the tools and the way of thinking, you decide to solve the following test case using PERT/CPM; the available data is presented in Table 5.11. In the LP model for finding the cheapest way to shorten a project, there is a variable xi for each activity i in the project. There are also two constraints, "xi ≥ 0" and "xi ≤ Di − di". Denote the dual variable of the latter gi. Argue that if di < Di, then either gi is equal to 0 or xi = Di − di in any optimal solution.
What is the additional cost of shortening the project time for the test case to 13?

Table 5.11: Data for the activities of the test case.

Activity  Imm. pred.  Normal time (Di)  Crash time (di)  Crash cost
e1        –           5                 1                4
e2        e1          3                 1                4
e3        e1          4                 2                1
e4        e2          6                 1                3
e5        e2, e3      6                 5                0
e6        e4          4                 4                0

4. After a thorough analysis, a project manager has come to a situation where one of his activities does not exhibit the linear crashing behavior. After further analysis it has been shown that the crashing costs can actually be modeled by a piecewise linear function, in this simple case with only one "break point". The situation for the given activity can be illustrated as in Figure 5.11. The linear relationship of the crashing cost is vital for the results. But can we deal with this situation in our LP modeling approach? If yes, how; if no, why not? You may assume that the slopes are as depicted. Same question if you do not know anything about the slopes beforehand.

Figure 5.11: Crashing an activity with a piecewise linear crashing cost (one intermediate break point between crash time and normal time).

5. Let us turn back to our MasterBlaster project. Assume that the incentive scheme set forward by top management, instead of giving a fixed date and a bonus, specified a bonus per week: suppose that we will get 9500 Euros for each week we reduce the project length. Describe how this can be solved in our LP model and what the solution for the MasterBlaster project will be. Construct the project crashing curve that presents the relationship between project length and the cost of crashing, until the project can no longer be crashed.
Bibliography

[1] F. S. Hillier and G. J. Lieberman. Introduction to Operations Research. McGraw-Hill, 2001.

[2] A. B. Badiru and P. S. Pulat. Comprehensive Project Management: Integrating Optimization Models, Management Principles, and Computers. Prentice-Hall, 1995.
Chapter 6

The Max Flow Problem

Let us assume we want to pump as much water as possible from point A to point B. In order to get the water from A to B we have a system of water pipes with connections and pipelines. For each pipeline we furthermore have a capacity identifying the maximum throughput through the pipeline. In order to find how much can be pumped in total, and through which pipes, we need to solve the maximum flow problem. Alternatively we may imagine the graph as a road network, where we want to know how many cars we can get through this particular road network. In the max flow problem we determine 1) the maximal throughput and 2) how we can achieve that throughput through the system.

In the maximum flow problem (for short the max flow problem) a directed graph G = (V, E) is given. Each edge e has a capacity ue ∈ R+, and therefore we define the graph as G = (V, E, u). Furthermore two "special" vertices r and s are given; these are called the source and the sink, respectively. The objective is now to determine how much flow we can get from the source to the sink. In order to specify how we want to direct the flow in a solution, we first need to formally define what a flow actually is:
Definition 6.1 A flow x in G is a function x : E → R+ satisfying:

∀i ∈ V \ {r, s} :  Σ(j,i)∈E xji − Σ(i,j)∈E xij = 0
∀(i, j) ∈ E :  0 ≤ xij ≤ uij

Given the definition of a flow we can define fx(i) = Σ(j,i)∈E xji − Σ(i,j)∈E xij. So fx(i) is, for a given flow x and a given node i, the difference between the inflow and the outflow of that node, where a surplus is counted as positive. fx(i) is called the net flow into i, or the excess for x in i. Specifically, fx(s) is called the value of the flow x.

We can now state the max flow problem as an integer programming problem. The variables in the problem are the flow variables xij:

max  fx(s)
s.t. fx(i) = 0      i ∈ V \ {r, s}
     0 ≤ xe ≤ ue    e ∈ E
     xe ∈ Z+        e ∈ E
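Definition 6.1 translates directly into a feasibility check. A small sketch (the function name and the tiny three-arc example are made up for this illustration):

```python
# Check Definition 6.1 for a candidate flow: capacity bounds on every
# arc, and flow conservation at every node except r and s.
def is_flow(x, capacity, nodes, r, s):
    if any(not (0 <= x[e] <= capacity[e]) for e in capacity):
        return False
    def excess(i):
        return (sum(x[j, k] for (j, k) in capacity if k == i)
                - sum(x[j, k] for (j, k) in capacity if j == i))
    return all(excess(i) == 0 for i in nodes if i not in (r, s))

# Tiny r,s-network: r -> a -> s plus a direct arc r -> s.
u = {("r", "a"): 4, ("a", "s"): 3, ("r", "s"): 2}
x = {("r", "a"): 3, ("a", "s"): 3, ("r", "s"): 2}
```

Here x is a flow of value fx(s) = 5: node a receives 3 and sends 3, so its excess is zero.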
Notice that we restrict the flow to being integer. The reason for this will become apparent as the solution methods for the max flow problem are presented.

Related to the definition of a flow is the definition of a cut. There is a very tight relationship between cuts and flows that will also become apparent later. Let us consider R ⊂ V. δ(R) is the set of edges incident from a vertex in R to a vertex in V \ R. δ(R) is also called the cut generated by R:

δ(R) = {(i, j) ∈ E : i ∈ R, j ∈ V \ R}

Note the difference to the cut we defined for an undirected graph in Chapter 3. In a directed graph the edges in the cut are only the edges going from R to V \ R, and not those in the opposite direction. So in an undirected graph we have δ(R) = δ(V \ R), but this does not hold for the definition of a cut in directed graphs. As the cut is uniquely defined by the vertex set R, we might simply call R the cut. A cut where r ∈ R and s ∉ R is called an r,s-cut. Furthermore we define the capacity of the cut R as:

u(δ(R)) = Σ(i,j)∈E, i∈R, j∈V\R uij

that is, the capacity of a cut is the sum of the capacities of all edges in the cut.
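As a concrete illustration of the definition (all names and the tiny example network are made up for this sketch):

```python
# delta(R) for a directed graph: only the arcs leaving R. The capacity
# of the cut is the sum of the capacities of exactly those arcs.
def cut_capacity(R, capacity):
    """capacity maps directed arcs (i, j) to u_ij; R is a vertex set."""
    return sum(u for (i, j), u in capacity.items() if i in R and j not in R)

# Tiny r,s-network: r -> a -> s plus a direct arc r -> s.
u = {("r", "a"): 4, ("a", "s"): 3, ("r", "s"): 2}
cap_r = cut_capacity({"r"}, u)        # cut arcs: (r,a) and (r,s)
cap_ra = cut_capacity({"r", "a"}, u)  # cut arcs: (a,s) and (r,s)
```

Both {r} and {r, a} are r,s-cuts here, with capacities 6 and 5 respectively.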
that is, the capacity of a cut is the sum of the capacities of all edges in the cut.

Theorem 6.2 For any r,s-cut δ(R) and any flow x we have: x(δ(R)) − x(δ(R̄)) = fx(s).

Proof: Let δ(R) be an r,s-cut and x a flow. We know that fx(i) = 0 for all nodes i ∈ R̄ \ {s}, and trivially we have fx(s) = fx(s). If we add these equations together we get ∑_{i∈R̄\{s}} fx(i) + fx(s) = fx(s), that is

∑_{i∈R̄} fx(i) = fx(s).    (6.1)

Consider the contribution of the different edges to the left hand side of this equation (the four different cases are illustrated in Figure 6.1).
Figure 6.1: Illustration of the four different cases for assessing the contribution to a flow based on the nodes' relationship to R and R̄.

1. (i, j) ∈ E, i, j ∈ R: xij occurs in none of the equations added, so it does not occur in (6.1).
2. (i, j) ∈ E, i, j ∈ R̄: xij occurs in the equation for i with a coefficient of −1 and in the equation for j with a coefficient of +1, so in (6.1) it will result in a coefficient of 0.
3. (i, j) ∈ E, i ∈ R, j ∈ R̄: xij occurs in (6.1) with a coefficient of +1, and therefore the sum will have a contribution of +1 from this edge.
4. (i, j) ∈ E, i ∈ R̄, j ∈ R: xij occurs in (6.1) with a coefficient of −1, and therefore the sum will have a contribution of −1 from this edge.
So the left hand side is equal to x(δ(R)) − x(δ(R̄)). △

Corollary 6.3 For any feasible flow x and any r,s-cut δ(R) we have: fx(s) ≤ u(δ(R)).

Proof: Given a feasible flow x and an r,s-cut δ(R). From Theorem 6.2 we have

x(δ(R)) − x(δ(R̄)) = fx(s)

Since all flow values are positive (or zero) this gives us

fx(s) ≤ x(δ(R)).

Furthermore the flow cannot exceed the capacity, that is, x(δ(R)) ≤ u(δ(R)). So we have fx(s) ≤ x(δ(R)) ≤ u(δ(R)). △

The importance of the corollary is that it gives an upper bound on any flow value, and therefore also on the maximum flow value. In words it states, quite intuitively, that the capacity of the minimum cut is an upper bound on the value of the maximum flow. So if we can find a cut and a flow such that the value of the flow equals the capacity of the cut, then the flow is maximum, the cut is a "minimum capacity" cut, and the problem is solved. The famous Max Flow-Min Cut theorem states that this can always be achieved.

The proof of the Max Flow-Min Cut Theorem is closely connected to the first algorithm to be presented for solving the max flow problem, so we will already now define some terminology for the solution methods.

Definition 6.4 A path is called x-incrementing (or just incrementing) if xe < ue for every forward arc e and xe > 0 for every backward arc e. Furthermore, an x-incrementing path from r to s is called x-augmenting (or just augmenting).

Theorem 6.5 (Max Flow-Min Cut Theorem) Given a network G = (V, E, u) and a current feasible flow x, the following 3 statements are equivalent:

1. x is a maximum flow in G.
2. A flow augmenting path does not exist (i.e. there is no r,s-dipath in Gx).

3. An r,s-cut R exists with capacity equal to the value of x, i.e. u(δ(R)) = fx(s).

Proof: 1) ⇒ 2): Given a maximum flow x we want to prove that no flow augmenting path exists. Assume that a flow augmenting path does exist, and seek a contradiction. If there exists a flow augmenting path we can increase the flow along it by at least 1, so the new flow is at least one unit larger than the existing one. But this contradicts the assumption that the flow was maximum. So no flow augmenting path can exist when the flow is maximum.

2) ⇒ 3): Define R as R = {i ∈ V : there exists an x-incrementing path from r to i}. Based on this definition we derive:

1. r ∈ R
2. s ∉ R

Therefore R defines an r,s-cut. Based on the definition of R we get xij = uij for (i, j) ∈ δ(R) and xij = 0 for (i, j) ∈ δ(R̄). As x(δ(R)) − x(δ(R̄)) = fx(s), we insert x(δ(R)) = u(δ(R)) and x(δ(R̄)) = 0 from above and get: u(δ(R)) = fx(s).

3) ⇒ 1): We assume that there exists an r,s-cut R with fx(s) = u(δ(R)). From the corollary we have fx(s) ≤ u(δ(R)) for any feasible flow, so u(δ(R)) is an upper bound that is attained by the current feasible flow, which therefore must be maximum.

This proves that the three propositions are equivalent. △
6.1
The Augmenting Path Method
Given the Max Flow-Min Cut theorem, a basic idea that leads to our first solution approach for the problem, the augmenting path method, is immediate. This is the classical max flow algorithm by Ford and Fulkerson from the mid-1950s.
The basic idea is to start with a feasible flow. As a feasible flow we can always use x = 0, but in many situations it is possible to construct a better initial flow by a simple inspection of the graph. Based on the current flow we then try to find an augmenting path in the network, that is, a path from r to s that can accommodate an increase in flow. If that is possible we introduce the increased flow; this updates the previous flow by adding the new flow that we have managed to get from r to s. Then we again look for a new augmenting path. When we cannot find an augmenting path, the maximum flow has been found (due to Theorem 6.5) and the algorithm terminates.

The critical part of this approach is to have a systematic way of checking for augmenting paths. An important graph when discussing the max flow problem is the residual graph. In some textbooks you might see the term "auxiliary graph" instead of residual graph. The residual graph identifies, based on the current flow, where excess flow can be sent.

Definition 6.6 Given a feasible flow x, the residual graph Gx for G wrt. x is defined by:

Vx = V(Gx) = V
Ex = E(Gx) = {(i, j) : (i, j) ∈ E ∧ xij < uij} ∪ {(j, i) : (i, j) ∈ E ∧ xij > 0}

The residual graph can be built from the original graph and the current flow arc by arc. This is illustrated in Figure 6.2. If we look at the arc (r, 1), there is a flow of four and a capacity of four. Therefore we cannot push more flow from r to 1 along that arc as it is already saturated. On the other hand we can lower the flow from r to 1 along the (r, 1) arc; this is represented by having an arc in the opposite direction, from 1 to r. As another example consider the (1, 3) arc. There is room for more flow from 1 to 3, so we can push excess flow from 1 in the direction of 3 using the (1, 3) arc. Therefore the arc (1, 3) is in the residual graph. But (3, 1) is also included in the residual graph, as excess flow from node 3 can be pushed to 1 by reducing the flow along the (1, 3) arc.

The nice property of a residual graph is that an augmenting path consists of forward arcs only; this is not necessarily the case in the original graph. If we return to Figure 6.2, there is in fact an augmenting path from r to s, and we can push one more unit along this path and thereby increase the value of the flow by one. But the path uses the (2, 3) arc in the "reverse" direction, as we decrease the inflow to node 3 from node 2 and instead route that flow to the sink.
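Definition 6.6 can be sketched directly in code. The following is an illustrative dict-based construction (not the notes' own code), assuming G has no pair of opposite arcs:

```python
# Sketch of Definition 6.6: building the residual arc set E_x from the
# original arcs and the current flow.

def residual_edges(capacity, flow):
    Ex = set()
    for (i, j), u in capacity.items():
        x = flow.get((i, j), 0)
        if x < u:           # forward residual arc: room for more flow
            Ex.add((i, j))
        if x > 0:           # backward residual arc: flow can be reduced
            Ex.add((j, i))
    return Ex
```

For the saturated arc (r, 1) of the example only the reverse arc (1, r) appears, while the partially used arc (1, 3) yields both (1, 3) and (3, 1).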
(a) The graph G and a ﬂow x
(b) The residual graph Gx
Figure 6.6.2: The residual graph reﬂects the possibilities to push excess ﬂow from from a node
In the residual graph an augmenting path is simply made up of forward arcs. Now we can describe the augmenting path algorithm, where we use the residual graph in each iteration to check whether an augmenting path exists. If s is reached, the augmenting path can be read off the predecessor labels p; otherwise no augmenting path exists. The algorithm is presented in Algorithm 9.

Algorithm 9: The Augmenting Path algorithm
Data: A directed graph G = (V, E, u), a source r and a sink s
Result: A flow x of maximum value
1  xij ← 0 for all (i, j) ∈ E
2  repeat
3      construct Gx
4      find an augmenting path in Gx
5      if s is reached then
6          determine the maximum amount umax of flow augmentation
7          augment x along the augmenting path by umax
8  until s is not reachable from r in Gx

In the algorithm we need to determine the maximum amount by which we can increase the flow along the augmenting path we have found. When increasing the flow we need to be aware that on arcs that are forward arcs in the original graph we cannot exceed the capacity, and that we cannot get a negative flow; on arcs that are backward arcs in the original graph the flow is reduced along the augmenting path. If umax is the maximum number of units that can be pushed all the way from source to sink along the augmenting path P, we get the updated (augmented) flow x′:

• for (i, j) ∈ P with (i, j) ∈ E: x′ij ← xij + umax
• for (i, j) ∈ P with (j, i) ∈ E: x′ji ← xji − umax
• for all other (i, j) ∈ E: x′ij ← xij

Consider the following example. In order not to start with x = 0 as the initial flow we try to establish a better starting point by inspection: we push 4 units of flow along r, 1, 3, s, two units of flow along the path r, 2, 4, s, and one unit of flow along a third r-s path. This is shown in Figure 6.3 (a) together with the corresponding residual graph. Based on the residual graph we can identify an augmenting path; having identified it we update the flow with 2 along the path and arrive at the situation shown in Figure 6.3 (b). Now it becomes more tricky to identify an augmenting path. From r we can proceed to 1, but from this node there is no outgoing arc in the residual graph. Instead we look at the arc (r, 3). From 3 we can continue to 2, then to 4, and from there we can get to s. The capacities on the path are 2 and 1, so we can at most push one unit of flow through the path. We update the flow with one unit and get to the situation shown in Figure 6.3 (c). In the residual graph it is now not possible to construct incrementing paths beyond nodes 1 and 3. Hence there is no augmenting path in the graph, and consequently we have found a maximum flow. The value of the flow is 11, and the minimum cut is identified by R = {r, 1, 3}. Note that the forward edges of the cut are saturated, that is, flow equals capacity, and the backward edges of the cut are empty.

The next issue is how the search in the residual graph for an augmenting path is performed. Several strategies can be used, but in order to ensure a polynomial running time we use a breadth-first approach as shown in Algorithm 10. To realize the importance of the search strategy, look at Figure 6.4. A first augmenting path could be r, 1, 2, s, increasing the flow by one unit. In the next iteration we get the augmenting path r, 2, 1, s, again increasing the flow by one unit. Two things should be noted: 1) the running time of the algorithm depends on the capacities of the arcs (here we get a running time of 2M times the time it takes to update the flow), and 2) if M is large it will take a long time to reach the maximum flow, although it is possible to find the same solution in only two iterations.
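The update step above can be sketched as follows (an illustrative dict-based sketch, assuming G has no pair of opposite arcs; names are not from the notes):

```python
# Sketch of the augmentation step: given an augmenting path as a list of
# residual arcs, push umax units, increasing forward arcs of G and
# decreasing backward ones.

def augment(capacity, flow, path):
    # umax: the residual capacity of the path
    umax = min(
        capacity[(i, j)] - flow.get((i, j), 0) if (i, j) in capacity
        else flow[(j, i)]
        for (i, j) in path
    )
    for (i, j) in path:
        if (i, j) in capacity:                    # forward in G
            flow[(i, j)] = flow.get((i, j), 0) + umax
        else:                                     # backward in G
            flow[(j, i)] -= umax
    return umax
```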
(a) The graph G with an initial flow and the corresponding residual graph
(b) Updating based on the augmenting path r, 2, 1, 3, 4, s
(c) Updating based on the augmenting path r, 2, 1, 3, s

Figure 6.3: An example of the iterations in the augmenting path algorithm for the maximum flow problem

Figure 6.4: Example of what can go wrong if we select the "wrong" augmenting paths in the augmenting path algorithm
Algorithm 10: Finding an augmenting path if it exists
Data: The network G = (V, E, u), a source vertex r, a sink vertex s, and the current feasible flow x
Result: An augmenting path from r to s and the capacity for the flow augmentation, if it exists
1   all vertices are set to unlabeled
2   Q ← {r}, S ← ∅
3   pi ← 0 for all i
4   umax_i ← +∞ for all i ∈ G
5   while Q ≠ ∅ and s is not labeled yet do
6       select a vertex i ∈ Q
7       scan each (i, j) ∈ Ex:
8           label j
9           Q ← Q ∪ {j}
10          pj ← i
11          if (i, j) ∈ Ex is forward in G then umax_j ← min{umax_i, uij − xij}
12          if (i, j) ∈ Ex is backward in G then umax_j ← min{umax_i, xji}
13      Q ← Q \ {i}
14      S ← S ∪ {i}
We call an augmenting path from r to s shortest if it has the minimum possible number of arcs.6. E. S ← ∅ pi ← 0 for all i umax ← +∞ for i ∈ G i while Q = ∅ and s is not labeled yet do select a i ∈ Q scan (i. u). a source vertex r. uij − xij } j i if (i. j) ∈ Ex is backward in G then umax ← min{umax .
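Algorithms 9 and 10 can be combined into one routine. The following is a compact illustrative sketch (repeated breadth-first search in the residual graph, i.e. the Edmonds-Karp variant of Ford-Fulkerson); it is dict-based, assumes no pair of opposite arcs, and is meant for small examples rather than as an efficient implementation:

```python
from collections import deque

def max_flow(capacity, r, s):
    """Repeatedly find a shortest augmenting path by BFS and augment."""
    flow = {}
    while True:
        # breadth-first search for an augmenting path in the residual graph
        pred, Q = {r: None}, deque([r])
        while Q and s not in pred:
            i = Q.popleft()
            for (a, b), u in capacity.items():
                if a == i and b not in pred and flow.get((a, b), 0) < u:
                    pred[b] = (a, b); Q.append(b)       # forward residual arc
                elif b == i and a not in pred and flow.get((a, b), 0) > 0:
                    pred[a] = (b, a); Q.append(a)       # backward residual arc
        if s not in pred:                               # no augmenting path: done
            break
        # read the path off the predecessor labels, then augment by umax
        path, v = [], s
        while pred[v] is not None:
            path.append(pred[v]); v = pred[v][0]
        umax = min(capacity[arc] - flow.get(arc, 0) if arc in capacity
                   else flow[(arc[1], arc[0])] for arc in path)
        for (i, j) in path:
            if (i, j) in capacity:
                flow[(i, j)] = flow.get((i, j), 0) + umax
            else:
                flow[(j, i)] -= umax
    value = sum(x for (i, j), x in flow.items() if j == s) - \
            sum(x for (i, j), x in flow.items() if i == s)
    return value, flow
```

On the Figure 6.4 style example (four arcs of capacity M and a middle arc of capacity 1) the BFS strategy finds the maximum flow in two iterations regardless of M.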
Breadth-first search can be implemented using the queue data structure for the set Q. The augmenting path algorithm with a breadth-first search solves the maximum flow problem in O(nm²).

6.2 The Preflow-Push Method

An alternative to the augmenting path method is the preflow-push algorithm for the max flow problem. This method is sometimes also called push-relabel. In the augmenting path method a flow was maintained in each iteration of the algorithm. In the preflow-push method the balance constraint (inflow for a vertex equals outflow for a vertex) is relaxed by introducing a preflow.

Definition 6.7 A preflow for a directed graph G with capacities u on the edges, a source vertex r and a sink vertex s is a function x : E → R+ satisfying:

∀i ∈ V \ {r, s} : ∑_{(j,i)∈E} xji − ∑_{(i,j)∈E} xij ≥ 0
∀(i, j) ∈ E : 0 ≤ xij ≤ uij

In other words: for any vertex except r and s, the flow excess fx(i) is nonnegative ("more runs into i than out of i"). If fx(i) > 0 then node i is called active. A preflow with no active nodes is a flow.

The general idea in the preflow-push algorithm is to push as much flow as possible from the source towards the sink. If it is not possible to push more flow towards the sink and there are still active nodes, we push the excess back toward the source r. An important component of the preflow-push method is the residual graph, whose definition remains unchanged: for a given graph G and preflow x, the residual graph Gx shows where we can push excess flow.

Figure 6.5 (a) illustrates the process. The nodes 2 and 3 are active nodes. In the graph we could push additional flow from node 3 to node 2. In this case node 3 would be in balance and the only remaining active node would be node 2, as illustrated in Figure 6.5 (b). However, it is not possible to push more flow towards the sink s. If we are not careful, the next push operation could potentially take one unit of flow from node 2 back to node 3 and thereby re-establish the preflow we just had (as there is a (2, 3) edge in Ex).
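The preflow conditions of Definition 6.7 can be checked just like the flow conditions earlier. A small illustrative sketch (names are not from the notes):

```python
# Sketch of Definition 6.7: a preflow allows nonnegative excess at every
# vertex other than r and s.

def excess(flow, node):
    return sum(x for (i, j), x in flow.items() if j == node) - \
           sum(x for (i, j), x in flow.items() if i == node)

def is_preflow(flow, capacity, r, s):
    nodes = {v for e in capacity for v in e}
    return all(excess(flow, v) >= 0 for v in nodes - {r, s}) and \
           all(0 <= flow.get(e, 0) <= capacity[e] for e in capacity)

def active_nodes(flow, capacity, r, s):
    """The nodes with strictly positive excess."""
    nodes = {v for e in capacity for v in e}
    return {v for v in nodes - {r, s} if excess(flow, v) > 0}
```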
It is therefore necessary to be able to control the direction of the pushes in order to avoid repeating pushes. To control the direction of the pushes we introduce what will later turn out to be estimates (in fact lower bounds) on the distances in Gx, called a labeling.

Definition 6.8 A valid labeling of the vertices in V wrt. a preflow x is a function d : V → Z satisfying:

1. dr = n ∧ ds = 0
2. ∀(i, j) ∈ Ex : di ≤ dj + 1

The idea is that, as long as it is possible, we would like to send the flow "towards" the sink, and in order to estimate that we use the valid labeling.

Having defined valid labelings it is natural to ask whether any feasible preflow admits a valid labeling. The answer is no. Look at the simple example in Figure 6.6. First of all dr must be 3 and ds must be 0 according to the definition. What value can we assign to da in order to get a valid labeling? According to rule 2, based on the arc (r, a) we have dr ≤ da + 1, and using rule 2 on the (a, s) arc we get da ≤ ds + 1. This gives da ≥ 2 and da ≤ 1, which clearly does not leave room for a feasible value.

It is however possible to construct an initial feasible preflow that permits a valid labeling. In order to get a feasible preflow with a valid labeling we:

1. set xe ← ue for all outgoing arcs of r,
2. set xe ← 0 for all remaining arcs, and
3. set dr ← n and di ← 0 otherwise.

We will call this procedure an initialization of x and d. For the max flow problem in Figure 6.5 the initialization will result in the labels shown in Figure 6.7.

A valid labeling for a preflow implies an important property of the preflow: it "saturates a cut", that is, there is a cut all of whose leaving edges have flow equal to capacity. If we look at Figure 6.7, the cut defined by R = {r} is saturated.
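The initialization procedure and the validity check can be sketched as follows (illustrative dict-based code, not from the notes):

```python
# Sketch: initialization of x and d (saturate all arcs out of r, zero
# elsewhere, d_r = n, d_i = 0), plus a check of Definition 6.8.

def initialize(capacity, r):
    flow = {e: (u if e[0] == r else 0) for e, u in capacity.items()}
    nodes = {v for e in capacity for v in e}
    d = {v: 0 for v in nodes}
    d[r] = len(nodes)
    return flow, d

def residual_edges(capacity, flow):
    Ex = set()
    for (i, j), u in capacity.items():
        if flow.get((i, j), 0) < u: Ex.add((i, j))
        if flow.get((i, j), 0) > 0: Ex.add((j, i))
    return Ex

def is_valid_labeling(capacity, flow, d, r, s, n):
    return d[r] == n and d[s] == 0 and \
           all(d[i] <= d[j] + 1 for (i, j) in residual_edges(capacity, flow))
```

On the three-node example of Figure 6.6 the initialization works because the saturated arc out of r only leaves the backward residual arc, which the labels satisfy.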
(a) Initial preflow
(b) Pushing one unit from node 3 to node 2

Figure 6.5: Changing a preflow by applying a push operation
Figure 6.6: A preflow does not always admit a valid labeling, as this small example shows

Figure 6.7: Initialization of the preflow and a corresponding valid labeling. The values on the nodes are the labeling.

Theorem 6.9 Given a feasible preflow x and a valid labeling d for x, there exists an r,s-cut δ(R) such that xij = uij for all (i, j) ∈ δ(R) and xij = 0 for all (i, j) ∈ δ(R̄).

Proof: Since there are n nodes, there exists a value k, 0 < k < n, such that di ≠ k for all i ∈ V. Define R = {i ∈ V : di > k}. Clearly r ∈ R (dr = n) and s ∉ R (ds = 0), so R defines an r,s-cut. Let us look at an edge (i, j) in the residual graph that crosses the cut, that is, i ∈ R and j ∈ R̄. As i ∈ R we have di ≥ k + 1, and as j ∈ R̄ we have dj ≤ k − 1. No such edge can fulfil the valid labeling condition di ≤ dj + 1, and consequently no edge in Ex leaves R: every edge of E leaving R is saturated, and every edge of E entering R carries no flow. △

In a way the augmenting path algorithm and the preflow-push algorithm are "dual" to each other. While the augmenting path algorithm maintains a flow and terminates when a saturated cut is found, the preflow-push algorithm maintains a saturated cut and terminates when a feasible flow has been obtained.
Let us return to the relationship between the labeling and the distance from a node to the sink. Let dx(i, j) denote the number of arcs in a shortest dipath from i to j in Gx. With the definition of d we can prove that for any feasible preflow x and any valid labeling d for x we have

dx(i, j) ≥ di − dj

Basically, for every edge on a shortest path P from i to j the valid labeling gives an inequality of the form di ≤ dj + 1; adding these inequalities along the path gives the stated result. △

As a consequence of this result we have:

Corollary 6.10 (Estimates on distances)
• di is a lower bound on the distance from i to s.
• di − n is a lower bound on the distance from i to r.
• If di ≥ n, this means that there is no path from i to s.
• di < 2n for all nodes.

We try to push flow towards nodes j having dj < di, since such nodes are estimated to be closer to the ultimate destination. Having dj < di and a valid labeling means that a push is only applied to arcs (i, j) where i is active and di = dj + 1. Such an edge (i, j) is called an admissible edge.

Algorithm 11: The Push operation
1  consider an admissible edge (i, j)
2  calculate the amount of flow which can be pushed to j: min{fx(i), xji + (uij − xij)}
3  push this by first reducing xji as much as possible and then increasing xij until the relevant amount has been pushed

If an admissible edge is available, the preflow-push algorithm can push flow on that edge. If we return to our max flow problem in Figure 6.7, it is obvious that no admissible edges exist. So what should we do if there are no more admissible edges while a node v is still active?
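Algorithm 11 can be sketched as follows (illustrative code, not from the notes; the caller is assumed to have verified that (i, j) is admissible):

```python
# Sketch of Algorithm 11: push on an admissible residual arc (i, j).
# `excess_i` is f_x(i) of the active node i.

def push(capacity, flow, excess_i, i, j):
    # amount that can be pushed to j: min{f_x(i), x_ji + (u_ij - x_ij)}
    amount = min(excess_i,
                 flow.get((j, i), 0)
                 + capacity.get((i, j), 0) - flow.get((i, j), 0))
    back = min(amount, flow.get((j, i), 0))   # first reduce x_ji ...
    if back:
        flow[(j, i)] -= back
    rest = amount - back                      # ... then increase x_ij
    if rest:
        flow[(i, j)] = flow.get((i, j), 0) + rest
    return amount
```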
The answer is to relabel. Relabel means that we take an active node and assign it a new label that should result in generating at least one admissible edge. This is done by increasing the value of the label to the minimum of the labels on the outgoing residual edges of the node, plus one.

Algorithm 12: The Relabel operation
1  consider an active vertex i for which no admissible edge (i, j) out of i exists
2  di ← min{dj + 1 : (i, j) ∈ Ex}

Notice that this will not violate the validity of the labeling (check for yourself!). Now we can put the elements together and get the preflow-push algorithm.

Algorithm 13: The Preflow-Push algorithm
1  initialize x, d
2  while x is not a flow do
3      select an active node i
4      if no admissible arc (i, j) out of i exists then
5          relabel i
6      while there exists an admissible arc (i, j) do
7          push on (i, j)

If we return to our example from before, we may choose any active node and perform a relabel. Initially the nodes 1 and 3 are active, and we arbitrarily choose node 1. As no admissible edges exist we relabel node 1 to 1, and now the single unit of surplus in node 1 is pushed along (1, 2) (we could also have chosen edge (1, 4)). Next we choose node 3, which also needs a relabel; we can then push its surplus of 7 along the edges (3, s) (1 unit), (3, 2) (3 units) and (3, 4) (3 units). Now node 3 is no longer active, but 4 is. We first need to relabel node 4 to get a label of value 1, and we can then push 2 units to s. As node 2 remains active we choose it again; we need to relabel it to 2, and now we can push one unit "back" to node 3. The initial iterations are shown in Figure 6.8.

Analysis of the running time of the preflow-push algorithm is based on an analysis of the number of saturating pushes and non-saturating pushes. It can be shown that the push-relabel maximum flow algorithm can be implemented to run in time O(n²m) or O(n³).
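Algorithm 13 can be assembled from the pieces above. The following is a compact illustrative sketch (generic push-relabel, dict-based, no FIFO or highest-label selection rule, so it is meant for small examples only; names are not from the notes):

```python
# Sketch of Algorithm 13 (push-relabel): initialize, then repeatedly
# push on admissible arcs and relabel when stuck, until no node is active.

def preflow_push(capacity, r, s):
    nodes = {v for e in capacity for v in e}
    n = len(nodes)
    flow = {e: (u if e[0] == r else 0) for e, u in capacity.items()}
    d = {v: 0 for v in nodes}; d[r] = n

    def excess(v):
        return sum(x for (i, j), x in flow.items() if j == v) - \
               sum(x for (i, j), x in flow.items() if i == v)

    def residual(i):
        """Residual arcs out of i with their residual capacities."""
        out = []
        for (a, b), u in capacity.items():
            if a == i and flow[(a, b)] < u: out.append((b, u - flow[(a, b)]))
            if b == i and flow[(a, b)] > 0: out.append((a, flow[(a, b)]))
        return out

    active = [v for v in nodes - {r, s} if excess(v) > 0]
    while active:
        i = active[0]
        admissible = [(j, rc) for j, rc in residual(i) if d[i] == d[j] + 1]
        if not admissible:
            d[i] = min(d[j] + 1 for j, _ in residual(i))      # relabel
        else:
            j, rescap = admissible[0]
            amt = min(excess(i), rescap)                      # push
            back = min(amt, flow.get((j, i), 0))
            if back: flow[(j, i)] -= back
            if amt - back: flow[(i, j)] = flow.get((i, j), 0) + amt - back
        active = [v for v in nodes - {r, s} if excess(v) > 0]
    return sum(x for (i, j), x in flow.items() if j == s) - \
           sum(x for (i, j), x in flow.items() if i == s)
```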
(a) Relabel and push from 1
(b) Relabel and push from 3
(c) Relabel and push from 4
(d) Relabel and push from 4 (again)

Figure 6.8: Initial iterations of the preflow push algorithm on our example graph.

6.3 Applications of the max flow problem

6.3.1 Maximum Bipartite Matching

6.4 Supplementary Notes

6.5 Exercises

1. Consider the max flow problem defined by the graph G = (V, E, u) in Figure 6.9, with node 1 as the source and node 8 as the sink. Solve the problem using the augmenting path algorithm and the preflow push algorithm. Also find a minimum cut. In order to verify your results, enter the integer programming model into your favourite MIP solver.

Figure 6.9: Example max flow problem

2. Let G be a directed graph with two special vertices s and t. Any two directed paths from s to t are called vertex-disjoint if they do not share any vertices other than s and t. Prove that the maximum number of directed vertex-disjoint paths from s to t is equal to the minimum number of vertices whose removal ensures that there are no directed paths from s to t.

3. Let P range over the set of s-t paths for two vertices s, t of a given graph, and let C range over cuts that separate s and t. Show that

max_P min_{e∈P} ce = min_C max_{e∈C} ce

where ce is the capacity of edge e.
Bibliography

[1] William J. Cook, William H. Cunningham, William R. Pulleyblank and Alexander Schrijver. Combinatorial Optimization. Wiley Interscience, 1998.
Chapter 7

The Minimum Cost Flow Problem

Recall the definition of fx(i) from the maximum flow problem:

fx(i) = ∑_{(j,i)∈E} xji − ∑_{(i,j)∈E} xij

In the minimum cost flow problem we consider a directed graph G = (V, E) where each edge e has a capacity ue ∈ R+ ∪ {∞} and a unit transportation cost we ∈ R. Each vertex i has a demand bi ∈ R. If bi > 0 then i is called a sink, and if bi < 0 then i is called a source. We assume that bV = ∑_{i∈V} bi = 0. In the minimum cost flow problem a feasible flow x of minimum cost has to be determined; the feasibility conditions remain unchanged from the maximum flow problem. The minimum cost flow problem can be defined as:

min ∑_{e∈E} we xe
s.t. fx(i) = bi       i ∈ V                (7.1)
     0 ≤ xij ≤ uij    (i, j) ∈ E
An example of a minimum cost flow problem is shown in Figure 7.1. Vertices 1 and 2 are sources and vertices 3, 4 and 6 are sinks. The objective is to direct the flow from 1 and 2 in the most cost efficient way to 3, 4 and 6. This network can be defined as G = (V, E, u, w, b).

Figure 7.1: Example of a minimum cost flow problem. Labels on the edges of the graph shown in Figure (a) represent we, ue. Figure (b) shows a feasible flow (bold edges); labels on the edges represent we, ue, xe. The cost of the flow is 75.

The minimum cost flow problem is a very versatile problem. Among others, the following flow problems can be described as special cases:

1. The Transportation Problem
2. The Shortest Path Problem
3. The Max Flow Problem

A class of minimum cost flow problems arises from transportation problems. In a transportation problem G = (V, E) is a bipartite graph with partitions {I, J}, where the set I defines a set of factories and J a set of warehouses. Given production quantities ai, i ∈ I, demands bj, j ∈ J, and unit transportation costs wij, (i, j) ∈ E, the objective is to find the cheapest transportation of the products to the warehouses.
The transportation problem can be stated as:

min ∑_{e∈E} we xe
s.t. ∑_{i∈I:(i,j)∈E} xij = bj    j ∈ J
     ∑_{j∈J:(i,j)∈E} xij = ai    i ∈ I      (7.2)
     xe ≥ 0                      e ∈ E

Adding constraints of the form xe ≤ ue, where ue is finite, to (7.2), the problem is called a capacitated transportation problem. In either case, multiplying each of the second group of equations by −1 and setting bi = −ai for i ∈ I, we get a problem of the same form as the minimum cost flow problem.

The shortest path problem can also be considered a special case of the minimum cost flow problem. Define the root as a source and the remaining n − 1 vertices as sinks. Each of the sinks will have bi = 1, while the root will have br = −(n − 1) (so that bV = 0). This represents n − 1 paths "flowing" out of the root, with exactly one ending in each of the other vertices. The unit transportation cost is set equal to the cost on the edges, and we have stated the shortest path problem as a minimum cost flow problem.

The maximum flow problem can be modelled as a minimum cost flow problem by the following trick. Given a directed graph with capacities on the edges, a source and a sink, we define bi = 0 for all vertices and set the unit transportation cost to 0 for each of the edges. Furthermore we introduce a new edge from the sink back to the source with unlimited capacity (at least as large as ∑_{e∈E} ue) and a unit cost of −1. In order to minimize cost, the model will try to maximize the flow from sink to source along this edge, which at the same time maximizes the flow from the source through the graph to the sink, as these flows must be in balance. This is illustrated in Figure 7.2.

As several of the important flow optimization problems can be stated as special cases of the minimum cost flow problem, it is interesting to come up with an effective solution approach for the problem. Let us take a look at the constraint matrix of the minimum cost flow problem.
Figure 7.2: Figure (a) shows a maximum flow problem with labels representing ue. In Figure (b) the same maximum flow problem has been reformulated as a minimum cost flow problem; labels on the edges are we, ue. Note that M should be chosen to be at least 20.

The constraint matrix has one row for each of the n flow balance equations fx(i) = bi, i = 1, ..., n, and one row for each of the m capacity constraints, written in the form −xe ≥ −ue, e = e1, ..., em. The column of a variable xij has a coefficient −1 in the row of vertex i, a coefficient +1 in the row of vertex j, and a coefficient −1 in the row of its own capacity constraint; the right hand sides are b1, ..., bn for the balance rows and −u1, ..., −um for the capacity rows.

Proposition 3.2 from [2] gives a sufficient condition for a matrix being totally unimodular. It states that a matrix A is totally unimodular if:

1. aij ∈ {+1, −1, 0} for all i, j.
2. Each column contains at most two nonzero coefficients (∑_{i=1}^m |aij| ≤ 2).
3. There exists a partition {M1, M2} of the set M of rows such that each column j containing two nonzeros satisfies ∑_{i∈M1} aij − ∑_{i∈M2} aij = 0.
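The three conditions of Proposition 3.2 can be tested mechanically, at least for the particular partition used in the text. A minimal sketch (only the choice M1 = all rows, M2 = ∅ is tried, which is exactly the case needed below):

```python
# Sketch of the sufficient TU condition (Proposition 3.2 from [2]):
# entries in {-1, 0, +1}, at most two nonzeros per column, and every
# two-nonzero column balanced by the row partition (here M1 = all rows).

def satisfies_tu_condition(A):
    for col in zip(*A):
        if any(a not in (-1, 0, 1) for a in col):
            return False
        nz = [a for a in col if a != 0]
        if len(nz) > 2:
            return False
        if len(nz) == 2 and sum(nz) != 0:   # M1 = all rows, M2 = empty
            return False
    return True
```

A vertex-arc incidence matrix, where each column holds one +1 and one −1, passes the test.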
Consider the matrix without the xe ≤ ue constraints. If we let M1 contain all the remaining constraints and M2 = ∅, the conditions of Proposition 3.2 are fulfilled. The matrix is therefore totally unimodular. It is left to the reader to prove that the entire matrix, including the xe ≤ ue constraints, is also totally unimodular. A consequence is that the minimum cost flow problem can be solved by simply removing the integrality constraints from the model and solving it using an LP solver like e.g. CPLEX. In order to come up with a more efficient approach, we take a look at optimality conditions for the minimum cost flow problem.

7.1 Optimality conditions

Having stated the primal LP of the minimum cost flow problem in (7.1), it is a straightforward task to formulate the dual LP. The dual LP has two sets of variables: the dual variables corresponding to the flow balance equations are denoted yi, i ∈ V, and those corresponding to the capacity constraints are denoted zij, (i, j) ∈ E. The dual problem is:

max ∑_{i∈V} bi yi − ∑_{(i,j)∈E} uij zij
s.t. −yi + yj − zij ≤ wij    (i, j) ∈ E      (7.3)
     zij ≥ 0                 (i, j) ∈ E

Define the reduced cost w̄ij of an edge (i, j) as w̄ij = wij + yi − yj. The constraint −yi + yj − zij ≤ wij, i.e. −wij − yi + yj ≤ zij, is then equivalent to

−w̄ij ≤ zij

Consider the two different cases:

1. If ue = +∞ (i.e. no capacity constraint for the edge), then ze does not appear and just w̄e ≥ 0 has to hold. The constraint then corresponds to the primal optimality condition for the LP.

2. If ue < ∞, then ze ≥ 0 and ze ≥ −w̄e must hold. As the z variables have negative coefficients in the objective function of the dual problem, the best choice for ze is as small as possible, that is ze = max{0, −w̄e}. The optimal value of ze is therefore uniquely determined from the other variables, and in this sense ze is "unnecessary" in the dual problem.

When is a pair of feasible solutions x and (y, z) optimal? Complementary slackness conditions are used to obtain optimality conditions. Generally, complementary slackness states that in optimum 1) each primal variable times the corresponding dual slack must equal 0, and 2) each dual variable times the corresponding primal slack must equal 0. In our case this results in:

xe > 0 ⇒ −w̄e = ze = max(0, −w̄e), that is, w̄e > 0 ⇒ xe = 0, and
ze > 0 ⇒ xe = ue, that is, w̄e < 0 ⇒ xe = ue.

Summing up: a primal feasible flow satisfying the demands and respecting the capacity constraints is optimal if and only if there exists a dual solution yi, i ∈ V, such that for all e ∈ E it holds that:

w̄e < 0  implies  xe = ue (< ∞)
w̄e > 0  implies  xe = 0

All pairs (x, y) of optimal solutions satisfy these conditions.

The condition can be used to define a solution approach for the minimum cost flow problem. To present the approach we first define a residual graph, in which the edges indicate how flow excess can be moved in G given that the flow x is already present. For a feasible flow x in G, the residual graph Gx for G wrt. x is (like for the maximum flow problem) defined by V(Gx) = V and

E(Gx) = Ex = {(i, j) : (i, j) ∈ E ∧ xij < uij} ∪ {(i, j) : (j, i) ∈ E ∧ xji > 0}

The only difference to the residual graph of the maximum flow problem is that each edge additionally has a cost assigned.
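The optimality test derived above is easy to state in code. An illustrative sketch (names not from the notes; only finite capacities are handled):

```python
# Sketch: given potentials y, compute reduced costs w_bar_ij = w_ij + y_i - y_j
# and check the complementary slackness conditions
# w_bar_e > 0 => x_e = 0 and w_bar_e < 0 => x_e = u_e.

def reduced_cost(w, y, i, j):
    return w[(i, j)] + y[i] - y[j]

def is_optimal(flow, capacity, w, y):
    for (i, j), u in capacity.items():
        wb = reduced_cost(w, y, i, j)
        x = flow.get((i, j), 0)
        if wb > 0 and x != 0:
            return False
        if wb < 0 and x != u:
            return False
    return True
```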
The unit transportation cost w′ij of a residual edge with xij < uij is w′ij = wij, while the cost of a residual edge with xji > 0 is w′ij = −wji. The reason for this definition can be explained by considering an x-incrementing path P. In general P contains both forward and backward edges. Suppose we now send one unit of flow from one end of the path to the other. Then xe will be raised by one on forward edges and lowered by one on backward edges, and the cost must reflect this change: costs are added for forward edges and subtracted for backward edges. See Figure 7.3.

Figure 7.3: A flow x in an instance (G, u, c, b) and the corresponding residual graph (Gx, c′)

Note that if a set of potentials yi, i ∈ V is given, and the reduced costs w̄ij = wij + yi − yj are calculated for the edges, then the cost of a circuit wrt. the reduced costs remains the same as wrt. the original costs, as the potentials are "telescoped" to 0 around the circuit. Note also that a dicircuit with negative cost in Gx corresponds to a negative cost circuit in G, if costs are added for forward edges and subtracted for backward edges.

A primal feasible flow satisfying the demands and respecting the capacity constraints is optimal if and only if no x-augmenting circuit with negative w-cost (or negative w̄-cost; by the above, there is no difference) exists. This is the idea behind the identification of optimal solutions in the network simplex algorithm that will be presented later.
An initial and simple idea is therefore to find a feasible flow and then in each iteration test for the existence of a negative circuit by trying to find a negative-cost circuit in the residual graph. This test could be done using the Bellman-Ford algorithm, which has a running time of O(nm). The resulting algorithm is called the augmenting circuit algorithm.

Algorithm 14: The Augmenting Circuit Algorithm
1 Find a feasible flow x
2 while there exists an augmenting circuit do
3     Find an augmenting circuit C
4     if C has no reverse edge and no forward edge of finite capacity then
5         STOP
6     Augment x on C

Several implementations of the augmenting circuit algorithm exist. When performing a complete O(nm) computation in every iteration it is paramount to keep the number of iterations low, and most often these algorithms do not perform as well as the most effective methods. We will therefore turn our attention to one of the methods that is currently most used, namely the Network Simplex Method.

7.2 Network Simplex for the Transshipment Problem

In order to get a better understanding of the network simplex approach we initially look at the minimum cost flow problem without capacities. This problem is also called the transshipment problem: given a directed graph G = (V, E) with demands bi for each vertex i ∈ V and unit transportation cost we for each edge e ∈ E, find the minimum cost flow that satisfies the demand constraints.

We assume that G is connected; otherwise the problem simply reduces to a transshipment problem for each of the components of G.

A tree in a directed graph is a set T ⊆ E such that T is a tree in the underlying undirected graph. A tree solution for the transshipment problem given by G = (V, E), w, b is a flow x ∈ R^E satisfying

    fx(i) = bi   for all i ∈ V
    xe = 0       for all e ∉ T

So there is no flow on edges not in T, and all demands are satisfied.
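The negative-circuit test used by Algorithm 14 can be sketched in code. The following Python fragment is an illustration, not part of the original text; the representation of the network as dictionaries u, w and x keyed by edge tuples is an assumption made for the example. It builds the residual graph Gx with costs w' and runs the O(nm) Bellman-Ford style test.

```python
def residual_edges(V, E, u, w, x):
    """Edges of the residual graph G_x with their costs w'."""
    res = []
    for (i, j) in E:
        if x[i, j] < u[i, j]:        # forward edge, cost w_ij
            res.append((i, j, w[i, j]))
        if x[i, j] > 0:              # backward edge, cost -w_ij
            res.append((j, i, -w[i, j]))
    return res

def has_negative_circuit(V, res):
    """Bellman-Ford negative-circuit test on the residual graph, O(nm)."""
    y = {v: 0 for v in V}            # start from 0 so every vertex is reached
    for _ in range(len(V) - 1):
        for (i, j, c) in res:
            if y[i] + c < y[j]:
                y[j] = y[i] + c
    # one more pass: any further improvement proves a negative circuit
    return any(y[i] + c < y[j] for (i, j, c) in res)
```

Starting all labels at 0 (rather than from a single source) makes every vertex reachable, so a negative circuit anywhere in Gx is detected.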
Initially we may ask whether there exists a tree solution for any tree T. The answer is yes, and can be seen by the following constructive argument. Select a vertex r and call it the root; it can be any vertex. Consider an edge h in T. The edge h partitions V into two sets, one containing r, denoted R_h^T, and the remainder P_h^T = V \ R_h^T (see Figure 7.4). If h starts in R_h^T and (consequently) ends in P_h^T then

    x_h = Σ_{i ∈ P_h^T} b_i = − Σ_{i ∈ R_h^T} b_i        (7.4)

Alternatively, if h starts in P_h^T and ends in R_h^T, then

    x_h = Σ_{i ∈ R_h^T} b_i        (7.5)

So a tree T uniquely defines a tree solution. In the example in Figure 7.4, x_h = 8. Note that tree solutions are normally not feasible, since edges with negative flow may exist (cf. basic solutions and feasible basic solutions in linear programming).

Figure 7.4: Let vertex 1 be the root vertex. The edge h as shown defines the two sets R_h^T and P_h^T.

It is essential to note that any feasible solution can be transformed into a tree solution. So it can be proved that:

Theorem 7.1 If the transshipment problem defined by G = (V, E, w, b) has a feasible solution, then it also has a feasible tree solution.
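The constructive argument behind (7.4) and (7.5) translates directly into code. The following Python fragment is an illustration, not part of the original text; it assumes demands b[i] where a positive value means vertex i requires that much inflow, and computes, for every tree edge h, the part P_h cut off from the root when h is removed.

```python
def tree_solution(V, T, b, root):
    """The unique tree solution for tree T: for each tree edge h, x_h equals
    the net demand of the part P_h separated from the root when h is removed
    (equations (7.4)/(7.5)); edges not in T carry no flow."""
    x = {}
    for h in T:
        rest = [e for e in T if e != h]
        # vertices reachable from the root in T - h (directions ignored)
        reach, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for (i, j) in rest:
                for a, c in ((i, j), (j, i)):
                    if a == v and c not in reach:
                        reach.add(c)
                        stack.append(c)
        P = [v for v in V if v not in reach]   # the part not containing r
        B = sum(b[v] for v in P)               # net demand of P_h
        (i, j) = h
        x[h] = B if j in P else -B             # h pointing into P_h: x_h = B
    return x
```

Running it on a tiny instance where vertex 1 supplies 5 units and vertices 2 and 3 demand 2 and 3 units reproduces the flow-conservation property fx(i) = bi on the tree.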
Theorem 7.2 If the transshipment problem defined by G = (V, E, w, b) has an optimal solution, then it also has an optimal tree solution.

Proof: Let x be a feasible solution in G. If x is a tree solution we are done. If x is not a tree solution then there must exist a circuit C where all edges have a positive flow; we can assume that C has at least one backward edge. Let α = min{xe : e ∈ C, e is a backward edge}. Now replace xe by xe + α if e is a forward edge in C, and by xe − α if e is a backward edge in C. The new x is feasible, and has one circuit less than the initial flow. If the flow is not yet a tree solution we continue the procedure. Note that each time we make an "update" to the flow, at least one edge will get flow equal to 0, and that we never put flow onto an edge with a flow of 0; therefore we must terminate after a finite number of iterations. If the flow x is an optimal solution and there exists a circuit C with positive flow, C must have zero cost. This means that the approach used above can be used to generate a new optimal solution in which fewer edges have a positive flow. △

The results of these two theorems are quite powerful: from now on we can restrict ourselves to look only at tree solutions.

Consider the vector of potentials y ∈ R^V obtained by setting yi, i ∈ V, to the cost of the (simple) path in T from r to i, counted with sign: plus for a forward edge, minus for a backward edge. It now holds that for all i, j ∈ V, the cost of the path from i to j in T equals yj − yi (why?). But then the reduced costs w̄ij (defined by wij + yi − yj) satisfy:

    w̄e = 0             for all e ∈ T
    w̄e = w(C_e^T)      for all e ∉ T

Remember that a non-tree edge e uniquely defines a circuit with T. Each edge e ∉ T defines a unique circuit C_e^T with the characteristics:

• C_e^T ⊆ T ∪ {e}
• e is a forward edge in C_e^T

This is illustrated in Figure 7.5. If T determines a feasible tree solution x, and C_e^T has nonnegative cost for all e ∉ T, then x is optimal. Consequently, the network simplex method moves from tree solution to tree solution using negative-cost circuits C_e^T consisting of tree edges and exactly one non-tree edge e (think of the tree edges as basic variables and the non-tree edge as the nonbasic variable of a simplex iteration).
Now the network simplex algorithm for the transshipment problem can be stated:

Algorithm 15: The Network Simplex Algorithm for the Transshipment Problem
1  Find a tree T with a corresponding feasible tree solution x
2  Select an r ∈ V as root for T
3  Compute yi as the length of the r-i path in T, costs counted with sign
4  while ∃(i, j) : w̄ij = wij + yi − yj < 0 do
5      Select an edge e with w̄ij < 0
6      if all edges in C_e^T are forward edges then
7          STOP /* the problem is ``unbounded'' */
8      find θ = min{xj : j backward in C_e^T} and an edge h with xh = θ
9      increase x with θ along C_e^T
10     T ← (T ∪ {e}) \ {h}
11     update y

Remember that as flow is increased in the algorithm it is done with respect to the orientation defined by the edge that defines the circuit, that is, increase flow in forward edges and decrease flow in backward edges.

Testing whether T satisfies the optimality conditions is relatively easy: we can compute y in time O(n), and thereafter w̄ can be computed in O(m). With the simple operations involved and the modest time complexity this is certainly going to be faster than running the Bellman-Ford algorithm.

An issue that we have not touched upon is how to construct the initial tree.

Figure 7.5: An example of the definition of the unique circuit C_e^T by the edge e. The edge (6, 5) defines the circuit 3, 5, 6 and the anticlockwise orientation.
Let us demonstrate the network simplex algorithm by the example in Figure 7.6. First, we need to find an initial feasible tree solution. One approach is the following: we can use the tree T whose edges are (r, i) where i ∈ V \ {r} and bi ≥ 0, and edges (i, r) where i ∈ V \ {r} and bi < 0. Not all these edges may exist; if they do not exist we add them to G but assign them a large enough cost to secure that they will not be part of an optimal solution.

Having determined the tree, we let vertex 2 be the root vertex. The feasible tree solution with respect to T is uniquely defined, and can be computed by (7.4) and (7.5) as defined earlier. So we get the tree solution in the graph in the upper right of Figure 7.6. The flow corresponds to a value of 115.

Figure 7.6: The solution of a transshipment problem using the network simplex algorithm: (a) initial graph, (b) initial tree solution, (c) step 1, (d) step 2. The labels on the edges are we, xe. Bold edges identify the tree solution.
Now we compute the potential vector y = (1, 0, 3, 5, 9, 6). The two non-tree edges each uniquely determine a circuit, with costs:

    w̄34 = y3 + w34 − y4 = 3 + 1 − 5 = −1
    w̄65 = y6 + w65 − y5 = 6 + 1 − 9 = −2

So both edges define a negative cost circuit. We can choose either one of them; the choice is really arbitrary. Let us select the circuit C1 defined by (3, 4). The backward edges of C1 both have a flow of 10, so we can decrease both to zero while we increase the flow on the forward edges by 10. One of the backward edges must leave the tree solution, and (3, 4) is added. The resulting flow is shown in the lower left graph; here the flow has a value of 105.

We update y and get (1, 0, 3, 4, 8, 6), and check for potential negative cost circuits:

    w̄24 = y2 + w24 − y4 = 0 + 5 − 4 = 1
    w̄65 = y6 + w65 − y5 = 6 + 1 − 8 = −1

Naturally the edge that has just been removed from the tree solution cannot be a candidate for a negative cost circuit, but the edge (6, 5) defines a negative cycle C2. Every time we push one unit of flow around C2 the flow on its forward edges increases by one, while the flow decreases by one on its backward edge. As the current flow on the backward edge is 10, we augment by 10; this gets us the solution in the lower right graph. With the updated y there are no more negative cost circuits and the algorithm terminates. The optimum flow has a value of 95.

7.3 Network Simplex Algorithm for the Minimum Cost Flow Problem

We now consider the general minimum cost flow problem given by the network G = (V, E) with demands bi, i ∈ V, capacities ue, e ∈ E, and costs we, e ∈ E. In general, the algorithm for this case is a straightforward extension of the network simplex algorithm for the transshipment problem as described in the previous section. It can also be viewed as an interpretation of the bounded variable simplex method of linear programming.
Remember the optimality conditions for a feasible flow x ∈ R^E:

    w̄e < 0  ⇒  xe = ue  (ue < ∞)
    w̄e > 0  ⇒  xe = 0

The concept of a tree solution is extended to capacity-constrained problems. A non-tree edge e may now have flow either zero or ue. E \ T is hence partitioned into two sets of edges, L and U: edges in L must have flow zero and edges in U must have flow equal to their capacity. The tree solution x must satisfy:

    fx(i) = bi   for all i ∈ V
    xe = 0       for all e ∈ L
    xe = ue      for all e ∈ U

As before, the tree solution for T is unique. Select a vertex r called the root of T, and consider an edge h in T. Again h partitions V into two parts: a part containing r, denoted R_h^T, and a remainder P_h^T = V \ R_h^T. Consider now the modified demand B(P_h^T) in P_h^T, given that a tree solution is sought:

    B(P_h^T) = Σ_{i ∈ P_h^T} bi − Σ_{(i,j) ∈ U : i ∈ R_h^T, j ∈ P_h^T} uij + Σ_{(i,j) ∈ U : i ∈ P_h^T, j ∈ R_h^T} uij

If h starts in R_h^T and ends in P_h^T then set x_h = B(P_h^T), and if h starts in P_h^T and ends in R_h^T then set x_h = −B(P_h^T).

Now the algorithm for the general minimum cost flow problem is based on the following theorem, which will be presented without proof:

Theorem 7.3 If the minimum cost flow problem defined by G = (V, E, w, u, b) has a feasible solution, then it has a feasible tree solution. If it has an optimal solution, then it has an optimal tree solution.
The network simplex algorithm will again move from tree solution to tree solution, only using circuits formed by adding a single edge to T. Given e and T, L, U, the corresponding circuit is denoted C_e^{T,L,U}. We now search for either a non-tree edge e with xe = 0 and negative reduced cost, or a non-tree edge e with xe = ue and positive reduced cost; in the latter case we consider sending flow in the opposite direction. The circuit satisfies:

• Each edge of C_e^{T,L,U} is an element of T ∪ {e}.
• If e ∈ L it is a forward edge of C_e^{T,L,U}, and otherwise it is a backward edge.

When augmenting, the flow must be increased in forward edges respecting the capacity constraints, and decreased in backward edges respecting the nonnegativity constraints.

With finite capacities it is not so easy to come up with an initial feasible tree solution. One approach is to add a "super source" vertex r̄ and a "super sink" vertex s̄ to G. Edges from r̄ to a source vertex i get capacity equal to the absolute value of the demand of i, that is, u_r̄i = −bi, and correspondingly we get u_is̄ = bi for a sink vertex i. On the "extended" G we solve a maximum flow problem. This will constitute a feasible solution and will present us with a feasible tree solution.

Let us finish off the discussion of the network simplex by demonstrating it on an example; Figure 7.7 shows the iterations. In the initial tree solution both non-tree edges (3, 4) and (6, 5) are in L, as their flow is equal to zero. Now we compute y = (1, 0, 3, 5, 9, 6), and for each of the two non-tree edges we get:

    w̄34 = y3 + w34 − y4 = 3 + 1 − 5 = −1
    w̄65 = y6 + w65 − y5 = 6 + 1 − 9 = −2

Both will improve the solution. Let us take (3, 4). The bottleneck in the circuit defined by (3, 4) becomes (1, 3), as the flow on this edge can only be increased by five units. We increase the flow on the forward edges by five and decrease it by five on the backward edges, which gets us to step 1. Note now that the non-tree edge (1, 3) belongs to U, whereas (6, 5) remains in L. We then recompute y.
Figure 7.7: An example of the use of the network simplex algorithm: (a) initial graph, (b) initial tree solution, (c) step 1, (d) step 2. Labels on the edges represent we, ue, xe. Bold edges identify the tree solution.
After recomputing y, the two non-tree edges give:

    w̄13 = y1 + w13 − y3 = 1 + 2 − 4 = −1
    w̄65 = y6 + w65 − y5 = 7 + 1 − 9 = −1

Even though w̄13 is negative we can only choose (6, 5), because (1, 3) ∈ U. In the circuit defined by (6, 5) the flow can be increased by at most three units; the edge that defines the bottleneck is edge (3, 6), which is included in U, whereas (6, 5) is included in T. This gets us to step 2 in the figure, where for the two non-tree edges we get:

    w̄13 = y1 + w13 − y3 = 1 + 2 − 4 = −1
    w̄36 = y3 + w36 − y6 = 4 + 3 − 8 = −1

Because both non-tree edges are in U and have negative reduced cost, the tree solution is optimal.

We can now state the algorithm for the minimum cost flow problem:

Algorithm 16: The Network Simplex Algorithm for the Minimum Cost Flow Problem
1  Find a tree T with a corresponding feasible tree solution x
2  Select a vertex r ∈ V as root for T
3  Compute yi as the length of the r-i path in T, costs counted with sign (plus for forward edges, minus for backward edges)
4  while (∃(i, j) ∈ L s.t. w̄ij = wij + yi − yj < 0) ∨ (∃(i, j) ∈ U s.t. w̄ij = wij + yi − yj > 0) do
5      Select one of these edges e
6      if no edge in C_e^{T,L,U} is backward and no forward edge has limited capacity then
7          STOP /* the problem is unbounded */
8      find θ1 ← min{xj : j backward in C_e^{T,L,U}}, θ2 ← min{uj − xj : j forward in C_e^{T,L,U}}, and an edge h giving rise to θ ← min{θ1, θ2}
9      increase x by θ along C_e^{T,L,U}
10     T ← (T ∪ {e}) \ {h}
11     Update L, U by removing e and inserting h in the relevant one of L and U
12     Update y
7.4 Supplementary Notes

There is (naturally) a close relationship between the network simplex algorithm and the simplex algorithm. An in-depth discussion of these aspects of the network simplex algorithm can be found in [1].

7.5 Exercises

1. ([1]) For the minimum-cost flow problem of Figure 7.8, determine whether the indicated vector x is optimal. If it is optimal, find a vector y that certifies its optimality.

Figure 7.8: Given x, find y. Numbers on edges are, in order, we, xe, ue; numbers at vertices are demands be.

2. Bandwidth packing problem. Consider a communications network G = (V, E), where there are n vertices and m edges. Each edge has a bandwidth ue and unit cost we. A call i is defined by its starting vertex (si), destination vertex (di), bandwidth demand (wi) and revenue (ri). (Bandwidth demands are additive: if calls 1 and 2 both use the same edge, the total bandwidth requirement is w1 + w2.) Let N denote the number of calls. We seek to maximize profit subject to logical flow requirements and bandwidth limits. Formulate this as a 0-1 Integer Linear Programming Problem.
3. The Caterer Problem. ([3]) A caterer has booked his services for the next T days. He requires rt fresh napkins on the t'th day, t = 1, ..., T. He sends his soiled napkins to the laundry, which has three speeds of service f = 1, 2, or 3 days. The faster the service, the higher the cost cf of laundering a napkin. He can also purchase new napkins at the cost co. He has an initial stock of s napkins, and he wishes to minimize the total outlay.

(a) Formulate the problem as a network problem.

(b) Solve the problem for the next 14 days, where the daily requirement of napkins is given by Table 7.1, the unit price of buying new napkins is 25 Danish kroner, and the prices of the three different laundry services are: f = 1 costs 8 Danish kroner, f = 2 costs 4 Danish kroner, and f = 3 costs 2 Danish kroner.

Day   1    2    3    4     5     6     7
Req.  500  600  300  800   2200  2600  1200
Day   8    9    10   11    12    13    14
Req.  300  1000 400  600   1400  3000  2000

Table 7.1: Daily requirement of napkins for the next 14 days.

4. Piecewise linear objective function. Suppose that the linear objective function for the minimum cost flow problem is replaced with a separable piecewise linear function. Show how to convert this problem to the standard minimum cost flow problem.

5. Flow restrictions. ([3]) Suppose that in a minimum cost flow problem restrictions are placed on the total flow leaving a node k, i.e.

    Σ_{(k,j) ∈ E} xkj ≤ θk

Show how to modify these restrictions to convert the problem into a standard minimum cost flow problem.
Bibliography

[1] William J. Cook, William H. Cunningham, William R. Pulleyblank and Alexander Schrijver. Combinatorial Optimization. Wiley Interscience, 1998.

[2] Laurence A. Wolsey. Integer Programming. Wiley Interscience, 1998.

[3] George B. Dantzig and Mukund N. Thapa. Linear Programming. Springer, 1997.
Part II
General Integer Programming
Chapter 8

Dynamic Programming

8.1 The Floyd-Warshall Algorithm

Another interesting variant of the Shortest Path Problem is the case where we want to find the shortest path between all pairs of vertices i, j. This problem is relevant in many situations, most often as a subproblem of other problems. One case is routing of vehicles: having a fleet of vehicles and a set of customers that need to be visited, we often want to build a set of routes for the vehicles in order to minimize the total distance driven. In order to achieve this we need to know the shortest distance between every pair of customers before we can start to compute the best set of routes.

The all-pairs problem can be resolved by repeated use of one of our "one source" algorithms from earlier in this chapter, but there are faster and better ways to solve it. In the all-pairs problem we no longer maintain just a vector of potentials and a vector of predecessors; instead we need to maintain two matrices, one for distances and one for predecessors. One of the algorithms for the all-pairs problem is the Floyd-Warshall algorithm. The algorithm is based on dynamic programming and considers "intermediate"
1 2 3 4 5 6 7
yij ← wij . can construct shortest paths between (i. The FloydWarshall algorithm will produce a “shortest path matrix” such that for any two given vertices i and j yij is the shortest path from i to j. The initialisation sets up the matrices by initialising each entry in each matrix. w). .. directed graph G = (V... Let us assume that the vertices of G are V = {1. which takes O(n2 ).. j) containing only vertices in the set {1.
. Our induction hypothesis is that prior to iteration k it holds that for i. wii equals 0 for all i Result: Two n × nmatrices. . Algorithm 17: FloydWarshall’s algorithm Data: A distance matrix C for a digraph G = (V. otherwise wij = ∞. pij ← i for all (i. j) ∈ E wij is the distance from i to j. k − 1}. n} × {1. . Theorem 8. . w) with length function w : E → R. If the edge (i. containing the length of the shortest path from i to j resp. . 2. E. . 2. . j ∈ v yij contains length of the shortest path Q from i to j in G containing only vertices in the vertex set {1.140
Dynamic Programming
vertices of a shortest path.
Proof: This proof is made by induction. for k ← 1 to n do for i ← 1 to n do for j ← 1 to n do if i = k ∧ j = k ∧ yij > yik + ykj then yij ← yik + ykj pij ← pkj
The time complexity of FloydWarshall’s algorithm is straightforward. The algorithm is based on an observation that we. The overall complexity is hence O(n3 ). n}. The algorithm has three nested loops each of which is performed n times.. .
This is obviously true after the initialisation. In iteration k, the length of Q is compared to the length of a path R composed of two subpaths, R1 and R2 (see Figure 8.1): R1 is a path from i to k, and R2 is a path from k to j, both with "intermediate vertices" only in {1, ..., k − 1}, each of which, by the induction hypothesis, has been found in iteration k − 1. The shortest path from i to j containing only vertices in the vertex set {1, ..., k} either

1. does not contain k, and hence is the one found in iteration k − 1, or
2. contains k, and can then be decomposed into an i, k path followed by a k, j path, each with intermediate vertices only in {1, ..., k − 1}.

The shorter of these two is chosen. Hence the update ensures the correctness of the induction hypothesis after iteration k. △

Figure 8.1: Proof of the Floyd-Warshall algorithm: the current path Q with predecessor p[i, j] is compared to the path composed of R1 and R2 through k, with predecessor p[k, j], where R1 and R2 contain only intermediate vertices from {1, ..., k − 1}.

Finally, we note that the Floyd-Warshall algorithm can detect negative length cycles. It namely computes the shortest path from i to i (the diagonal of the y matrix), and if any of these values are negative it means that there is a negative cycle in the graph.
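Algorithm 17 translates almost line by line into code. The following Python sketch is an illustration, not part of the original text; the guard i ≠ k ∧ j ≠ k is omitted since, with a zero diagonal, it only matters once a negative cycle has already driven a diagonal entry below zero, which is exactly what the final check looks for.

```python
INF = float("inf")

def floyd_warshall(n, w):
    """w[i][j]: arc length, INF if the arc is absent, 0 on the diagonal.
    Returns (y, p, neg): shortest path lengths, predecessors (None if no
    path), and whether a negative-length cycle was detected."""
    y = [row[:] for row in w]
    p = [[i if w[i][j] < INF and i != j else None for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if y[i][k] + y[k][j] < y[i][j]:
                    y[i][j] = y[i][k] + y[k][j]
                    p[i][j] = p[k][j]
    # a negative diagonal entry certifies a negative-length cycle
    neg = any(y[i][i] < 0 for i in range(n))
    return y, p, neg
```

A full i-to-j path can then be recovered by following p[i][j], p[i][p[i][j]], and so on back to i.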
Chapter 9

Branch and Bound

A large number of real-world planning problems called combinatorial optimization problems share the following properties: they are optimization problems, are easy to state, and have a finite but usually very large number of feasible solutions. While some of these, as e.g. the Shortest Path problem and the Minimum Spanning Tree problem, have polynomial algorithms, the majority of the problems in addition share the property that no polynomial method for their solution is known. Examples here are vehicle routing, crew scheduling, and production planning. All of these problems are NP-hard.

Branch and Bound is by far the most widely used tool for solving large scale NP-hard combinatorial optimization problems. Branch and Bound is, however, an algorithm paradigm, which has to be filled out for each specific problem type, and numerous choices for each of the components exist. Even then, principles for the design of efficient Branch and Bound algorithms have emerged over the years. In this chapter we review the main principles of Branch and Bound and illustrate the method and the different design issues through three examples: the symmetric Travelling Salesman Problem, the Graph Partitioning problem, and the Quadratic Assignment problem.

Solving NP-hard discrete optimization problems to optimality is often an immense job requiring very efficient algorithms, and the Branch and Bound paradigm is one of the main tools in the construction of these. A Branch and Bound algorithm searches the complete space of solutions for a given problem for the best solution. However, explicit enumeration is normally impossible due to the exponentially increasing number of potential solutions. The use of bounds for the function to be optimized, combined with the value of the current best solution, enables the algorithm to search parts of the solution space only implicitly.

At any point during the solution process, the status of the solution with respect to the search of the solution space is described by a pool of yet unexplored subsets of this and the best solution found so far. Initially only one subset exists, namely the complete solution space, and the best solution found so far is ∞. The unexplored subspaces are represented as nodes in a dynamically generated search tree, which initially only contains the root, and each iteration of a classical Branch and Bound algorithm processes one such node. The iteration has three main components: selection of the node to process, bound calculation, and branching, i.e. subdivision of the solution space of the node into two or more subspaces to be investigated in a subsequent iteration. The sequence of these may vary according to the strategy chosen for selecting the next node to process.

If the selection of the next subproblem is based on the bound value of the subproblems, then the first operation of an iteration after choosing the node is branching. For each of the subspaces created, it is checked whether the subspace consists of a single solution, in which case it is compared to the current best solution, keeping the best of these. Otherwise the bounding function for the subspace is calculated and compared to the current best solution. If it can be established that the subspace cannot contain the optimal solution, the whole subspace is discarded; else it is stored in the pool of live nodes together with its bound. This is in [2] called the eager strategy for node evaluation, since bounds are calculated as soon as nodes are available. The alternative is to start by calculating the bound of the selected node and then branch on the node if necessary; the nodes created are then stored together with the bound of the processed node. This strategy is called lazy and is often used when the next node to be processed is chosen to be a live node of maximal depth in the search tree.

The search terminates when there is no unexplored part of the solution space left, and the optimal solution is then the one recorded as "current best". In Figure 9.1, the initial situation and the first steps of the process are illustrated.

The chapter is organized as follows: In Section 9.1, I go into detail with terminology and problem description and give the three examples to be used subsequently. Sections 9.1.1, 9.1.2, and 9.1.3 then treat in detail the algorithmic components selection, bounding, and branching, and Section 9.1.4 briefly comments upon methods for generating a good feasible solution prior to the start of the search.
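The iteration described above can be sketched as a generic best-first (eager) Branch and Bound loop. The following Python skeleton is an illustration, not from the text; the problem-specific parts (bounding function, branching rule, leaf test, objective value) are passed in as functions, and the root is assumed not to be a single solution itself.

```python
import heapq

def branch_and_bound(root, lower_bound, branch, is_leaf, value):
    """Best-first (eager) Branch and Bound for a minimization problem.
    root: the complete solution space; lower_bound(S) <= f(x) for every
    solution x in S; branch(S) yields subspaces of S; is_leaf(S) tells
    whether S is a single solution, whose objective is value(S)."""
    best, incumbent = float("inf"), None
    live = [(lower_bound(root), 0, root)]      # pool of live nodes
    counter = 1                                # tie-breaker for the heap
    while live:
        bound, _, S = heapq.heappop(live)
        if bound >= best:                      # cannot contain a better solution
            continue
        for child in branch(S):
            if is_leaf(child):
                v = value(child)
                if v < best:
                    best, incumbent = v, child # new current best
            elif lower_bound(child) < best:    # else: child is discarded
                heapq.heappush(live, (lower_bound(child), counter, child))
                counter += 1
    return best, incumbent
```

Replacing the heap by a stack turns this best-first search into the depth-first variant for which the lazy evaluation strategy is typically preferred.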
Figure 9.1: Illustration of the search space of Branch and Bound: (a) the initial solution space S; (b) the subdivision into S1, S2, S3, S4; (c) the further subdivision of S2 into S21, S22, S23, S24, where * marks subsets established not to contain an optimal solution.
In Section 9.2 I then describe personal experiences with solving two problems using parallel Branch and Bound, and Section 9.3 discusses the impact of design decisions on the efficiency of the complete algorithm.

9.1 Branch and Bound - terminology and general description

In the following I consider minimization problems; the case of maximization problems can be dealt with similarly. The problem is to minimize a function f(x) of variables (x1 ... xn) over a region of feasible solutions, S:

    z = min{f(x) : x ∈ S}

The function f is called the objective function and may be of any type. The set of feasible solutions is usually determined by general conditions on the variables, e.g. that these must be nonnegative integers or binary, and special constraints determining the structure of the feasible set. In many cases, a set of potential solutions P containing S, for which f is still well defined, naturally arises, and often also a function g(x) defined on S (or P) with the property that g(x) ≤ f(x) for all x in S (resp. P) arises naturally. Both P and g are very useful in the Branch and Bound context. Figure 9.2 illustrates the situation where S and P are intervals of reals.

I will use the term subproblem to denote a problem derived from the originally given problem through addition of new constraints. A subproblem hence corresponds to a subspace of the original solution space, and the two terms are used interchangeably, and in the context of a search tree interchangeably with the term node.

In order to make the discussions more explicit I use three problems as examples. The first one is one of the most famous combinatorial optimization problems: the Travelling Salesman problem.

Example 1: The Symmetric Travelling Salesman problem. The problem arises naturally in connection with routing of vehicles for delivery and pick-up of goods or persons, but has numerous other applications. A famous and thorough reference is [11]. In Figure 9.3, a map over the Danish island Bornholm is given together with a distance table showing the distances between major cities/tourist attractions. The problem of a biking tourist, who wants to visit all these major points, naturally comes to mind: the task is to find a tour of
minimum length, starting and ending in the same city and visiting each other city exactly once. Such a tour is called a Hamilton cycle. The problem is called the symmetric Travelling Salesman problem (TSP) since the table of distances is symmetric. In general a symmetric TSP is given by a symmetric n × n matrix D of nonnegative distances, and the goal is to find a Hamilton tour of minimum length.

Figure 9.2: The relation between the bounding function g and the objective function f on the sets S and P of feasible and potential solutions of a problem.

Figure 9.3: The island Bornholm and the distances between interesting sites:

      A   B   C   D   E   F   G   H
A     0  11  24  25  30  29  15  15
B    11   0  13  20  32  37  17  17
C    24  13   0  16  30  39  29  22
D    25  20  16   0  15  23  18  12
E    30  32  30  15   0   9  23  15
F    29  37  39  23   9   0  14  21
G    15  17  29  18  23  14   0   7
H    15  17  22  12  15  21   7   0
In terms of graphs, we consider a complete undirected graph Kn with n vertices and nonnegative lengths assigned to the edges, and the goal is to determine a Hamilton tour of minimum length. The problem may also be stated mathematically by using decision variables to describe which edges are to be included in the tour. We introduce 0-1 variables xij, 1 ≤ i < j ≤ n, and interpret the value 0 (1 resp.) to mean "not in tour" ("in tour" resp.). The problem is then

    min  Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} dij xij

such that

    Σ_{k=1}^{i-1} xki + Σ_{k=i+1}^{n} xik = 2,    i ∈ {1, ..., n}

    Σ_{i,j∈Z} xij < |Z|,    ∅ ⊂ Z ⊂ V

    xij ∈ {0, 1},    i, j ∈ {1, ..., n}

The first set of constraints ensures that for each i exactly two variables corresponding to edges incident with i are chosen. Since each edge has two endpoints, this implies that exactly n variables are allowed to take the value 1. The second set of constraints consists of the subtour elimination constraints. Each of these states for a specific subset Z of V that the number of edges connecting vertices in Z has to be less than |Z|, thereby ruling out that these form a subtour. Unfortunately there are exponentially many of these constraints.

The given constraints determine the set of feasible solutions S. One obvious way of relaxing this to a set of potential solutions is to relax (i.e. discard) the subtour elimination constraints. The set of potential solutions P is then the family of all sets of subtours such that each i belongs to exactly one of the subtours in each set in the family, cf. Figure 9.4.

A subproblem of a given symmetric TSP is constructed by deciding for a subset A of the edges of G that these must be included in the tour to be constructed, while for another subset B the edges are excluded from the tour. Exclusion of an edge (i, j) is usually modeled by setting cij to ∞, whereas the inclusion of an edge can be handled in various ways, e.g. by graph contraction. In Section 9.1.1 another possibility is described, which in a Branch and Bound context turns out to be more appropriate.
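For very small instances the formulation can be checked directly against exhaustive enumeration. The following sketch (an illustrative helper of my own, not part of the original text) fixes one city as the starting point and evaluates every Hamilton tour of a symmetric distance matrix:

```python
from itertools import permutations

def tour_length(tour, d):
    """Length of the closed tour visiting the cities of `tour` in order."""
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(d):
    """Return an optimal Hamilton tour and its length by full enumeration.

    City 0 is fixed as the starting city, so the permutations of the remaining
    cities cover every tour (each undirected tour appears twice, once per
    direction, which is harmless when only the minimum is wanted).
    """
    n = len(d)
    best_tour, best_len = None, float("inf")
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, d)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# A 4-city instance: the three distinct tours have lengths 18, 18 and 12.
d = [[0, 1, 2, 9],
     [1, 0, 3, 4],
     [2, 3, 0, 5],
     [9, 4, 5, 0]]
print(brute_force_tsp(d))  # ((0, 1, 3, 2), 12)
```

Since the number of tours grows factorially, this only works for toy instances; it is exactly this growth that motivates the bounding machinery of this chapter.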
The number of feasible solutions to the problem is (n − 1)!/2, which for n = 50 is appr. 3 × 10^62. △

Figure 9.4: A potential, but not feasible, solution to the biking tourist's problem.

The following descriptions follow [4, 5].

Example 2: The Graph Partitioning Problem. The Graph Partitioning problem (GPP) arises in situations where it is necessary to minimize the number (or weight) of connections between two parts of a network of prescribed size. We consider a given weighted, undirected graph G with vertex set V and edge set E, and a cost function c : E → N. The problem is to partition V into two disjoint subsets V1 and V2 of equal size such that the sum of costs of edges connecting vertices belonging to different subsets is as small as possible. Figure 9.5 shows an instance of the problem.

The graph partitioning problem can be formulated as a quadratic integer programming problem. Define for each vertex v of the given graph a variable xv, which can attain only the values 0 and 1. A 1-1 correspondence between partitions and assignments of values to all variables now exists: xv = 1 (respectively = 0) if and only if v ∈ V1 (respectively v ∈ V2). The cost of a partition is then

    Σ_{u,v∈V} cuv xv (1 − xu)

where cuv is taken to be 0 if (u, v) is not an edge of G; a term contributes exactly when v ∈ V1 and u ∈ V2. A constraint on the number of variables attaining the value 1 is included to exclude infeasible partitions:

    Σ_{v∈V} xv = |V|/2
When some of the vertices have been assigned to the sets (i.e. the corresponding variables have been assigned values 1 or 0), a subproblem has been constructed. Initially V1 and V2 are empty, corresponding to no variables having yet been assigned a value. The set of feasible solutions S is here the partitions of V into two equal-sized subsets, while the natural set P of potential solutions consists of all partitions of V into two nonempty subsets. The number of feasible solutions to a GPP with 2n vertices equals the binomial coefficient C(2n, n); for 2n = 120 the number of feasible solutions is appr. 9.6 × 10^34. △

Example 3: The Quadratic Assignment Problem. Here I consider the Koopmans-Beckmann version of the problem, which can informally be stated with reference to the following practical situation: a company is to decide the assignment of n facilities to an equal number of locations and wants to minimize the total transportation cost.
Figure 9.5: A graph partitioning problem and a feasible solution.
For each pair of facilities (i, j) a flow of communication fi,j is known, and for each pair of locations (l, k) the corresponding distance dl,k is known. The transportation cost between facilities i and j, given that i is assigned to location l and j is assigned to location k, is fi,j · dl,k, and the objective of the company is to find an assignment minimizing the sum of all transportation costs.

Each feasible solution corresponds to a permutation of the facilities. Letting S denote the group of permutations of n elements, the problem can hence formally be stated as

    min { Σ_{i=1}^{n} Σ_{j=1}^{n} fi,j · dπ(i),π(j) : π ∈ S }

Again the number of feasible solutions grows exponentially: for a problem with n facilities to be located, the number of feasible solutions is n!, which for n = 20 is appr. 2.43 × 10^18.

Figure 9.6 shows a small example with 4 facilities and 4 locations; the assignment of facilities A, B, C, and D to sites 1, 2, 3, and 4 respectively has a cost of 224.

Figure 9.6: A Quadratic Assignment problem of size 4.

Initially no facilities have been placed on a location, and subproblems of the original problem arise when some but not all facilities have been assigned to locations. A set of potential solutions is e.g. obtained by allowing more than one facility on each location. △

The solution of a problem with a Branch and Bound algorithm is traditionally
described as a search through a search tree, in which the root node corresponds to the original problem to be solved, and each other node corresponds to a subproblem of the original problem. Given a node Q of the tree, the children of Q are subproblems derived from Q through imposing (usually) a single new constraint for each subproblem, and the descendants of Q are those subproblems which satisfy the same constraints as Q and additionally a number of others. In this way the subspace of Q is subdivided into smaller subspaces. The leaves correspond to feasible solutions, and for all NP-hard problems, instances exist with an exponential number of leaves in the search tree.

To each node in the tree a bounding function g associates a real number called the bound for the node. For leaves the bound equals the value of the corresponding solution, whereas for internal nodes the value is a lower bound for the value of any solution in the subspace corresponding to the node. Usually g is required to satisfy the following three conditions:

1. g(Pi) ≤ f(Pi) for all nodes Pi in the tree
2. g(Pi) = f(Pi) for all leaves in the tree
3. g(Pi) ≥ g(Pj) if Pj is the father of Pi

These state that g is a bounding function, which for any leaf agrees with the objective function, and which provides closer and closer (or rather not worse) bounds when more information in terms of extra constraints for a subproblem is added to the problem description.

The search tree is developed dynamically during the search and consists initially of only the root node. For many problems, a feasible solution to the problem is produced in advance using a heuristic, and the value hereof is used as the current best solution (called the incumbent). In each iteration of a Branch and Bound algorithm, a node is selected for exploration from the pool of live nodes corresponding to unexplored feasible subproblems, using some selection strategy. If the eager strategy is used, a branching is performed: two or more children of the node are constructed through the addition of constraints to the subproblem of the node. For each of these the bound for the node is calculated, possibly with the result of finding the optimal solution to the subproblem. In case the node corresponds to a feasible solution, or the bound is the value of an optimal solution, the value hereof is compared to the incumbent, and the best solution and its value are kept. If the bound is no better than the incumbent, the subproblem is discarded (or fathomed), since no feasible solution of the subproblem can be better than the incumbent. In case no feasible solutions to the subproblem exist, the subproblem is also fathomed. Otherwise the possibility of a better solution in the subproblem cannot be ruled out, and the node (with the bound as part of the information stored) is then joined to the pool of live subproblems. If the lazy selection strategy is used, the order of bound calculation and branching is reversed, and the live nodes are stored with the bound of their father as part of the information. Below, the two algorithms are sketched.
Algorithm 18: Eager Branch and Bound
Data: a minimization problem P0 with objective function f and bounding function g
Result: an optimal solution OptimalSolution and its value OptimalValue
1   Incumbent ← ∞
2   LB(P0) ← g(P0)
3   Live ← {(P0, LB(P0))}
4   repeat
5       Select the node P from Live to be processed
6       Live ← Live \ {P}
7       Branch on P generating P1, ..., Pk
8       for 1 ≤ i ≤ k do
9           Bound Pi: LB(Pi) ← g(Pi)
10          if LB(Pi) = f(X) for a feasible solution X and f(X) < Incumbent then
11              Incumbent ← f(X)
12              Solution ← X
13          if LB(Pi) ≥ Incumbent then
14              fathom Pi
15          else
16              Live ← Live ∪ {(Pi, LB(Pi))}
17  until Live = ∅
18  OptimalSolution ← Solution
19  OptimalValue ← Incumbent
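As a concrete illustration of the eager scheme, here is a minimal self-contained sketch (a toy example of my own, not from the text). It applies eager bounding with best-first selection to a small binary covering problem: minimize c·x subject to a·x ≥ b with x binary. The bound of a partial assignment is simply the cost of the variables already fixed to 1, which is a valid lower bound because all costs are nonnegative; subspaces that can no longer reach the required coverage get the bound +∞ and are thereby fathomed:

```python
import heapq

def eager_branch_and_bound(c, a, b):
    """Eager best-first Branch and Bound for: minimize c.x  s.t.  a.x >= b, x binary."""
    n = len(c)

    def bound(x):                      # x: tuple of already-fixed 0/1 values
        fixed_cov = sum(ai * xi for ai, xi in zip(a, x))
        free_cov = sum(a[len(x):])
        if fixed_cov + free_cov < b:   # infeasible subspace: fathom via +inf
            return float("inf")
        return sum(ci * xi for ci, xi in zip(c, x))

    incumbent, solution = float("inf"), None
    live = [(bound(()), ())]           # pool of live nodes: (lower bound, partial x)
    while live:
        lb, x = heapq.heappop(live)    # best-first selection
        if lb >= incumbent:            # fathomed by bounding
            continue
        for bit in (0, 1):             # branch: fix the next free variable
            child = x + (bit,)
            child_lb = bound(child)
            if len(child) == n:        # leaf: the bound equals the objective f
                if child_lb < incumbent:
                    incumbent, solution = child_lb, child
            elif child_lb < incumbent:
                heapq.heappush(live, (child_lb, child))
    return solution, incumbent

print(eager_branch_and_bound([5, 4, 3], [2, 3, 4], 5))  # ((0, 1, 1), 7)
```

Note how the sketch mirrors the pseudocode: children are bounded immediately when generated (eager evaluation), the heap realizes the selection strategy, and leaves satisfy condition 2 of the bounding function (g = f).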
Algorithm 19: Lazy Branch and Bound
Data: a minimization problem P0 with objective function f and bounding function g
Result: an optimal solution OptimalSolution and its value OptimalValue
1   Incumbent ← ∞
2   Live ← {(P0, LB(P0))}
3   repeat
4       Select the node P from Live to be processed
5       Live ← Live \ {P}
6       Bound P: LB(P) ← g(P)
7       if LB(P) = f(X) for a feasible solution X and f(X) < Incumbent then
8           Incumbent ← f(X)
9           Solution ← X
10      if LB(P) ≥ Incumbent then
11          fathom P
12      else
13          Branch on P generating P1, ..., Pk
14          for 1 ≤ i ≤ k do
15              Live ← Live ∪ {(Pi, LB(P))}
16  until Live = ∅
17  OptimalSolution ← Solution
18  OptimalValue ← Incumbent
A Branch and Bound algorithm for a minimization problem hence consists of three main components:

1. a bounding function providing for a given subspace of the solution space a lower bound for the best solution value obtainable in the subspace,
2. a strategy for selecting the live solution subspace to be investigated in the current iteration, and
3. a branching rule to be applied if a subspace after investigation cannot be discarded, hereby subdividing the subspace considered into two or more subspaces to be investigated in subsequent iterations.

In the following, I discuss each of these key components briefly.
In addition to these components, an initial good feasible solution is normally produced using a heuristic whenever possible, in order to facilitate fathoming of nodes as early as possible. If no such heuristic exists, the initial value of the incumbent is set to infinity. It should be noted that other methods to fathom solution subspaces exist, e.g. dominance tests, but these are normally rather problem specific and will not be discussed further here; for further reference see [8].

9.1.1 Bounding function

The bounding function is the key component of any Branch and Bound algorithm, in the sense that a low quality bounding function cannot be compensated for through good choices of branching and selection strategies. Ideally the value of a bounding function for a given subproblem should equal the value of the best feasible solution to the problem, but since obtaining this value is usually in itself NP-hard, the goal is to come as close as possible using only a limited amount of computational effort (i.e. in polynomial time), cf. the discussion below. A bounding function is called strong if it in general gives values close to the optimal value for the subproblem bounded, and weak if the values produced are far from the optimum. One often experiences a trade-off between quality and time when dealing with bounding functions: the more time spent on calculating the bound, the better the bound value usually is. It is normally considered beneficial to use as strong a bounding function as possible in order to keep the size of the search tree as small as possible.

Bounding functions naturally arise in connection with the set of potential solutions P and the function g mentioned above. Due to the fact that S ⊆ P, and that g(x) ≤ f(x) on P, the following is easily seen to hold:

    min_{x∈P} g(x) ≤ min_{x∈P} f(x) ≤ min_{x∈S} f(x)   and   min_{x∈P} g(x) ≤ min_{x∈S} g(x) ≤ min_{x∈S} f(x)

If both P and g exist, there is now a choice between three optimization problems, for each of which the optimal solution will provide a lower bound for the given objective function. The "skill" here is of course to choose P and/or g so that one of these problems is easy to solve and provides tight bounds. Hence there are two standard ways of converting the NP-hard problem of solving a subproblem to optimality into a P-problem of determining a lower bound for the objective function.

The first is to use relaxation: leave out some of the constraints of the original problem, thereby enlarging the set of feasible solutions. The objective function of the problem is maintained. This corresponds to minimizing f over P. The value obtained is a lower bound because the minimization is performed over a larger set than the set of objective function values for feasible solutions to the original problem. As mentioned, one way of relaxing the constraints of a symmetric TSP is to allow subtours; however, the bounds produced this way are rather weak. One alternative is the 1-tree relaxation, cf. Example 4 below. For GPP, a relaxation is to drop the constraint that the sizes of V1 and V2 are to be equal. If the optimal solution to the relaxed subproblem satisfies all constraints of the original subproblem, it is also optimal for this, and is hence a candidate for a new incumbent.

The other way of obtaining an easy bound calculation problem is to minimize g over S, i.e. to maintain the feasible region of the problem but modify the objective function, at the same time ensuring that for all feasible solutions the modified function has values less than or equal to the original function. Again one can be sure that a lower bound results from solving the modified problem to optimality. However, it is generally not true that the optimal solution corresponding to the modified objective function is optimal for the original objective function too. The most trivial and very weak bounding function for a given minimization problem obtained by modification is the sum of the costs incurred by the variable bindings leading to the subproblem to be bounded; hence all feasible solutions for the subproblem are assigned the same value by the modified objective function. In GPP this corresponds to the cost on edges connecting vertices assigned to V1 in the partial solution with vertices assigned to V2 in the partial solution, leaving out any evaluation of the possible costs between one assigned and one unassigned vertex. In QAP, an initial and very weak bound is the transportation cost between facilities already assigned to locations, leaving out the potential costs of transportation between one unassigned and one assigned facility, as well as between two unassigned facilities. Much better bounds can be obtained if these potential costs are included in the bound.

Combining the two strategies for finding bounding functions means to minimize g over P, and at first glance this seems weaker than each of those. However, a parameterized family of lower bounds may result, and finding the parameter giving the optimal lower bound may after all create very tight bounds. Bounds calculated by so-called Lagrangean relaxation are based on this observation; these bounds are usually very tight but computationally demanding. The TSP provides a good example hereof; see also the Roucairol-Hansen bound for GPP and the Gilmore-Lawler bound for QAP as described e.g. in [4, 5].
Example 4: The 1-tree bound for symmetric TSP problems. Here one first identifies a special vertex, "#1", which may be any vertex of the graph. "#1" and all edges incident to this are removed from G, and a minimum spanning tree Trest is found for the remaining graph. Then the two shortest edges e1, e2 incident to "#1" are added to Trest, producing the 1-tree Tone of G with respect to "#1". The total cost of Tone is a lower bound on the value of an optimum tour. The argument for this is as follows: first note that a Hamilton tour in G consists of two edges e'1, e'2 incident to "#1" and a tree T'rest in the rest of G. Hence the set of Hamilton tours of G is a subset of the set of 1-trees of G. Since e1, e2 are the two shortest edges incident to "#1", and Trest is a minimum spanning tree of the rest of G, the cost of Tone is less than or equal to the cost of any Hamilton tour. In case Tone is a tour, we have found the optimal solution to our subproblem; otherwise a vertex of degree at least 3 exists, and we have to perform a branching. For large instances, however, a tour seldomly results.

The 1-tree bound can be strengthened using the idea of problem transformation: generate a new symmetric TSP problem having the same optimal tour as the original, for which the 1-tree bound is tighter. The idea is that vertices of Tone with high degree are incident with too many attractive edges, whereas vertices of degree 1 have too many unattractive edges. Denote by πi the degree of vertex i in Tone minus 2: πi := deg(vi) − 2. Note that the sum over V of the values πi equals 0, since Tone has n edges, and hence the sum of the deg(vi) equals 2n. Now for each edge (i, j) we define the transformed cost c'ij to be cij + πi + πj. Since each vertex in a Hamilton tour is incident to exactly two edges, the new cost of a Hamilton tour is equal to the current cost plus two times the sum over V of the values πi. Since the latter is 0, the costs of all tours are unchanged, but the costs of 1-trees in general increase. Hence calculating the 1-tree bound for the transformed problem often gives a better bound. The trick may be repeated as many times as one wants; here there is, however, a trade-off between time and strength of bound: should one branch, or should one try to get an even stronger bound than the current one by a problem transformation? Figure 9.7 shows a bound calculation for the problem of Figure 9.3: (b) shows the 1-tree found for the original distances, and (c) shows the first transformation. △
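The 1-tree bound is easy to compute with any minimum spanning tree algorithm. The sketch below (an illustrative implementation of my own, indexing the vertices A-H of Figure 9.3 as 0-7) runs Kruskal's algorithm on the graph without the special vertex and then adds the two cheapest edges incident to it:

```python
def one_tree_bound(d, special=0):
    """Lower bound for a symmetric TSP: cost of a minimum spanning tree on the
    vertices other than `special`, plus the two cheapest edges incident to it."""
    n = len(d)
    rest = [v for v in range(n) if v != special]

    # Kruskal's algorithm with union-find on the subgraph induced by `rest`
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    edges = sorted((d[u][v], u, v) for i, u in enumerate(rest) for v in rest[i + 1:])
    mst_cost, used = 0, 0
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst_cost += w
            used += 1
            if used == len(rest) - 1:
                break

    two_cheapest = sorted(d[special][v] for v in rest)[:2]
    return mst_cost + sum(two_cheapest)

bornholm = [
    [ 0, 11, 24, 25, 30, 29, 15, 15],   # A
    [11,  0, 13, 20, 32, 37, 17, 17],   # B
    [24, 13,  0, 16, 30, 39, 29, 22],   # C
    [25, 20, 16,  0, 15, 23, 18, 12],   # D
    [30, 32, 30, 15,  0,  9, 23, 15],   # E
    [29, 37, 39, 23,  9,  0, 14, 21],   # F
    [15, 17, 29, 18, 23, 14,  0,  7],   # G
    [15, 17, 22, 12, 15, 21,  7,  0],   # H
]
print(one_tree_bound(bornholm))  # 97, matching the bound reported in Figure 9.7 (b)
```

The computed bound of 97 is below the optimal tour cost of 100, as it must be.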
9.1.2 Strategy for selecting next subproblem

The strategy for selecting the next live subproblem to investigate usually reflects a trade-off between keeping the number of explored nodes in the search tree low, and staying within the memory capacity of the computer used. If one always
Figure 9.7: A bound calculation of the Branch and Bound algorithm for the symmetric TSP using the 1-tree bound with "#1" equal to A, and Lagrangean relaxation for bounding: (a) the optimal tour, cost = 100; (b) a 1-tree for the original distances (the tree in the rest of G plus the two 1-tree edges), cost = 97; (c) a 1-tree for the modified distance matrix below, cost = 97.

Modified distance matrix:

     A   B   C   D   E   F   G   H
A    0  11  24  25  29  29  16  15
B   11   0  13  20  31  37  18  17
C   24  13   0  16  29  39  30  22
D   25  20  16   0  14  23  19  12
E   29  31  29  14   0   8  23  14
F   29  37  39  23   8   0  15  21
G   16  18  30  19  23  15   0   8
H   15  17  22  12  14  21   8   0
selects among the live subproblems one of those with the lowest bound, called the best first search strategy (BeFS), then no superfluous bound calculations take place after the optimal solution has been found: since the lower bound of any subspace containing an optimal solution must be less than or equal to the optimum value, only nodes of the search tree with lower bound less than or equal to this will be explored.

The explanation of this property lies in the concept of critical subproblems. A subproblem P is called critical if the given bounding function when applied to P results in a value strictly less than the optimal solution of the problem in question. Nodes in the search tree corresponding to critical subproblems have to be partitioned by the Branch and Bound algorithm no matter when the optimal solution is identified; they can never be discarded by means of the bounding function. After the optimal value has been discovered, only critical nodes will be processed in order to prove optimality. The preceding argument for optimality of BeFS with respect to the number of nodes processed is valid only if eager node evaluation is used, since the selection of nodes is otherwise based on the bound value of the father of each node. BeFS may, however, also be used in combination with lazy node evaluation. Figure 9.8 (a) shows a small search tree; the numbers in the nodes correspond to the sequence in which the nodes are processed when BeFS is used.

Even though the choice of the subproblem with the current lowest lower bound makes good sense also regarding the possibility of producing a good feasible solution, memory problems arise if the number of critical subproblems of a given problem becomes too large. For GPP, sparse problems with 120 vertices often produce in the order of a few hundred critical subproblems when the Roucairol-Hansen bounding function is used [4], and hence BeFS seems feasible. For QAP, the famous Nugent20 problem [13] produces 3.6 × 10^8 critical nodes using Gilmore-Lawler bounding combined with detection of symmetric solutions [5], and hence memory problems may be expected if BeFS is used.

An alternative is the breadth first search strategy (BFS), in which all nodes at one level of the search tree are processed before any node at a higher level. The number of nodes at each level of the search tree grows exponentially with the level, making it infeasible to do breadth first search for larger problems. Figure 9.8 (b) shows the search tree with the numbers in each node corresponding to the BFS processing sequence.

The alternative used in practice is depth first search (DFS). Here a live node with largest level in the search tree is chosen for exploration. The memory requirement in terms of the number of subproblems to store at the same time is now bounded above by the number of levels in the search tree multiplied by the maximum number of children of any node, which is usually a quite manageable number. Figure 9.8 (c) shows the DFS processing sequence numbers of the nodes.
Figure 9.8: Search strategies in Branch and Bound: (a) Best First Search, (b) Breadth First Search, and (c) Depth First Search. Each node is labelled with its bound value g and objective value f; the number in each node indicates the order in which the nodes are processed under the strategy in question.
DFS can be used both with lazy and eager node evaluation. An advantage from the programming point of view is the use of recursion to search the tree; this enables one to store the information about the current subproblem in an incremental way, so only the constraints added in connection with the creation of each subproblem need to be stored. The drawback is that if the incumbent is far from the optimal solution, large amounts of unnecessary bounding computations may take place. In order to avoid this, DFS is often combined with a selection strategy in which the node selected for branching is chosen as the one for which the difference between the lower bounds of its children is as large as possible, in particular when one of the branches of the selected node has a very small lower bound and the other a very large one. The idea is that exploring the node with the small lower bound first hopefully leads to a good feasible solution, which, when the procedure returns to the node with the large lower bound, can be used to fathom that node. Note, however, that this strategy requires the bound values for the children to be known, which again may lead to superfluous calculations. A combination of DFS as the overall principle and BeFS when a choice is to be made between nodes at the same level of the tree is also quite common.

In [2] an experimental comparison of BeFS and DFS combined with both eager and lazy node evaluation is performed for QAP. Surprisingly, DFS is superior to BeFS in all cases, both in terms of time and in terms of the number of bound calculations. The reason turns out to be that in practice, the bounding and branching of the basic algorithm is extended with additional tests and calculations at each node in order to enhance the efficiency of the algorithm. Hence, the theoretical superiority of BeFS should be taken with a grain of salt.

9.1.3 Branching rule
All branching rules in the context of Branch and Bound can be seen as subdivision of a part of the search space through the addition of constraints, often in the form of assigning values to variables. Convergence of Branch and Bound is ensured if the size of each generated subproblem is smaller than the original problem, and the number of feasible solutions to the original problem is finite. If the subspace in question is subdivided into two, the term dichotomic branching is used; otherwise one talks about polytomic branching. Normally, the subproblems generated are disjoint; in this way the problem of the same feasible solution appearing in different subspaces of the search tree is avoided.

For GPP, branching is usually performed by choosing a vertex not yet assigned to any of V1 and V2 and assigning it to V1 in one of the new subproblems (corresponding to that the variable of the node receives the value 1) and to V2 in the other (variable value equal to 0). This branching scheme is dichotomic, and the subspaces generated are disjoint.

For TSP, branching may be performed based on the 1-tree generated during bounding. If all vertices have degree 2, the 1-tree is a tour, and hence an optimal solution to the subproblem; then no further branching is required. If a node has degree 3 or more in the 1-tree, any such node may be chosen as the source of branching. For the chosen node, a number of subproblems equal to the degree is generated. In each of these, one of the edges of the 1-tree is excluded from the graph of the subproblem, ruling out the possibility that the bound calculation will result in the same 1-tree. The bound does, however, not necessarily change, and identical subproblems may arise after a number of branchings; the problem is further discussed as an exercise. Figure 9.9 shows the branching taking place after the bounding in Figure 9.7.

In case of QAP, an unassigned facility is chosen, and a new subproblem is created for each of the free locations by assigning the chosen facility to the location. The scheme is called branching on facilities and is polytomic, and also here the subspaces are disjoint. Also branching on locations is possible.
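The two branching schemes just described can be sketched as follows (hypothetical helper functions for illustration only; subproblems are represented simply as the constraints added to them):

```python
def branch_gpp(fixed, vertex):
    """Dichotomic GPP branching: the chosen unassigned vertex goes to V1 in one
    child (variable value 1) and to V2 in the other (variable value 0)."""
    return [dict(fixed, **{vertex: 1}), dict(fixed, **{vertex: 0})]

def branch_tsp_one_tree(one_tree_edges, vertex):
    """Polytomic TSP branching: for a vertex of degree >= 3 in the 1-tree,
    create one subproblem per incident 1-tree edge; each child excludes one of
    those edges, so the same 1-tree cannot reappear in the child's bound."""
    incident = [e for e in one_tree_edges if vertex in e]
    return [{"excluded_edge": e} for e in incident]

print(branch_gpp({}, "v3"))  # [{'v3': 1}, {'v3': 0}]
print(branch_tsp_one_tree([(0, 1), (1, 2), (1, 3), (3, 4)], 1))  # 3 subproblems
```

Note that the GPP children are disjoint by construction, while the TSP children overlap, matching the discussion above.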
9.1.4 Producing an initial solution
Although often not explicitly mentioned, another key issue in the solution of large combinatorial optimization problems by Branch and Bound is the construction of a good initial feasible solution. Any heuristic may be used, and presently a number of very good general heuristics (also called metaheuristics or paradigms for heuristics) as well as a wealth of very problem specific heuristics are available. Among the general ones, Simulated Annealing, Genetic Algorithms, and Tabu Search are the most popular.

Regarding the three examples, a good and fast heuristic for GPP is the Kernighan-Lin variable depth local search heuristic, whereas for QAP and TSP very good results have been obtained with Simulated Annealing. As mentioned, the number of subproblems explored when the DFS strategy for selection is used depends on the quality of the initial solution: if the heuristic identifies the optimal solution, so that the Branch and Bound algorithm essentially verifies the optimality, then even DFS will only explore critical subproblems. If BeFS is used, the value of a good initial solution is less obvious. The effect of a poor initial solution is not an incorrect algorithm, but a less efficient algorithm.
Even though parallelism is not an integral part of Branch and Bound, I have chosen to present the material here, since the key components of the Branch and Bound algorithm are unchanged. Using parallel processing of the nodes in the search tree of a Branch and Bound algorithm is a natural idea, since the bound calculation and the branching in each node are independent. A few concepts from parallel processing are, however, necessary. The aim of the parallel processing is to speed up the execution time of the algorithm. To measure the success in this aspect, the speedup of adding processors is measured. The relative speedup using p processors is defined to be the processing time T(1) using one processor divided by the processing time T(p) using p processors:

    S(p) = T(1)/T(p)
9.2 Personal Experiences with GPP and QAP

The following subsections briefly describe my personal experiences using Branch and Bound combined with parallel processing to solve GPP and QAP. Most of the material stems from [3, 4] and [5].

Figure 9.9: Branching from a 1-tree in a Branch and Bound algorithm for the symmetric TSP. The three subproblems exclude the 1-tree edges (A,H), (H,G) and (D,H) incident to the branching vertex; two of the resulting bound calculations give 1-trees of cost 98 and 104, while the third yields the optimal tour.
and its value was usually close to the optimal solution value. No dedicated communication processors were present.is based on Lagrangean relaxation and has a bounding function giving tight. One .164
Branch and Bound
The ideal value of S(p) is p . The “on overload” strategy is based on the idea that if a processor has more than a given threshold of live subproblems. [6] was used.
An important issue in parallel Branch and Bound is the distribution of work: in order to obtain as short a running time as possible, no processor should be idle at any time during the computation. If perfect load distribution is achieved, the problem is solved p times faster with p processors than with 1 processor. If a distributed system or a network of workstations is used, the issue becomes particularly crucial, since it is not possible to maintain a central pool of live subproblems. Care must be taken to ensure that the system is not flooded with communication, and that the flow of subproblems between processors takes place during the entire solution process. Various possibilities for load balancing schemes exist - two concrete examples are given in the following, but additional ones are described in [7].

9.2.1 Solving the Graph Partitioning Problem in Parallel

GPP was my first experience with parallel Branch and Bound, and we implemented two parallel algorithms for the problem in order to investigate the trade-off between bound quality and the time needed to calculate the bound. One algorithm - called the CT-algorithm - uses an easily computable bounding function based on the principle of modified objective function and produces bounds of acceptable quality, whereas the other - the RH-algorithm - uses tighter, but computationally expensive bounds.

The system used was a 32 processor IPSC1/d5 hypercube equipped with Intel 80286 processors and 80287 co-processors, each with 512 KB memory, and the communication facilities were Ethernet connections, implying a large start-up cost on any communication. Both algorithms were of the distributed type, where the pool of live subproblems is distributed over all processors, and as strategy for distributing the workload we used a combined "on demand"/"on overload" strategy: when a processor checking its pool finds that it is overloaded, a number of its subproblems are sent to neighbouring processors. The scheme is illustrated in Figure 9.10. The selection strategy for the next subproblem was BeFS for the RH-algorithm and DFS for the CT-algorithm. Regarding termination, the algorithm of Dijkstra et al. [1] was used. The first feasible solution was generated by the Kernighan-Lin heuristic.

For the CT-algorithm, the results regarding processor utilization and relative speed-up were promising: a processor utilization near 100% was observed, and linear speedup close to the ideal was observed for problems solvable also on a single processor. The RH-algorithm behaved differently - for small to medium size problems, the algorithm was clearly inferior to the CT-algorithm, both with respect to running time, processor utilization, and number of subproblems solved. Hence the tight bounds did not pay off for small problems - they resulted in idle processors and long running times. Finally, we observed that the best speedup was obtained for problems with long running times. Table 9.1 shows the results for a 70 vertex problem for the CT- and RH-algorithms, and Figure 9.11 shows the relative speedup.
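The "on overload" part of the strategy can be sketched as follows. This is an illustrative reconstruction, not code from the project; the threshold name queuemax is borrowed from Figure 9.10, and the round-robin choice of neighbour is an assumption.

```python
# Illustrative sketch of the "on overload" check of Figure 9.10: when a
# processor finds more live subproblems in its local pool than the
# threshold queuemax, the surplus is shipped to neighbouring processors.
def on_overload_check(pool, queuemax, neighbours):
    """pool: list of live subproblems (mutated in place);
    returns a list of (neighbour, subproblem) messages so that at most
    queuemax subproblems remain locally."""
    outgoing = []
    i = 0
    while len(pool) > queuemax and neighbours:
        target = neighbours[i % len(neighbours)]
        outgoing.append((target, pool.pop()))  # send one subproblem away
        i += 1
    return outgoing
```

With queuemax = 2 and two neighbours, a pool of five subproblems results in three of them being sent, distributed round-robin over the neighbours.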
Figure 9.10: Illustration of the on overload protocol. (a) shows the situation when a processor, upon checking, finds that it is overloaded; (b) shows the behaviour of a non-overloaded processor.

              CT-algorithm                        RH-algorithm
No. of   time     proc.      no. of       time     proc.      no. of
proc.    (sec)    util. (%)  bound calc.  (sec)    util. (%)  bound calc.
 4       1964     97         449123       1533     89          377
 8        805     96         360791       1457     76          681
16        421     93         368923       1252     61          990
32        294     93         522817       1219     42         1498

Table 9.1: Comparison between the CT- and RH-algorithm on a 70 vertex problem with respect to running times, processor utilization and number of bound calculations.

We continued to larger problems, expecting the problem to disappear.
We found that the situation did by no means improve. For the RH-method it seemed impossible to use more than 4 processors. The explanation was found in the number of critical subproblems generated, cf. Figure 9.11 and Table 9.2. Here it is obvious that using more processors for the RH-method just results in a lot of superfluous subproblems being solved, which does not decrease the total solution time.

Figure 9.11: Relative speedup for the CT-algorithm and the RH-algorithm for a 70 vertex problem (CT-bounding and RH-bounding compared to the ideal speedup, for 4 to 32 processors).

9.2.2 Solving the QAP in Parallel

QAP is one of my latest parallel Branch and Bound experiences. The aim of the research was in this case to solve the previously unsolved benchmark problem Nugent20 to optimality, using a combination of the most advanced bounding functions, fast parallel hardware, and any other trick we could find and think of.
              CT-algorithm                          RH-algorithm
Vert.   Sec.     Cr. subpr.   B. calc.      Sec.     Cr. subpr.   B. calc.
 30         1         103          234        49           4          91
 40         2         441          803       114          11         150
 50         5        2215         3251       278          15         453
 60        18        6594        11759       531           8         419
 70       188       56714       171840      1143          26        1757
 80       931      526267       901139      1315          19        2340
100      8825     2313868      5100293      3462          75        3664
110     34754     8469580     15203426      4311          44        3895
120         -           -            -      5756          47        4244

Table 9.2: Number of critical subproblems and bound calculations as a function of problem size.

We used a MEIKO system consisting of 16 Intel i860 processors, each with 16 MB of internal memory. Each processor has a peak performance of 30 MIPS when doing integer calculations, giving an overall peak performance of approximately 500 MIPS for the complete system. The processors each have two Inmos T800 transputers as communication processors. Each transputer has 4 communication channels, each with bandwidth 1.4 Mb/second and start-up latency 340 µs. The connections are software programmable, and the software supports point-to-point communication between any pair of processors. Both synchronous and asynchronous communication are possible, and both blocking and non-blocking communication exist. The performance of each single i860 processor almost matches the performance of the much more powerful Cray 2 on integer calculations, indicating that the system is very powerful.

The basic framework for testing bounding functions was a distributed Branch and Bound algorithm with the processors organized as a ring. Workload distribution was kept simple and based on local synchronization: each processor in the ring communicates with each of its neighbours at certain intervals. At each communication the processors exchange information on the respective sizes of their subproblem pools, and based hereon, subproblems are sent between the processors. The selection strategy used was a kind of breadth first search, and regarding termination detection, a tailored algorithm was used for this purpose.

The feasibility hereof is intimately related to the use of a very good heuristic to generate the incumbent. We used simulated annealing, and as reported in [5], spending less than one percent of the total running time in the heuristic enabled us to start the parallel solution with the optimal solution as the incumbent. Hence only critical subproblems were solved. The speedup obtained with this scheme was 13.7 for a moderately sized problem with a sequential running time of 1482 seconds and a parallel running time with 16 processors of 108 seconds.
The main results of the research are indicated in Table 9.3. We managed to solve previously unsolved problems, and for problems solved by other researchers, results obtained by Mautor and Roucairol are included for comparison.

                       Present authors             Mautor & Roucairol
Problem                No. nodes    Time (s)       No. nodes    Time (s)
Nugent 17               24763611        2936               -           -
Nugent 18              114948381       14777               -           -
Nugent 20              360148026       57963               -           -
Elshafei 19               105773          10           97286         121
Armour & Buffa 20         320556          34          735353         969

Table 9.3: Results obtained by the present authors in solving large standard benchmark QAPs. Results obtained by Mautor and Roucairol are included for comparison.

To get an indication of the efficiency of so-called static workload distribution in our application, an algorithm with static workload distribution was also tested. The subproblems distributed to each processor were generated using BeFS sequential Branch and Bound until the pool of live subproblems was sufficiently large that each processor could get the required number of subproblems. Hence all processors receive equally promising subproblems. The optimal number of subproblems per processor was determined experimentally and equals roughly (p - 8)^4/100, where p is the number of processors. The results appear in Table 9.4.

Table 9.4: Results obtained when solving standard benchmark QAPs (Nugent 8 to Nugent 16) using static workload distribution; results obtained with dynamic distribution are included for comparison.

The results clearly indicated the value of choosing an appropriate parallel system for the algorithm in question.
9.3 Ideas and Pitfalls for Branch and Bound users

Rather than giving a conclusion, I will in the following try to equip new users of Branch and Bound - both sequential and parallel - with a checklist corresponding to my own experiences. Some of the points of the list have already been mentioned in the preceding sections, while some are new.

• The importance of finding a good initial incumbent cannot be overestimated. Any feasible solution can be used, and the time used for finding such one is often only a few percent of the total running time of the algorithm.

• In case an initial solution very close to the optimum is expected to be known, the choice of node selection strategy and processing strategy makes little difference. This is in particular true if the pure Branch and Bound scheme is supplemented with problem specific efficiency enhancing tests for e.g. supplementary exclusion of subspaces, and if the branching performed depends on the value of the current best solution.

• With a difference of more than a few percent between the value of the initial solution and the optimum, the theoretically superior BeFS Branch and Bound shows inferior performance compared to both lazy and eager DFS Branch and Bound.

9.4 Supplementary Notes

9.5 Exercises

1. Finish the solution of the biking tourist's problem on Bornholm.

2. Solve the symmetric TSP instance with n = 5 and distance matrix

            -  10   2   4   6
            -   -   9   3   1
   (ce) =   -   -   -   5   6
            -   -   -   -   2
            -   -   -   -   -
by Branch and Bound using the 1-tree relaxation to obtain lower bounds.

3. Solve the asymmetric TSP instance with n = 4 and distance matrix

             -   6   4  17
   (cij) =  12   -   7  20
            19   9   -  14
            19  19   5   -

by Branch and Bound using an assignment relaxation to obtain bounds.

4. Give an example showing that the branching rule illustrated in Figure 9.9 may produce nodes in the search tree with non-disjoint sets of feasible solutions. Devise a branching rule which ensures that all subspaces generated are disjoint.

5. Consider the GPP as described in Example 1. By including the term

   λ ( Σ_{v∈V} x_v - |V|/2 )

in the objective function, a relaxed unconstrained problem with modified objective function results for any λ. Prove that the new objective is less than or equal to the original on the set of feasible solutions for any λ. Formulate the problem of finding the optimal value of λ as an optimization problem.

6. A node in a Branch and Bound search tree is called semicritical if the corresponding bound value is less than or equal to the optimal solution of the problem. Prove that if the number of semicritical nodes in the search tree corresponding to a Branch and Bound algorithm for a given problem is polynomially bounded, then the problem belongs to P. Prove that this holds also with the weaker condition that the number of critical nodes is polynomially bounded.

7. Consider again the QAP as described in Example 3. The simplest bound calculation scheme is described in Section 2. A more advanced, though still simple, scheme is the following: Consider a partial solution in which m of the facilities have been assigned to m of the locations. The total cost of any feasible solution in the subspace determined by the partial solution consists of three terms: costs for pairs of assigned facilities, costs for pairs consisting of one assigned and one unassigned facility, and costs for pairs of two unassigned facilities. The first term can be calculated exactly. Bounds for each of the two other terms can be found based on the fact that a lower bound for a scalar product (a_1, ..., a_p) · (b_π(1), ..., b_π(p)), where a and b are given vectors of dimension p and π is a permutation of {1, ..., p}, is obtained by multiplying the largest element in a with the smallest element in b, the next-largest in a with the next-smallest in b, etc.
For each assigned facility, the flows to unassigned facilities are ordered decreasingly, and the distances from the location of the facility to the remaining free locations are ordered increasingly. The scalar product is now a lower bound for the communication cost from the facility to the remaining unassigned facilities. The total transportation cost between unassigned facilities can be bounded in a similar fashion. To generate a first incumbent, any feasible solution can be used.

(a) Consider the instance given in Figure 6, where the distances between locations are given as the rectangular distances in the following grid:

   1  2  3
   4  5  6

The flows between pairs of facilities are given by

         0 20 20  0  0 20
        15  0  0 30  1  2
   F =   0 20  0  2  0 10
        15  0  2  0 15  2
         0 30  0 15  0 30
         1  2 10  2 30  0

Find the optimal solution to the instance using the bounding method described above, the branching strategy described in the text, and DFS as search strategy. Try prior to the Branch and Bound execution to identify a good feasible solution. A solution with value 314 exists.

(b) Consider now the QAP. Solve the problem using Branch and Bound with the bounding function described above.

8. Consider the 0-1 knapsack problem:

   max   Σ_{j=1}^{n} c_j x_j
   s.t.  Σ_{j=1}^{n} a_j x_j ≤ b
         x ∈ B^n

with a_j, c_j > 0 for j = 1, ..., n.

(a) Show that if the items are sorted in decreasing order of their cost/weight ratio, then the solution of the LP relaxation can be established as x_j = 1 for j = 1, ..., r-1, x_r = (b - Σ_{j=1}^{r-1} a_j)/a_r, and x_j = 0 for j > r, where r is defined such that Σ_{j=1}^{r-1} a_j ≤ b and Σ_{j=1}^{r} a_j > b.

(b) Solve the instance

   max   17x1 + 10x2 + 25x3 + 17x4
   s.t.   5x1 + 3x2 + 8x3 + 7x4 ≤ 12
          x ∈ B^4

by Branch and Bound.
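The greedy construction of Exercise 8(a) gives an upper bound for use in Branch and Bound; an illustrative sketch (not part of the exercise text):

```python
# LP-relaxation value of the 0-1 knapsack via the greedy ratio rule of
# Exercise 8(a): fill items in decreasing c/a order, and take a fraction
# of the first item r that no longer fits: x_r = remaining capacity / a_r.
def knapsack_lp_bound(c, a, b):
    order = sorted(range(len(c)), key=lambda j: c[j] / a[j], reverse=True)
    value, cap = 0.0, b
    for j in order:
        if a[j] <= cap:
            value += c[j]
            cap -= a[j]
        else:
            value += c[j] * cap / a[j]  # fractional item, then stop
            break
    return value
```

On the instance of Exercise 8(b) the ratio order is x1, x2, x3, x4, and the bound is 17 + 10 + 25·(4/8) = 39.5.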
Bibliography

[1] E. W. Dijkstra, W. H. J. Feijen and A. J. M. van Gasteren, "Derivation of a termination detection algorithm for distributed computations", Inf. Proc. Lett. 16 (1983), p. 217-219.

[2] A. de Bruin, A. H. G. Rinnooy Kan and H. W. J. M. Trienekens, "A Simulation Tool for the Performance of Parallel Branch and Bound Algorithms", Math. Prog. 42 (1988), p. 245-271.

[3] J. Clausen and M. Perregaard, "On the Best Search Strategy in Parallel Branch-and-Bound - Best-First-Search vs. Lazy Depth-First-Search", Proceedings of POC96 (1996), also DIKU Report 96/14, 11 p.

[4] J. Clausen and J. L. Träff, "Implementation of parallel Branch-and-Bound algorithms - experiences with the graph partitioning problem", Annals of Oper. Res. 33 (1991), p. 331-349.

[5] J. Clausen and M. Perregaard, "Solving Large Quadratic Assignment Problems in Parallel", DIKU report 1994/22, 14 p.; to appear in Computational Optimization and Applications.

[6] J. Clausen and J. L. Träff, "Do Inherently Sequential Branch-and-Bound Algorithms Exist?", Parallel Processing Letters 4 (1994), p. 3-13.

[7] B. Gendron and T. G. Crainic, "Parallel Branch-and-Bound Algorithms: Survey and Synthesis", Operations Research 42 (6) (1994), p. 1042-1066.

[8] T. Ibaraki, "Enumerative Approaches to Combinatorial Optimization", Annals of Operations Research vol. 10-11, Baltzer 1987.

[9] P. S. Laursen, "Simple approaches to parallel Branch and Bound", Parallel Computing 19 (1993), p. 143-152.

[10] P. S. Laursen, "Parallel Optimization Algorithms - Efficiency vs. simplicity", Ph.D. thesis, DIKU-Report 94/31 (1994), Dept. of Comp. Science, Univ. of Copenhagen.

[11] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan and D. B. Shmoys (eds.), "The Travelling Salesman: A Guided Tour of Combinatorial Optimization", John Wiley 1985.

[12] T. Mautor and C. Roucairol, "A new exact algorithm for the solution of quadratic assignment problems", Discrete Applied Mathematics 55 (1994), p. 281-293.

[13] C. E. Nugent, T. E. Vollmann and J. Ruml, "An experimental comparison of techniques for the assignment of facilities to locations", Oper. Res. 16 (1968), p. 150-173.

[14] M. Perregaard and J. Clausen, "Solving Large Job Shop Scheduling Problems in Parallel", DIKU report 94/35 (1994), to appear in Annals of OR.

[15] H. Schütt and J. Clausen, "Parallel Algorithms for the Assignment Problem - Experimental Evaluation of Three Distributed Algorithms", AMS DIMACS Series in Discrete Mathematics and Theoretical Computer Science 22 (1995), p. 337-351.
Appendix A

A quick guide to GAMS – IMP

To login at IMM from the GBAR is easy:

1. Login at your favorite X terminal at the GBAR.

2. Start an xterm. (Click the middle mouse button, select terminals and then terminal.)

3. In that xterm write:

   ssh -l nh?? serv1.imm.dtu.dk

   where nh?? corresponds to your course login name. You will be prompted for a password.

You now have a shell window working at IMM. Alternatives to serv1.imm.dtu.dk are

• serv2.imm.dtu.dk
• serv3.imm.dtu.dk
• sunfire.imm.dtu.dk

Remember to change your password at IMM with the passwd command. You can start more xterms to work at IMM in the same way.

To write a file at IMM, start an emacs editor at IMM by writing (in the xterm working at IMM):

   emacs &

To start CPLEX write (in the xterm working at IMM):

   cplex64

CPLEX can be run in interactive mode or used as a library callable from e.g. C, C++ or Java programs. In this course we will only use CPLEX in interactive mode. At IMM we have 20 licenses, so at most 20 persons can run CPLEX independently of each other. To start a CPLEX session type cplex and press the enter key. You will then get an introductory text something like:

ILOG CPLEX 9.0.0, licensed to "university-lyngby", options: e m b q
Welcome to CPLEX Interactive Optimizer 9.0.0 with Simplex, Mixed Integer & Barrier Optimizers
Copyright (c) ILOG 1997-2003
CPLEX is a registered trademark of ILOG

Type 'help' for a list of available commands.
Type 'help' followed by a command name for more information on commands.
CPLEX>

To end the session type quit and press the enter key. Please do not quit CPLEX by closing the xterm window before having exited CPLEX. If you quit by closing the xterm window before leaving CPLEX in a proper manner, it will take the license manager some time to recover the CPLEX license.
CPLEX solves linear and integer programming problems. These must be entered either through the built-in editor of CPLEX or by entering the problem in "your favorite editor", saving it to a file, and then reading the problem into CPLEX by typing read <filename.lp>. The file has to have the extension .lp, and the contents have to be in the lp-format. A small example is shown below:

\Problem name: soejle.lp
Minimize
 obj: x1 + x2 + x3 + x4
Subject To
 c1: x1 + 2 x3 + 4 x4 >= 6
 c2: x2 + x3 >= 3
End

The first line is a remark, and it stretches the entire line. The text written at the start of each line containing the objective function or constraints is optional, so obj: can be omitted. Subject To can be replaced with st.

After having entered a problem, it can be solved by giving the command optimize at the CPLEX> prompt and pressing enter. To see the result, the command display solution variables - followed by enter is used, "-" indicating that the values of all variables are to be displayed.

CPLEX writes a logfile, which records the events of the session. An example of the logfile corresponding to the solution of the example above is shown below. The events of the session have been:

cplex <enter>
read soejle.lp <enter>
optimize <enter>
display solution variables - <enter>
quit <enter>
and the resulting logfile looks like:

Log started (V6.1) Tue Feb 15 10:24:58 2000

Problem 'soejle.lp' read.
Read time = 0.00 sec.
Tried aggregator 1 time.
LP Presolve eliminated 0 rows and 1 columns.
Reduced LP has 2 rows, 3 columns, and 4 nonzeros.
Presolve time = 0.00 sec.

Iteration log . . .
Iteration: 1   Infeasibility = 3.000000
Switched to devex.
Iteration: 3   Objective = 3.000000

Primal - Optimal: Objective = 3.0000000000e+00
Solution time = 0.00 sec.  Iterations = 3 (2)

Variable Name    Solution Value
x3               3.000000
All other variables in the range 1-4 are zero.
Now let us take our initial problem and assume that we want x1 and x2 to be integer variables between 0 and 10. That the variables are non-negative is implicitly assumed by CPLEX, but we need to state the upper bound and the integrality condition. In this case our program will look like:

\Problem name: soejle.lp
Minimize
 obj: x1 + x2 + x3 + x4
Subject To
 c1: x1 + 2 x3 + 4 x4 >= 6
 c2: x2 + x3 >= 3
Bounds
 x1 <= 10
 x2 <= 10
Integer
 x1
 x2
End

Bounds is used to declare bounds on variables. The bounds section must be placed before the section declaring the integer variables. Integer states that x1 and x2 must take integer values. If you want an integer variable to have no upper bound, you can write x2 <= INF in the bounds section. It does not seem intuitive, but if you do not state a bounds part, CPLEX will assume the integer variables to be binary.

CPLEX provides help if the current command is not sufficient to uniquely determine an action. The command help shows the possible commands in the current situation. As an example, if one types display, CPLEX will respond by listing the options and the question "Display what ?". CPLEX also offers possibilities to change parameters in a problem already entered - these possibilities may be investigated by entering help as the first command after having entered CPLEX.
Appendix B

Extra Assignments

Question 1

You have just been employed as a junior OR consultant in the consulting firm Optimal Solutions. The first task that lands on your desk is a preliminary study for a big manufacturing company, Rubber Duck. In the current setup of their supply chain, products are transported from a factory to a warehouse via a crossdock. A crossdock is a special form of warehouse where goods are only stored for a very short time. It is used in order to consolidate the transportation of goods. Goods arrive at a crossdock, are then possibly repacked, and then put on trucks for their destination.

Rubber Duck uses two crossdocks, CD1 and CD2, in their operations. The company has three factories F1, F2 and F3 and four warehouses W1, W2, W3 and W4. The availability of trucks defines an upper limit on how many truck loads we can transport from a factory to a crossdock and from a crossdock to a warehouse. Table B.1 shows the transportation capacities measured in truck loads from factory to crossdock and from crossdock to warehouse.
        CD1   CD2                 CD1   CD2
 F1      34    40          W1      30     -
 F2      28    40          W2      24    26
 F3      27    22          W3      22    12
                           W4       -    58

Table B.1: Capacities in the Rubber Duck supply chain measured in truck loads. Note that it is not possible to supply warehouse W4 from crossdock CD1 and warehouse W1 from crossdock CD2.

Question 1.1

We want to determine how much Rubber Duck at most can get through their supply chain from the factories to the warehouses. Therefore give a maximum flow formulation of it. Draw a graph showing the problem as a maximum flow problem. Your network should have 11 vertices: a source (named 0), a vertex for each factory, crossdock and warehouse, and a sink (named 11). The graph must contain all information necessary when formulating a maximum flow problem.

Question 1.2

Solve the maximum flow problem using the augmenting path algorithm. Write the value of the maximum flow and list each of the augmenting paths. Show the flow graphically on a figure of the network. Find a minimum cut and list the vertices in the minimum cut.

Question 1.3

Just as you are about to return a small report on your findings to your manager, a phone call from a planner in Rubber Duck's logistics department reveals some new information. It turns out that there is a limit on how much each of the crossdocks can handle. This can be viewed as an extension to the maximum flow model. Let c_v be the maximum number of truck loads which can pass through an internal vertex v. Given a flow x_uv from vertex u to vertex v, the flow capacity of vertex v can be expressed as

   Σ_{(u,v)∈E} x_uv ≤ c_v.

Describe how to modify a network so that this new extension can be handled as a maximum flow problem in a modified network. Given that the capacity of crossdock CD1 is 72 truck loads and that of CD2 is 61 truck loads, introduce the modifications and solve the new maximum flow problem. Write the value of the maximum flow and list each of the augmenting paths. Show the flow graphically on a figure of the network. Find a minimum cut and list the vertices in the minimum cut.
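The modification asked for is the standard vertex-splitting transformation; a minimal sketch (the data layout is our assumption):

```python
# Vertex splitting for vertex capacities: every capacitated vertex v becomes
# v_in -> v_out with capacity c_v; arcs into v now enter v_in, and arcs out
# of v leave v_out. Maximum flow in the new network respects the c_v limits.
def split_vertex_capacities(edges, vertex_cap):
    """edges: dict {(u, v): capacity}; vertex_cap: dict {v: c_v}."""
    def tail(u):  # arcs leave the "out" copy of a capacitated vertex
        return (u, "out") if u in vertex_cap else u
    def head(v):  # arcs enter the "in" copy of a capacitated vertex
        return (v, "in") if v in vertex_cap else v
    new_edges = {(tail(u), head(v)): cap for (u, v), cap in edges.items()}
    for v, cv in vertex_cap.items():
        new_edges[((v, "in"), (v, "out"))] = cv
    return new_edges
```

For instance, splitting CD1 with capacity 72 turns an arc F1 -> CD1 into F1 -> CD1_in and adds the arc CD1_in -> CD1_out with capacity 72.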
Question 2

Your success on the first assignment quickly lands you another case. A medium sized company with branches geographically distributed all over Denmark has approached Optimal Solutions. The company you are going to work for has decided to build an intranet based on communication capacity leased from a large telecommunication vendor, Yellow. This operator has a backbone network connecting a number of locations, and you can lease communication capacity between individual pairs of points in the backbone network and in that way build your own network. Fortunately, all branches are located in one of the locations of the backbone network; however, not all locations house a branch of the company. Yellow guarantees that the connections will always be operative. You do therefore not have to consider failure protection.

The pricing policy of Yellow is as follows: Each customer individually buys in on a set of connections chosen by the customer. Each customer is offered a "customized" price for each connection, and these prices (although usually positive) may be negative to reflect a discount given by Yellow. The price is calculated as the sum of the prices for the individual connections. Your task is to analyze the situation of the company and determine which connections to lease in order to establish the cheapest intranet. Figure B.1 shows a small example, where 3 branches have to be connected in a network with 6 locations.

Question 2.1

Formulate a general mathematical programming model for the optimization problem of establishing the cheapest intranet. The model should have a decision variable for each connection, and a set of constraints expressing that for each partition of the set of locations into two non-empty subsets with a branch in each, at least one connection joining the two subsets must be chosen; this is valid given that all costs are non-negative. Can the model be used also when prices may be negative? Would you recommend the use of the model for large companies with hundreds of branches?

Question 2.2

Use the model to solve the example problem of Figure B.1 using CPLEX with 0-1 variables. Solve the model also when the integrality constraints on the variables are removed. Comment on the two solutions obtained and their objective function values.

Figure B.1: An example of a telecommunications network - the branches to be connected are the black vertices. (The edge costs shown in the figure are 5, 10, 9, 4, 5, 9, 6, 7, 6 and 5.)
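For small instances, the cut constraints of Question 2.1 can be enumerated directly; an illustrative sketch (each cut is generated twice, once for S and once for its complement, which is harmless for illustration):

```python
from itertools import combinations

# Generate the cut constraints: for every split of the locations into S and
# its complement with a branch on both sides, at least one chosen connection
# must cross the split, i.e. sum of x_e over crossing edges e >= 1.
def cut_constraints(locations, branches, edges):
    constraints = []
    locs = sorted(locations)
    for r in range(1, len(locs)):
        for subset in combinations(locs, r):
            S = set(subset)
            if branches & S and branches - S:  # a branch on both sides
                crossing = [e for e in edges if (e[0] in S) != (e[1] in S)]
                constraints.append(crossing)
    return constraints
```

The number of constraints grows exponentially with the number of locations, which is exactly why the question about large companies with hundreds of branches is interesting.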
Question 2.3

An alternative to solving the problem to optimality is to use a method which gives a good (but not necessarily optimal) solution fast. Such a method is called a heuristic. As you realize the proximity of the problem to the minimum spanning tree, you hit upon the following idea inspired by Prim's algorithm: Start by choosing (randomly) a branch; let us call it b0. Now for each other branch compute the shortest distance to b0. The branch with the shortest distance to b0 will be denoted b1. Then add the connections on the path from b1 to b0 to the set of selected connections. In each of the subsequent iterations we continue this approach, that is, determine the unconnected branch bi with the shortest distance to any location in the tree already generated, and add the connections on the path from bi to the tree. The process is terminated when all branches are added to the tree, that is, all branches are connected.

In Figure B.2, the network of your client's company is shown. Which connections should be leased, if these are determined with the method just described with b0 chosen to be A? Show the sequence in which the locations are considered, and the connections added in each iteration.

Obviously, the "killer" part of the described algorithm is the shortest path calculations. You may choose any of the shortest path algorithms touched upon in the course. However, there are two problems: 1) the network is not directed, and 2) there may be edges of negative length. How would you cope with the non-directedness of the network, and which shortest path algorithm would you suggest to use given that negative edge lengths are present? However, you realize that it may be easier to solve the problem regarding negative edge lengths by changing the problem to reflect that it is always beneficial to choose an edge of negative cost, even though you don't need the connection, whereby you end up with a network with only non-negative edge lengths. Describe the corresponding problem transformation.
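The heuristic described above can be sketched as follows, assuming the edge lengths have already been transformed to be non-negative; the names and data layout are ours:

```python
import heapq

# Prim-like heuristic: repeatedly connect the branch that is closest (by
# shortest path) to the partial network, adding the whole path to the tree.
def cheapest_intranet_heuristic(n, edges, branches, b0):
    """n: number of locations (0..n-1); edges: {(u, v): cost}, undirected,
    non-negative costs; branches: set of branch locations; b0: start branch."""
    adj = {v: [] for v in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    selected_nodes, selected_edges = {b0}, set()
    remaining = set(branches) - {b0}
    while remaining:
        # multi-source Dijkstra from the whole partial network at once
        dist = {v: float("inf") for v in range(n)}
        pred = {}
        heap = []
        for v in selected_nodes:
            dist[v] = 0
            heap.append((0, v))
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    pred[v] = u
                    heapq.heappush(heap, (dist[v], v))
        b = min(remaining, key=lambda v: dist[v])  # closest unconnected branch
        v = b                       # walk the path back into the tree
        while v not in selected_nodes:
            u = pred[v]
            selected_edges.add((min(u, v), max(u, v)))
            selected_nodes.add(v)
            v = u
        remaining.discard(b)
    return selected_edges
```

On a path-shaped example the heuristic correctly prefers a cheap three-edge path over one expensive direct edge.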
Question 3

As a result of your analysis, Rubber Duck has decided to implement a supply chain optimization IT-system. They have specifically asked for your involvement in the project, and you have been put in charge of the rather large project of implementing the IT-system and the corresponding organizational changes. You immediately remember something about project management from your engineering studies. To get into the tools and the way of thinking, you decide to solve a small test case using PERT/CPM. The available data is presented in Table B.2.
Activity   Imm. pred.   Normal time (Di)   Crash time (di)   Unit cost when shortening
e0         -            2                  1                 7
e1         e0           3                  2                 7
e2         e1           2                  1                 5
e3         e1           4                  2                 2
e4         e2, e3       4                  1                 6
e5         e0           9                  2                 2
e6         e4, e5       6                  2                 9

Table B.2: Data for the activities of the test case.

Question 3.1

Find the duration of the project if all activities are completed according to their normal times.

Question 3.2

In Hillier and Lieberman Section 22.5, a general LP model is formulated allowing one to find the cheapest way to shorten a project. Formulate the model for the test project and state the dual model.

Question 3.3

For each activity i there is a variable xi. Furthermore there are also two constraints, "xi ≥ 0" and "xi ≤ Di − di". Denote the dual variables of the latter gi. Argue that if di < Di, then either gi is equal to 0 or xi = Di − di in any optimal solution.

Question 3.4

What is the additional cost of shortening the project time for the test case to 16?
Question 4

For convenience, the background from the first project is repeated below.

Background for the first project. You are employed by the consulting firm Optimal Solutions, and one of your customers is a medium sized company with branches geographically distributed all over Denmark. You were hired to analyze the situation of the company and determine which connections to lease in order to establish the cheapest intranet. The company has decided to build an intranet based on communication capacity leased from a large telecommunication vendor, Yellow. This operator has a backbone network connecting a number of locations, and you can lease communication capacity between individual pairs of points in the backbone network and in that way build your own network. Fortunately, all branches are located in one of the locations of the backbone network; however, not all locations house a branch of the company. Yellow guarantees that the connections will always be operative, so failure protection is not an issue. The pricing policy of Yellow is as follows: Each customer individually buys in on a set of connections chosen by the customer. Each customer is offered a "customized" price for each connection, and these prices (although usually positive) may be negative to reflect a discount given by Yellow. The price is the sum of the prices for the individual connections.

Background for the second project. After having reviewed your solution to the first project, the company comes back with the following comments: 1) As you have pointed out, the pricing strategy of Yellow seems odd in that connections with negative prices are "too attractive". 2) Though good solutions found quickly are valuable, it is necessary to be able also to find the optimal solution to the problem, even if this may be quite time consuming.

Question 4.1

In order to prepare for possible changes in the pricing strategy, a model must be built in which the intranet solution proposed must be a connected network. Formulate such a mathematical model.
3
Question 4

It now turns out that Yellow has also discovered the flaw in their original pricing strategy. They have informed you that a new price update will arrive soon. Until then you decide to continue your experiments on some other data you got from your client. The network is shown in Figure B.3 on page 193.

Question 4.1

The constraints in your mathematical model must express that no cut in the network with capacity less than one (where "capacity" is measured by the sum of the x-values on the edges across the cut) separating two branch nodes may exist. Formulate such a mathematical model.

Question 4.2

You decide to try to solve the problem using Branch & Cut based on your mathematical model from Question 4.1. You initially set up an LP with the objective function and the bounds "xij ≤ 1" in CPLEX and add only one constraint, namely that the number of connections chosen must be at least 4. You then solve the problem using CPLEX. What is the result? Identify a constraint which must be fulfilled by an optimal integer solution, but which is currently not fulfilled. Add this constraint and solve the revised problem. Comment on the result.

Question 4.3

In order to apply Branch & Cut, you need a separation routine, i.e. an algorithm which, given a potential solution, returns a violated inequality if such one exists. Explain how a maximum flow algorithm can be used as separation routine and estimate the complexity of this approach.
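The separation step asked for in Question 4.3 can be sketched as follows. This is a minimal illustration, not CPLEX code: the graph representation and the helper names (`max_flow`, `separate`) are assumptions made for the example, and the min cut is found with the Edmonds-Karp max flow algorithm.

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp max flow; cap is a dense n x n residual-capacity matrix
    (mutated in place, so on return it is the residual graph)."""
    flow = 0.0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:   # BFS for an augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-9:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:            # no augmenting path left
            return flow
        push = float("inf")            # bottleneck capacity along the path
        v = t
        while v != s:
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                  # update residual capacities
            cap[parent[v]][v] -= push
            cap[v][parent[v]] += push
            v = parent[v]
        flow += push

def separate(n, edges, x, branches):
    """Look for a cut of capacity < 1 separating two branch nodes, where the
    fractional LP values x are interpreted as edge capacities.  Returns the
    node set S on one side of a violated cut inequality
        sum_{e across S} x_e >= 1,
    or None if no such cut exists."""
    for a in branches:
        for b in branches:
            if a >= b:
                continue
            cap = [[0.0] * n for _ in range(n)]
            for (i, j), xe in zip(edges, x):
                cap[i][j] += xe        # undirected edge: capacity both ways
                cap[j][i] += xe
            if max_flow(n, cap, a, b) < 1 - 1e-6:
                seen = {a}             # a's side of the min cut = nodes
                q = deque([a])         # reachable in the residual graph
                while q:
                    u = q.popleft()
                    for v in range(n):
                        if v not in seen and cap[u][v] > 1e-9:
                            seen.add(v)
                            q.append(v)
                return seen
    return None
```

With B branch nodes this performs O(B^2) max flow computations, each of cost O(V E^2) for Edmonds-Karp, which is the kind of complexity estimate the question asks for.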
Question 4.4

You suddenly come to think about Gomory cuts. Maybe these are applicable? You realize that, using CPLEX in interactive mode, it is not possible to extract the updated coefficients in a tableau row, but you can extract the updated coefficients in the objective function (the reduced costs). Is it in theory possible to generate a Gomory cut from the updated coefficients in the objective function? If so, what is the result for the resulting LP tableau from Question 4.2?
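When thinking about Question 4.4, it may help to recall the classical Gomory cut arithmetic on a concrete row first. The tableau row below is invented purely for illustration (it comes from no particular LP); the resulting cut is satisfied by every nonnegative integer solution of the row.

```python
from fractions import Fraction as F
from math import floor

def gomory_cut(coeffs, rhs):
    """Classical Gomory fractional cut from a tableau row
        x_B + sum_j coeffs[j] * x_j = rhs
    with all variables nonnegative integers.  Rounding the row down gives an
    integer-valued left-hand side, so subtracting the rounded row from the
    original one yields the valid inequality
        sum_j frac(coeffs[j]) * x_j >= frac(rhs)."""
    frac = lambda a: a - floor(a)
    return [frac(a) for a in coeffs], frac(rhs)

# invented example row:  x1 + 5/2 x2 - 5/4 x3 = 15/4
cut, cut_rhs = gomory_cut([F(5, 2), F(-5, 4)], F(15, 4))
# cut is [1/2, 3/4] with right-hand side 3/4:  1/2 x2 + 3/4 x3 >= 3/4
```

Question 4.4 then asks whether the same arithmetic may be applied to an "objective row" built from the reduced costs; the sketch above only covers the ordinary tableau-row case from the textbook derivation.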
Question 5

Your boss suddenly runs into your room and cries: "Drop everything! We need to win this new client. Find the best assignment of persons to jobs in the following situation: I have 5 persons and 5 jobs, and exactly one person has to be assigned to each job. The educational budget is 18." He hands you a profit matrix N showing in entry (i, j) the profit earned from letting person i do job j, and a cost matrix O showing the cost of educating person i for job j (the matrices are shown below).

N =
4 10 0 8 3
10 2 9 1 8
6 2 5 4 4
5 8 6 8 9
4 1 6 4 3

O =
4 5 5 8 4
10 2 9 2 5
6 24 8 2 3
5 3 3 10 1
3 6 5 8 4

You decide to solve the problem to optimality using Branch-and-Bound. You first convert the problem to a cost minimization problem, which you then solve. The lower bound computation should be done using Lagrangian relaxation as described in the next section, i.e. you have to use the objective value of the assignment problem resulting from the relaxation of the budget constraint with a nonnegative multiplier. In order to explain the method to your boss, you have to explain why any nonnegative multiplier gives rise to a lower bound for your given problem.
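The lower-bound claim that Question 5 asks you to explain can be written out as follows. This is the standard Lagrangian relaxation argument; the notation is assumed for the sketch: c_ij are the converted (nonnegative) costs, o_ij the education costs, and X the set of all assignments.

```latex
% Budget constraint relaxed with multiplier lambda >= 0:
\[
  L(\lambda) \;=\; \min_{x \in X}\;
     \sum_{i,j} c_{ij}\, x_{ij}
     \;+\; \lambda\Bigl(\sum_{i,j} o_{ij}\, x_{ij} - 18\Bigr).
\]
% For any budget-feasible assignment x*, the penalty term is nonpositive,
% and x* is one of the candidates in the minimization, so
\[
  L(\lambda) \;\le\; \sum_{i,j} c_{ij}\, x^{*}_{ij}
     + \lambda\Bigl(\sum_{i,j} o_{ij}\, x^{*}_{ij} - 18\Bigr)
  \;\le\; \sum_{i,j} c_{ij}\, x^{*}_{ij}.
\]
% Hence L(lambda) is a lower bound on the optimal cost for every lambda >= 0.
```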
Suggested line of work for each question

Question 4.1
Consider an edge {i, j}. You have to ensure that "if xij = 1 then i must not be separated from any set containing all branches". Figure out how to express "xj = 1 implies xi = 1" and use the idea in connection with the usual connectedness constraints known from the first project. How to model constraints of the type "xj = 0 implies xi = 0" is described in Hillier and Lieberman. As for TSP, this gives exponentially many constraints.

Question 4.2
In Branch & Cut you should find one violated constraint by inspection, add it to the system, resolve, and describe the effect.

Question 4.3
As for TSP, Max Flow can be used as separation routine, based on the observation that the x-values can be interpreted as capacities. But you have to take care that only cuts separating two branch vertices are taken into consideration.

Question 4.4
The basic calculations leading to the classical Gomory cut from a chosen cut-row with nonintegral right-hand side start with the equation corresponding to the i'th row of the Simplex tableau. Start out assuming that the objective function value is nonintegral, write out the "objective function equation", and check if all steps in the derivation procedure are valid.

Question 5
Converting the profit maximization problem to a cost minimization problem with nonnegative costs is done by multiplying all values in the profit matrix by −1 and adding the largest element of the original profit matrix to each element. Formulation of the Lagrangian relaxation can be done by subtracting the right-hand side of the budget constraint from the left-hand side, multiplying the difference with λ, and adding the term to the objective function. Collecting terms to get the objective function coefficient for each xij, you get an assignment problem for each fixed value of λ. Choose a value of λ, e.g. 1, and solve the assignment problem. If the budget constraint is not fulfilled, try to increase λ and resolve. If an optimal solution cannot be found through adjustments of λ (what are the conditions under which an optimal solution to the relaxed problem is also optimal for the original problem?), choose a variable to branch on and create two new subproblems, each of which is treated as the original problem.
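As a sanity check of the suggested line of work for Question 5, the conversion and the Lagrangian bound can be sketched in a few lines. This is an illustration only: the assignment problems are solved by brute-force enumeration of all 120 permutations rather than by a proper assignment algorithm, and any particular λ value tried is arbitrary.

```python
from itertools import permutations

# data from Question 5: profit matrix N, education costs O, budget 18
N = [[4, 10, 0, 8, 3], [10, 2, 9, 1, 8], [6, 2, 5, 4, 4],
     [5, 8, 6, 8, 9], [4, 1, 6, 4, 3]]
O = [[4, 5, 5, 8, 4], [10, 2, 9, 2, 5], [6, 24, 8, 2, 3],
     [5, 3, 3, 10, 1], [3, 6, 5, 8, 4]]
BUDGET = 18

# profit maximization -> cost minimization with nonnegative costs:
# multiply by -1 and add the largest profit to every element
pmax = max(max(row) for row in N)
C = [[pmax - N[i][j] for j in range(5)] for i in range(5)]

def lagrangian_bound(lam):
    """L(lam) = min over all assignments of  cost + lam * (edu - BUDGET).
    For lam >= 0 this is a lower bound on the optimal feasible cost."""
    return min(sum(C[i][p[i]] for i in range(5))
               + lam * (sum(O[i][p[i]] for i in range(5)) - BUDGET)
               for p in permutations(range(5)))

def optimal_feasible():
    """Brute-force optimum of the budget-constrained problem (p[i] is the
    job assigned to person i); None if no assignment meets the budget."""
    return min((sum(C[i][p[i]] for i in range(5))
                for p in permutations(range(5))
                if sum(O[i][p[i]] for i in range(5)) <= BUDGET),
               default=None)
```

In the actual Branch-and-Bound you would of course not enumerate: you solve the relaxed assignment problem for a fixed λ, increase λ while the budget constraint is violated, and branch when adjusting λ no longer helps.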
Figure B.2: The intranetwork problem of your company. The black locations correspond to your branches to be connected. [Network drawing with locations A-M and connection prices not reproduced here.]
Figure B.3: An example intranetwork problem from your client. The black locations correspond to your branches to be connected. [Network drawing with locations A-L and connection prices not reproduced here.]